Recently I have needed to solve a problem that involves generating a PDF file based on database content. Since these PDFs are not generated too often, it doesn't make sense to 24/7 running service. Luckily both Google (Functions) and AWS (Lambda) have an event-driven service which is only running on request.
Originally I was planning to use Python and a Reportlab for this project but a connection to PostgreSQL database ended up being too complex to configure. With NodeJS I had already done a small project with database connection so I knew that it would work.
For NodeJS I still needed a package to generator PDF, and I found following options:
- PDFKit
- PDFMake
- ReLaXed
- Puppeteer
I ended up choosing Puppeteer for this project. It's a bit overkill for the current use case but at the same time, it is more future proof due to html+css base structure.
To make my life easier I'm using a serverless package to handle deployment to AWS Lambda and chrome-aws-lambda to help out the deployment of puppeteer to AWS Lambda. Full list of required dependencies are the following:
"dependencies": {
"chrome-aws-lambda": "1.18.1",
"knex": "0.18.3",
"pg": "7.11.0",
"pg-hstore": "2.3.2",
"pug": "2.0.4",
"puppeteer-core": "1.18.1",
}
"devDependencies": {
"serverless": "1.40.0",
"serverless-apigw-binary": "0.4.4",
"serverless-offline": "4.9.4",
}
Aside from the main requirements, I'm using knex, pg, and pg-hstore to handle database connection and pug as a template engine. For local testing I'm using serverless-offline and to help the binary addition to lambda, I'm using serverless-apigw-binary.
Creating a lambda function
The process of creating a pdf goes following:
- Fetch the data which we will use to create report (in my case from db with knex)
- Create a html template which will be comined with the data (I'm using pug in here).
- Load puppeteer and open html file with puppeteer.
- Generate a pdf page with puppeteer.
- Return PDF as a base64 string.
'use strict'
const chromium = require('chrome-aws-lambda')
const pug = require('pug')
const fs = require('fs')
const path = require('path')
const knex = require('./src/db')
module.exports.pdf = async (event, context) => {
const yearMonth = ((event || {}).pathParameters || {}).yearMonth || ''
const year = yearMonth.length == 7 && yearMonth.substring(0, 4)
const month = yearMonth.length == 7 && yearMonth.substring(5, 6)
// Select a date
const selDate = new Date(year, month)
const filter = {
month: selDate.toLocaleString('en', { month: 'long' }),
year: selDate.getFullYear()
}
// 1. Load database data wiht Knex TODO
const result = await knex
.select()
.from('sales')
.where({
year: filter.year,
month: selDate.getMonth() + 1
})
// 2. Create html
const template = pug.compileFile('./src/template.pug')
const html = template({ ...filter, result })
// 3. Open puppeteer
let browser = null
try {
browser = await chromium.puppeteer.launch({
args: chromium.args,
defaultViewport: chromium.defaultViewport,
executablePath: await chromium.executablePath,
headless: chromium.headless
})
const page = await browser.newPage()
page.setContent(html)
// 4. Create pdf file with puppeteer
const pdf = await page.pdf({
format: 'A4',
printBackground: true,
margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
})
// 5. Return PDf as base64 string
const response = {
headers: {
'Content-type': 'application/pdf',
'content-disposition': 'attachment; filename=test.pdf'
},
statusCode: 200,
body: pdf.toString('base64'),
isBase64Encoded: true
}
context.succeed(response)
} catch (error) {
return context.fail(error)
} finally {
if (browser !== null) {
await browser.close()
}
}
}
Deployment to AWS lambda
As earlier said, we are using Serverless for deployment so that the configuration is not too heavy.
service:
name: PDF
plugins:
- serverless-offline
- serverless-apigw-binary
provider:
name: aws
runtime: nodejs8.10
region: eu-central-1
stage: ${opt:stage, 'development'}
environment:
ENV: ${self:provider.stage}
custom:
apigwBinary:
types:
- '*/*'
functions:
pdf:
handler: pdf.pdf
events:
- http:
path: pdf
method: get
cors: true
The keys in here are that we enable / for apigwBinary so that PDF goes through in a correct format.
And here we have everything to generate PDF in AWS lambda. To my opinion generating the pdf with 1024 MB took something like 4000ms which would mean that total price would be close to 1 euro per 20000 PDF generations after free tier.
If you want to try it out yourself, I have created a repository to Github.
所有评论(0)