Recently I needed to generate PDF files from database content. These PDFs are not generated very often, so it doesn't make sense to keep a service running 24/7. Luckily, both Google (Cloud Functions) and AWS (Lambda) offer event-driven services that only run on request.

Originally I was planning to use Python and ReportLab for this project, but configuring the connection to the PostgreSQL database turned out to be too complex. I had already built a small NodeJS project with a database connection, so I knew that approach would work.

For NodeJS I still needed a package to generate PDFs, and I found the following options:

  • PDFKit
  • PDFMake
  • ReLaXed
  • Puppeteer

I ended up choosing Puppeteer for this project. It's a bit of overkill for the current use case, but it is more future-proof because the document layout is built with HTML and CSS.

To make my life easier, I'm using the Serverless Framework (the serverless package) to handle deployment to AWS Lambda, and chrome-aws-lambda, which provides a Chromium build that runs on AWS Lambda, for Puppeteer. The full list of required dependencies is the following:

"dependencies": {
  "chrome-aws-lambda": "1.18.1",
  "knex": "0.18.3",
  "pg": "7.11.0",
  "pg-hstore": "2.3.2",
  "pug": "2.0.4",
  "puppeteer-core": "1.18.1"
},
"devDependencies": {
  "serverless": "1.40.0",
  "serverless-apigw-binary": "0.4.4",
  "serverless-offline": "4.9.4"
}

Aside from the main requirements, I'm using knex, pg and pg-hstore to handle the database connection, and pug as the template engine. For local testing I'm using serverless-offline, and to let Lambda return binary content through API Gateway, I'm using serverless-apigw-binary.

Creating a Lambda function

The process of creating a PDF goes as follows:

  1. Fetch the data used to create the report (in my case from the database with knex).
  2. Create an HTML template that will be combined with the data (I'm using pug here).
  3. Launch Puppeteer and load the HTML into a page.
  4. Generate a PDF from the page with Puppeteer.
  5. Return the PDF as a base64 string.
'use strict'
const path = require('path')
const chromium = require('chrome-aws-lambda')
const pug = require('pug')

const knex = require('./src/db')

module.exports.pdf = async (event) => {
  // Expects a path parameter in the form YYYY-MM, e.g. /pdf/2019-07
  const yearMonth = ((event || {}).pathParameters || {}).yearMonth || ''
  const match = /^(\d{4})-(\d{2})$/.exec(yearMonth)
  const month = match ? Number(match[2]) : 0 // 1-12
  if (!match || month < 1 || month > 12) {
    return { statusCode: 400, body: 'Expected a path parameter in the form YYYY-MM' }
  }
  const year = Number(match[1])

  // Select a date (JavaScript Date months are zero-based)
  const selDate = new Date(year, month - 1)
  const filter = {
    month: selDate.toLocaleString('en', { month: 'long' }),
    year
  }

  // 1. Load the data from the database with knex
  const result = await knex
    .select()
    .from('sales')
    .where({ year, month })

  // 2. Create the HTML (resolve the template relative to this file)
  const template = pug.compileFile(path.join(__dirname, 'src', 'template.pug'))
  const html = template({ ...filter, result })

  // 3. Launch Puppeteer
  let browser = null
  try {
    browser = await chromium.puppeteer.launch({
      args: chromium.args,
      defaultViewport: chromium.defaultViewport,
      executablePath: await chromium.executablePath,
      headless: chromium.headless
    })

    const page = await browser.newPage()
    // Wait for the content to be set before printing
    await page.setContent(html)

    // 4. Create the PDF file with Puppeteer
    const pdf = await page.pdf({
      format: 'A4',
      printBackground: true,
      margin: { top: '1cm', right: '1cm', bottom: '1cm', left: '1cm' }
    })

    // 5. Return the PDF as a base64 string
    return {
      statusCode: 200,
      headers: {
        'Content-Type': 'application/pdf',
        'Content-Disposition': 'attachment; filename=test.pdf'
      },
      body: pdf.toString('base64'),
      isBase64Encoded: true
    }
  } catch (error) {
    // Log and rethrow so Lambda reports the invocation as failed
    console.error(error)
    throw error
  } finally {
    // Always close Chromium, even when PDF generation fails
    if (browser !== null) {
      await browser.close()
    }
  }
}
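The date parsing and the response shape are the easiest parts of the handler to get wrong: JavaScript Date months are zero-based, and API Gateway needs the body as a base64 string with isBase64Encoded set. Below is a minimal sketch that pulls those two concerns into plain functions so they can be tested without Chromium or a database. The helper names parseYearMonth and pdfResponse are hypothetical, not part of the original project, and it assumes the sales table stores months as 1-12.

```javascript
'use strict'

// Hypothetical helper: parse a "YYYY-MM" path parameter.
// Returns null when the input is malformed or the month is out of range.
function parseYearMonth (yearMonth) {
  const match = /^(\d{4})-(\d{2})$/.exec(yearMonth || '')
  if (!match) return null

  const year = Number(match[1])
  const month = Number(match[2]) // 1-12, assumed to match the sales table
  if (month < 1 || month > 12) return null

  // Date months are zero-based, so January is 0
  const date = new Date(year, month - 1)
  return {
    year,
    month,
    monthName: date.toLocaleString('en', { month: 'long' })
  }
}

// Hypothetical helper: wrap a PDF Buffer in an API Gateway proxy response
function pdfResponse (pdf, filename) {
  return {
    statusCode: 200,
    headers: {
      'Content-Type': 'application/pdf',
      'Content-Disposition': `attachment; filename=${filename}`
    },
    body: pdf.toString('base64'),
    isBase64Encoded: true
  }
}

// Usage
console.log(parseYearMonth('2019-07')) // { year: 2019, month: 7, monthName: 'July' }
console.log(parseYearMonth('2019-7')) // null
```

Keeping these functions free of I/O means the month off-by-one and the response format can be covered by ordinary unit tests, while the Puppeteer part is only exercised in serverless-offline or on Lambda itself.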

Deployment to AWS Lambda

As mentioned earlier, we are using Serverless for deployment, which keeps the configuration light.

service:
  name: PDF

plugins:
  - serverless-offline
  - serverless-apigw-binary

provider:
  name: aws
  runtime: nodejs8.10
  region: eu-central-1
  stage: ${opt:stage, 'development'}
  environment:
    ENV: ${self:provider.stage}

custom:
  apigwBinary:
    types:
      - '*/*'

functions:
  pdf:
    handler: pdf.pdf
    events:
      - http:
          path: pdf/{yearMonth}
          method: get
          cors: true

The key part here is registering '*/*' as a binary media type with apigwBinary, so that API Gateway passes the PDF through in the correct (binary) format instead of as base64 text.
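A quick way to tell whether the binary setting is working is to look at the first bytes the client receives: a PDF file starts with the ASCII marker %PDF-, while a misconfigured endpoint returns the base64 text instead. Here is a small diagnostic sketch (the function names isPdf and inspectPdfBody are hypothetical) that classifies a downloaded body and decodes it when it turns out to be base64:

```javascript
'use strict'

// Every PDF file begins with this signature
const PDF_MAGIC = Buffer.from('%PDF-', 'ascii')

// True when the buffer starts with the PDF file signature
function isPdf (buffer) {
  return buffer.length >= PDF_MAGIC.length &&
    PDF_MAGIC.equals(buffer.subarray(0, PDF_MAGIC.length))
}

// Hypothetical diagnostic: classify a downloaded response body.
// binary: true  -> the bytes are already a PDF (binary media type applied)
// binary: false -> the body was base64 text; the decoded PDF is returned
function inspectPdfBody (body) {
  if (isPdf(body)) return { binary: true, pdf: body }

  const decoded = Buffer.from(body.toString('ascii').trim(), 'base64')
  if (isPdf(decoded)) return { binary: false, pdf: decoded }

  throw new Error('Response body is neither a PDF nor a base64-encoded PDF')
}

// Usage: simulate a response that arrived as base64 text
const asText = Buffer.from(Buffer.from('%PDF-1.4 example').toString('base64'))
console.log(inspectPdfBody(asText).binary) // false
```

If the check reports binary: false against the deployed endpoint, the binary media type configuration did not take effect for that request.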

And that is everything needed to generate a PDF in AWS Lambda. In my tests, generating the PDF with 1024 MB of memory took roughly 4000 ms, which puts the total price at close to 1 euro per 20,000 PDF generations after the free tier.
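The estimate follows from Lambda's duration-based pricing: billed duration multiplied by allocated memory gives GB-seconds, plus a small per-request charge. Below is a sketch of that arithmetic. The default prices (USD 0.0000166667 per GB-second and USD 0.20 per million requests) are assumptions based on AWS's x86 list price; they vary by region and over time, and the free tier is not subtracted.

```javascript
'use strict'

// Estimate the Lambda cost (USD) of generating `count` PDFs.
// Default prices are assumptions; check current pricing for your region.
// The free tier is not subtracted.
function lambdaCost ({
  count,
  durationMs,
  memoryMb,
  pricePerGbSecond = 0.0000166667,
  pricePerRequest = 0.20 / 1e6
}) {
  const gbSeconds = count * (durationMs / 1000) * (memoryMb / 1024)
  return gbSeconds * pricePerGbSecond + count * pricePerRequest
}

// 20,000 PDFs at ~4000 ms with 1024 MB = 80,000 GB-seconds
const cost = lambdaCost({ count: 20000, durationMs: 4000, memoryMb: 1024 })
console.log(cost.toFixed(2)) // 1.34
```

Under these assumptions that is about USD 1.34, the same order of magnitude as the figure above; the compute part (80,000 GB-seconds) dominates, and the request charge adds less than one cent.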

If you want to try it out yourself, I have created a repository on GitHub.
