How to deal with Aurora Serverless cold starts

weixin_0010034 · Published 2022-08-10 02:35:23

In the project I am currently working on, we are migrating a web application built with PHP and a Vertica database to a React SPA + serverless backend (AWS API Gateway + Lambda + Aurora Serverless).

The switch from Vertica to Aurora was the last step, and judging from our integration tests everything looked fine. The only issue we were facing was that those tests were "sometimes" failing.
We were pretty sure from the beginning that it had to do with Aurora Serverless DB "cold starts". We had already applied a workaround to keep the Lambda from "falling asleep", but now the problem was the VPC + DB cluster taking ages to start up.

Checking the logs, we found out that the Lambda was timing out after 5 seconds. Easy. Just increase the timeout in serverless.yml:

timeout: 30 # VPC + paused Aurora cluster can take a while

We set a deliberately high value to rule out any timeout coming from the Lambda itself.
Now the logs were telling us that the MySQL driver was timing out. We checked the mysql2 documentation and found out that its connection timeout defaults to 10 seconds.
We increased that as well, but since a sleeping Aurora DB cluster can take up to 25 seconds to wake up... we hit the hard limit on the API Gateway endpoint.
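To make the competing timeouts concrete, here is a minimal sketch of how the driver timeout could be raised while staying under the gateway limit. `connectTimeout` (in milliseconds) is the mysql2 connection option. The helper name `buildDbOptions`, the environment variable names, and the 25-second default (taken from the wake-up time mentioned above) are assumptions for illustration.

```javascript
// API Gateway limit discussed in this article (29 seconds).
const API_GATEWAY_LIMIT_MS = 29000;

// Hypothetical helper: builds the options object that would be passed to
// mysql2's createConnection/createPool. Only `connectTimeout` matters here.
function buildDbOptions(env, connectTimeoutMs = 25000) {
  if (connectTimeoutMs >= API_GATEWAY_LIMIT_MS) {
    // A longer driver timeout is pointless: the gateway gives up first.
    throw new Error('connectTimeout must stay below the API Gateway limit');
  }
  return {
    host: env.DB_HOST,         // assumed variable names
    user: env.DB_USER,
    password: env.DB_PASSWORD,
    database: env.DB_NAME,
    connectTimeout: connectTimeoutMs, // mysql2 default is 10000 ms
  };
}
```

Even with a generous driver timeout, the request still has to finish inside the gateway limit, which is why raising timeouts alone did not solve the problem.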

Why a HARD limit? Because AWS documents that API Gateway times out after 29 seconds, and at the time of writing that timeout could NOT be increased. This makes perfect sense: you definitely don't want your REST APIs to hang for that long, and if they really need that much time, it is probably time to change the architecture and move to something more asynchronous.

So... what could we do?

We already had a warm-up, but it was only used to spawn the container: the handler returned immediately after checking the context (as I described here).
We could have simply modified that logic so that the warm-up call also pings the DB and wakes it up.
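A warm-up that also pings the DB might look roughly like the sketch below. It is not the handler from this project: the `event.warmup` marker, `makeHandler`, `getConnection` and `handleRequest` are hypothetical names, and the connection provider is injected so the sketch does not depend on a specific driver (with mysql2 it could return a cached connection).

```javascript
// Hypothetical warm-up marker; the real check depends on how the warm-up
// event is sent (for example by a scheduled rule or a warm-up plugin).
const isWarmUp = (event) => Boolean(event && event.warmup);

// Builds a Lambda-style handler. `getConnection` is expected to return an
// object with an async `query(sql)` method; `handleRequest` is the normal
// business logic for real API calls.
function makeHandler({ getConnection, handleRequest }) {
  return async function handler(event, context) {
    if (isWarmUp(event)) {
      try {
        // A trivial query is enough to trigger the wake-up of a paused cluster.
        const connection = await getConnection();
        await connection.query('SELECT 1');
        return { warmed: true, db: 'awake' };
      } catch (err) {
        // The ping itself may time out while the cluster is resuming;
        // the resume keeps going, so the next call is likely to succeed.
        return { warmed: true, db: 'waking', error: err.message };
      }
    }
    return handleRequest(event, context);
  };
}
```

The warm-up branch swallows the error on purpose: its only job is to start the wake-up, not to report on it.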
But a better solution is to disable the "pausing" feature on Aurora Serverless and set the minimum capacity to 1 ACU (Aurora Capacity Unit), so that the DB cluster never goes to sleep, at least one ACU is always available, and AWS scales it up automatically when needed.
Of course, this seems to somewhat defeat the purpose of a serverless DB, where you configure the DB to autoscale when needed and pay only for what you use:

If I have to keep an instance always on because the startup time is way too long, then what's the gain compared to having Aurora on EC2?

Well, the benefit is exactly the autoscaling functionality (over multiple Availability Zones).

If you don't need multiple AZs and you are very cost-sensitive, you can definitely just use a provisioned Aurora instance instead of a serverless one.

Here you can find an awesome and detailed article about the costs of Aurora Serverless compared to Aurora on EC2.

In our case, spending a few euros more for a more stable service, instead of going crazy keeping up with cron jobs and warm-ups, was not a big deal. We therefore decided to keep 1 ACU always active on production and just bear with the cold starts on staging and dev. Any QA tester just has to refresh the React app page to get the connection running, and the same can be done for integration tests: either retry, or ping the DB, wait, and then execute them.
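For the integration tests, the "retry, or ping, wait, then execute" idea could be sketched as a small helper like the one below. The name `withRetry`, the default of 5 attempts and the 5-second pause are assumptions chosen for illustration, not values from this project.

```javascript
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Runs `fn` until it succeeds or the attempts are used up, pausing between
// tries so a paused cluster has time to resume. `sleep` is injectable so the
// helper can be exercised without real waiting.
async function withRetry(fn, { attempts = 5, delayMs = 5000, sleep = defaultSleep } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < attempts) {
        await sleep(delayMs);
      }
    }
  }
  // All attempts failed: surface the last error to the test runner.
  throw lastError;
}
```

In a test setup hook this could wrap a single cheap call against the API or the DB, so that the actual test cases only start once the cluster answers.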

How do you set this configuration in serverless?

In the AWS console UI, it's easy. Just open the configuration tab, change the ACU and Autopause fields, and it's done.

[Screenshot: RDS configuration in the AWS console]

With the Serverless Framework, the hardest part is always finding the right properties to describe the stack in code and navigating the massive AWS documentation.
You can read about the scaling configuration of your DB cluster capacity in the API Reference pages or in the AWS SDK documentation, but to find the right configuration for your yml file you have to go to the CloudFormation documentation.
Once you are there, you will realize it is indeed super simple.

Under Resources just put:

RDSCluster:
  Type: AWS::RDS::DBCluster
  Properties:
    MasterUsername: YOUR_DB_USERNAME
    MasterUserPassword: YOUR_DB_PSW
    DatabaseName: YOUR_DB_NAME
    Engine: aurora
    EngineMode: serverless
    ScalingConfiguration:
      AutoPause: false
      MaxCapacity: 64
      MinCapacity: 1
    DBSubnetGroupName: YOUR_SUBNET_NAME
    BackupRetentionPeriod: 1
    DeletionProtection: true

This will create an Aurora Serverless DB cluster that can't be deleted, never goes to sleep, and has a minimum of 1 ACU.

In our case though, we wanted a different configuration for each environment. We did not want to waste money on an always-available instance for QA and DEV, nor did we need snapshots and backups for those environments.
So, since conditionals do not really exist in yml, we created a bunch of custom properties and referred to them based on the stage.

Under custom just declare them:

autopause:
    production: false
    default: true

and in the DBCluster config refer to them like this:

AutoPause: ${self:custom.autopause.${self:provider.stage}, self:custom.autopause.default}

This is how, within the Serverless Framework, you can pick a property based on the stage and, if the stage name does not exist among the properties, fall back to the default.

A nice tip when you play around with the configuration is to use

sls print -s YOUR_STAGENAME

to see what the final yml will look like with all variables resolved.

Did you have any issues with Aurora Serverless, or do you have any interesting suggestions on the topic?
