As container adoption and usage continue to rise, Kubernetes (K8s) has become the leading platform for container orchestration. It’s an open-source project with tens of thousands of contributors from over 315 companies, built with the intention of remaining extensible and cloud-agnostic, and it’s the foundation every major cloud provider builds on.

When you have containers running in production, you want your production environment to be as stable and resilient as possible to avert disaster (think every online Black Friday shopping experience). When a container goes down, another one needs to spin up to take its place, no matter what time of day — or into the wee hours of the night — it is. Kubernetes provides a framework for running distributed systems resiliently, from scaling to failover to load balancing and more. And there are many tools that integrate with Kubernetes to help meet your needs.

Best practices evolve with time, so it’s always good to continuously research and experiment with better approaches to Kubernetes development. As Kubernetes is still a young technology, we are always looking to improve our understanding and use of it.

In this article, we’ll examine, at a high level, ten common practices in Kubernetes deployments that have better alternatives. I won’t go into depth on the best practices, since custom implementations vary among users.

  1. Putting the configuration file inside/alongside the Docker image
  2. Not using Helm or other kinds of templating
  3. Deploying things in a specific order (applications shouldn’t crash because a dependency isn’t ready)
  4. Deploying pods without set memory and/or CPU limits
  5. Pulling the latest tag in containers in production
  6. Deploying new updates/fixes by killing pods so they pull the new Docker images during the restart process
  7. Mixing both production and non-production workloads in the same cluster
  8. Not using blue/green or canaries for mission-critical deployments (the default rolling update of Kubernetes is not always enough)
  9. Not having metrics in place to understand if a deployment was successful or not (your health checks need application support)
  10. Cloud vendor lock-in: locking yourself into an IaaS provider’s Kubernetes or serverless computing services

Ten Kubernetes Anti-Patterns

1. Putting the configuration file inside/alongside the Docker image

This Kubernetes anti-pattern is related to a Docker anti-pattern (see anti-patterns 5 and 8 in this article). Containers give developers a way to use a single image (essentially the production image) through the entire software lifecycle, from dev/QA to staging to production.

However, a common practice is to give each phase in the lifecycle its own image, each built with different artifacts specific to its environment (QA, staging, or production). But now you’re no longer deploying what you’ve tested.

Don’t hardcode your configuration at build time (image from https://codefresh.io/containers/docker-anti-patterns/)

The best practice here is to externalize general-purpose configuration in ConfigMaps, while sensitive information (like API keys and secrets) can be stored in the Secrets resource (which has Base64 encoding but otherwise works the same as ConfigMaps). ConfigMaps can be mounted as volumes or passed in as environment variables, but Secrets should be mounted as volumes. I mention ConfigMaps and Secrets because they are native Kubernetes resources and don’t require integrations, but they can be limiting. There are other solutions available like ZooKeeper and Consul by HashiCorp for configmaps, or Vault by HashiCorp, Keywhiz, Confidant, etc, for secrets, that might better fit your needs.

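As a rough sketch of what that externalization can look like, the manifests below define a ConfigMap for general settings, a Secret for an API key, and a pod that injects the ConfigMap as environment variables and mounts the Secret as a volume. The names and values (app-config, app-secrets, LOG_LEVEL, and so on) are illustrative placeholders, not from the original article.

apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config              # hypothetical name
data:
  LOG_LEVEL: "info"
  FEATURE_FLAGS: "new-checkout=true"
---
apiVersion: v1
kind: Secret
metadata:
  name: app-secrets             # hypothetical name
type: Opaque
stringData:
  api-key: replace-me           # Kubernetes stores this Base64-encoded
---
apiVersion: v1
kind: Pod
metadata:
  name: storefront
spec:
  containers:
  - name: storefront
    image: storefront:v1.4.0
    envFrom:
    - configMapRef:
        name: app-config        # injected as environment variables
    volumeMounts:
    - name: secrets
      mountPath: /etc/secrets   # Secret mounted as a read-only volume
      readOnly: true
  volumes:
  - name: secrets
    secret:
      secretName: app-secrets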

When you’ve decoupled your configuration from your application, you no longer need to recompile the application when you need to update the configuration — and it can be updated while the app is running. Your applications fetch the configuration during runtime instead of during the build. More importantly, you’re using the same source code in all the phases of the software lifecycle.

Load configuration during runtime (image from https://codefresh.io/containers/docker-anti-patterns/)

2. Not using Helm or other kinds of templating

You can manage Kubernetes deployments by directly updating YAML. When rolling out a new version of code, you will probably have to update one or more of the following:

  • Docker image name
  • Docker image tag
  • Number of replicas
  • Service labels
  • Pods
  • ConfigMaps, etc.

This can get tedious if you’re managing multiple clusters and applying the same updates across your development, staging, and production environments. You are basically modifying the same files with minor modifications across all your deployments. It’s a lot of copy-and-paste, or search-and-replace, while also staying aware of the environment for which your deployment YAML is intended. There are a lot of opportunities for mistakes during this process:

  • Typos (wrong version numbers, misspelled image names, etc.)
  • Modifying the YAML with the wrong update (for example, connecting to the wrong database)
  • Missing a resource to update, etc.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

There might be a number of things you need to change in the YAML, and if you’re not paying close attention, one YAML could easily be mistaken for another deployment’s YAML.

Templating helps streamline the installation and management of Kubernetes applications. Since Kubernetes doesn’t provide a native templating mechanism, we have to look elsewhere for this type of management.

Helm was the first package manager available (2015). It was proclaimed to be “Homebrew for Kubernetes” and evolved to include templating capabilities. Helm packages its resources via charts, where a chart is a collection of files describing a related set of Kubernetes resources. There are 1,400+ publicly available charts in the chart repository (you can also use helm search hub [keyword] [flags]), which are basically reusable recipes for installing, upgrading, and uninstalling things on Kubernetes. With Helm charts, you can modify the values.yaml file to set the modifications you need for your Kubernetes deployments, and you can have a different Helm chart for each environment. So if you have a QA, staging, and production environment, you only have to manage three Helm charts instead of modifying each YAML in each deployment in each environment.

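For illustration, here’s a minimal sketch of how a chart template can parameterize the parts of a Deployment that change between environments, with one values file per environment. The chart layout, value names, and file names here are hypothetical, not taken from the article.

# templates/deployment.yaml (excerpt of a chart template)
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ .Release.Name }}-web
spec:
  replicas: {{ .Values.replicaCount }}
  template:
    spec:
      containers:
      - name: web
        image: "{{ .Values.image.repository }}:{{ .Values.image.tag }}"

# values-production.yaml (a per-environment values file)
replicaCount: 3
image:
  repository: myapp
  tag: v1.4.0

Each environment is then installed from the same chart with its own values file, e.g. helm upgrade --install myapp ./chart -f values-production.yaml.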

Another advantage we get with Helm is that it’s easy to roll back to a previous revision with Helm rollbacks if something goes wrong with:

helm rollback <RELEASE> [REVISION] [flags]

If you want to roll back to the immediate prior version, you can use:

helm rollback <RELEASE> 0

So we’d see something like:

$ helm upgrade --install --wait --timeout 20 demo demo/
$ helm upgrade --install --wait --timeout 20 --set readinessPath=/fail demo demo/
$ helm rollback --wait --timeout 20 demo 1
Rollback was a success.

And the Helm chart history tracks it nicely:

$ helm history demo
REVISION    STATUS        DESCRIPTION
1           SUPERSEDED    Install complete
2           SUPERSEDED    Upgrade "demo" failed: timed out waiting for the condition
3           DEPLOYED      Rollback to 1

Google’s Kustomize is a popular alternative and can be used in addition to Helm.

3. Deploying things in a specific order

Applications shouldn’t crash because a dependency isn’t ready. In traditional development, there is a specific order to the startup and stop tasks when bringing up applications. It’s important not to bring this mindset into container orchestration. With Kubernetes, Docker, etc., these components start concurrently, making it impossible to define a startup order. Even when the application is up and running, its dependencies could fail or be migrated, leading to further issues. The Kubernetes reality is also riddled with myriad points of potential communication failures where dependencies can’t be reached, during which a pod might crash or a service might become unavailable. Network latency, like a weak signal or interrupted network connection, is a common culprit for communication failure.

For simplicity’s sake, let’s examine a hypothetical shopping application that has two services: an inventory database and a storefront UI. Before the application can launch, the back-end service has to start, meet all its checks, and start running. Then the front-end service can start, meet its checks, and start running.

Let’s say we’ve forced the deployment order with the kubectl wait command, something like:

kubectl wait --for=condition=Ready pod/serviceA

But when the condition is never met, the next deployment can’t proceed and the process breaks.

This is a simplistic flow of what a deployment order might look like:

This process cannot move forward until the previous step is complete.

Since Kubernetes is self-healing, the standard approach is to let all the services in an application start concurrently and let the containers crash and restart until they are all up and running. I have services A and B starting independently (as a decoupled, stateless cloud-native application should), but for the sake of the user experience, I could tell the UI (service B) to display a friendly loading message until service A is ready; the actual startup of service B shouldn’t be affected by service A.

Now when the pod crashes, Kubernetes restarts the service until everything is up and running. If you are stuck in CrashLoopBackOff, it’s worth checking your code, configuration, or for resource contention.

Of course, we need to do more than simply rely on self-healing. We need to implement solutions that handle the failures that will inevitably happen, and we should lay down frameworks that respond in a way that helps us avoid downtime and/or data loss.

In my hypothetical shopping app, my storefront UI (service B) needs the inventory (service A) in order to give the user a complete experience. So when there’s a partial failure, like if service A wasn’t available for a short time or crashed, etc., the system should still be able to recover from the issue.

Transient faults like these are an ever-present possibility, so to minimize their effects we can implement a Retry pattern. Retry patterns help improve application stability with strategies like:

  • Cancel: If the fault isn’t transient, or if the process is unlikely to succeed on repeated attempts, the application should cancel the operation and report an exception (e.g., an authentication failure; invalid credentials should never work!).

  • Retry: If the fault is unusual or rare, it could be due to an uncommon situation (e.g., network packet corruption). The application should retry the request immediately because the same failure is unlikely to reoccur.

  • Retry after delay: If the fault is caused by common occurrences like connectivity or busy failures, it’s best to let any work backlog or traffic clear up before trying again. The application should wait before retrying the request.

  • You could also implement your retry pattern with an exponential backoff (exponentially increasing the wait time and setting a maximum retry count), as sketched below.
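
As a rough sketch of the retry-after-delay strategy with exponential backoff, here is what the client side of the storefront-to-inventory call might look like; the URL, timeouts, and retry counts are made-up placeholders, not values from the article.

import time
import requests

def fetch_inventory(url="http://service-a/inventory", max_retries=5):
    """Retry a request with exponential backoff, giving up after max_retries."""
    delay = 1  # seconds to wait before the next attempt
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.get(url, timeout=2)
            response.raise_for_status()
            return response.json()
        except requests.RequestException:
            if attempt == max_retries:
                raise  # let the caller handle the failure (e.g., show a fallback UI)
            time.sleep(delay)
            delay *= 2  # exponentially increase the wait time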

Implementing a circuit-breaking pattern is also an important strategy when creating resilient microservice applications. Like how a circuit breaker in your house will automatically switch to protect you from extensive damage due to excess current or short-circuiting, the circuit-breaking pattern provides you a method of writing applications while limiting the impact of unexpected faults that might take longer to fix, like partial loss of connectivity, or complete failure of a service. In these situations where retrying won’t work, the application should be able to accept that the failure has occurred and respond accordingly.

4. Deploying pods without set memory and/or CPU limits

Resource allocation varies depending on the service, and it can be difficult to predict what resources a container might require for optimal performance without testing implementation. One service could require a fixed CPU and memory consumption profile, while another service’s consumption profile could be dynamic.

When you deploy pods without careful consideration of memory and CPU limits, this can lead to scenarios of resource contention and unstable environments. If a container does not have a memory or CPU limit, then the scheduler sees its memory utilization (and CPU utilization) as zero, so an unlimited number of pods can be scheduled on any node. This can result in the overcommitment of resources and possible node and kubelet crashes.

When the memory limit is not specified for a container, there are a couple of scenarios that could apply (these also apply to CPU):

  1. There is no upper bound on the amount of memory a container can use. Thus, the container could use all of the available memory on its node, possibly invoking the OOM (out of memory) Killer. An OOM Kill situation has a greater chance of occurring for a container with no resource limits.
  2. The default memory limit of the namespace (in which the container is running) is assigned to the container. The cluster administrators can use a LimitRange to specify a default value for the memory limit.

Declaring memory and CPU limits for the containers in your cluster allows you to make efficient use of the resources available on your cluster’s nodes. This helps the kube-scheduler determine on which node the pod should reside for most efficient hardware utilization.

When setting the memory and CPU limits for a container, you should take care not to request more resources than the limit. For pods that have more than one container, the aggregate resource requests must not exceed the set limit(s) — otherwise, the pod will never be scheduled.

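A hedged example of what such requests and limits can look like on a container follows; the numbers are arbitrary placeholders you would tune per service, not recommendations from the article.

    spec:
      containers:
      - name: storefront
        image: storefront:v1.4.0
        resources:
          requests:
            memory: "128Mi"    # what the scheduler reserves for this container on a node
            cpu: "250m"
          limits:
            memory: "256Mi"    # exceeding this gets the container OOM-killed
            cpu: "500m"        # exceeding this gets the container throttled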

The resource request must not exceed the limit.

Setting memory and CPU requests below their limits accomplishes two things:

  1. The pod can make use of memory/CPU when it is available, leading to bursts of activity.
  2. During a burst, the pod is limited to a reasonable amount of memory/CPU.

The best practice is to keep the CPU request at one core or below, and then use ReplicaSets to scale it out, which gives the system flexibility and reliability.

What happens when you have different teams competing for resources when deploying containers in the same cluster? If the process exceeds the memory limit, then it will be terminated, while if it exceeds the CPU limit, the process will be throttled (resulting in worse performance).

You can control resource limits via resource quotas and LimitRange in the namespace settings. These settings help account for container deployments that have no limits or high resource requests.

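For instance, a namespace-level guard might look roughly like the sketch below; the namespace name and numbers are illustrative assumptions, not values from the article.

apiVersion: v1
kind: ResourceQuota
metadata:
  name: dev-quota
  namespace: dev
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    limits.cpu: "8"
    limits.memory: 16Gi
---
apiVersion: v1
kind: LimitRange
metadata:
  name: dev-defaults
  namespace: dev
spec:
  limits:
  - type: Container
    default:              # applied as the limit when a container sets none
      cpu: 500m
      memory: 256Mi
    defaultRequest:       # applied as the request when a container sets none
      cpu: 250m
      memory: 128Mi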

Setting hard resource limits might not be the best choice for your needs. Another option is to use the recommendation mode in the Vertical Pod autoscaler resource.

5. Pulling the latest tag in containers in production

Using the latest tag is considered bad practice, especially in production. Pods unexpectedly crash for all sorts of reasons, so they can pull down images at any time. Unfortunately, the latest tag is not very descriptive when it comes to determining when the build broke. What version of the image was running? When was the last time it was working? This is especially bad in production since you need to be able to get things back up and running with minimal downtime.

You shouldn’t use the latest tag in production.

By default, the imagePullPolicy is set to Always and will always pull down the image when it restarts. If you don’t specify a tag, Kubernetes will default to latest. However, a deployment will only be updated in the event of a crash (when the pod pulls down the image on restart) or if the deployment pod’s template (.spec.template) is changed. See this forum discussion for an example of latest not working as intended in development.

Even if you’ve changed the imagePullPolicy to another value than Always, your pod will still pull an image if it needs to restart (whether because of a crash or a deliberate reboot). If you use versioning and set the image with a meaningful tag, like v1.4.0, then you can roll back to the most recent stable version and more easily troubleshoot when and where something went wrong in your code. You can read more about best practices for versioning in the Semantic Versioning Specification and GCP Best Practices.

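In practice this means pinning the container spec to an explicit version rather than relying on latest, roughly like the snippet below; the image name is a placeholder.

    spec:
      containers:
      - name: storefront
        image: myregistry/storefront:v1.4.0    # explicit, meaningful tag
        imagePullPolicy: IfNotPresent          # don't re-pull an image already present on the node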

In addition to using specific and meaningful Docker tags, you should also remember that containers are stateless and immutable. They are also meant to be ephemeral (and you should store any data outside containers in persistent storage). Once you spin up a container, you should not modify it: no patches, no updates, no configuration changes. When you need to update a configuration, you should deploy a new container with the updated config.

Docker immutability, taken from Best Practices for Operating Containers.

This immutability allows for safer and repeatable deployments. You can also more easily roll back if you need to redeploy the old image. By keeping your Docker images and container immutable, you are able to deploy the same container image in every single environment. See Anti-pattern 1 to read about externalizing your configuration data to keep your images immutable.

We can roll back to the previous stable version while we troubleshoot.

6. Deploying new updates/fixes by killing pods so they pull the new Docker images during the restart process

Like relying on the latest tag to pull updates, relying on killing pods to roll out new updates is bad practice since you’re not versioning your code. If you are killing pods to pull updated Docker images in production, don’t do it. Once a version has been released in production, it should never be overwritten. If something breaks, then you won’t know where or when things went wrong and how far back to go when you need to roll back the code while you troubleshoot.

Another problem is that restarting the container to pull a new Docker image doesn’t always work. “A Deployment’s rollout is triggered if and only if the Deployment’s Pod template (that is, .spec.template) is changed, for example if the labels or container images of the template are updated. Other updates, such as scaling the Deployment, do not trigger a rollout.”

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80

You have to modify the .spec.template to trigger a deployment.

The correct way to update your pods to pull new Docker images is to version your code (or increment it for fixes/patches) and then modify the deployment spec to reflect a meaningful tag (not latest; see Anti-pattern 5 for further discussion), such as v1.4.0 for a new release or v1.4.1 for a patch. Kubernetes will then trigger an upgrade with zero downtime:

  1. Kubernetes starts a new pod with the new image.
  2. It waits for health checks to pass.
  3. It deletes the old pod.
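
In practice, the tag bump can be a one-line change to image: in the Deployment YAML applied through your pipeline, or a direct patch of the pod template; a rough example using kubectl against the Deployment shown above:

$ kubectl set image deployment/nginx-deployment nginx=nginx:1.16.1

# Watch the rolling update and, if something looks wrong, roll back:
$ kubectl rollout status deployment/nginx-deployment
$ kubectl rollout undo deployment/nginx-deployment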

7. Mixing both production and non-production workloads in the same cluster

Kubernetes supports a namespace feature, which enables users to manage different environments (virtual clusters) within the same physical cluster. Namespaces can be seen as a cost-effective way of managing different environments on a single physical cluster. For example, you could run staging and production environments in the same cluster and save resources and money. However, there’s a big gap between running Kubernetes in development and running Kubernetes in production.

There are a lot of factors to consider when you mix your production and non-production workloads on the same cluster. For one, you would have to consider resource limits to make sure the performance of your production environment isn’t compromised (a common practice one might see is setting no quota on the production namespace and a quota on any non-production namespace(s)).

You would also need to consider isolation. Developers require a lot more access and permissions than in production, which you would want locked down as much as possible. While namespaces are hidden from each other, they are not fully isolated by default. That means your apps in a dev namespace could call apps in test, staging, or production (or vice versa), which is not considered good practice. Of course, you could use NetworkPolicies to set rules to isolate the namespaces.

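As an illustration, a default-deny style policy like the sketch below only admits traffic from pods in the same namespace; the namespace name is a hypothetical placeholder, and the policy only takes effect if your network plugin enforces NetworkPolicies.

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: deny-from-other-namespaces
  namespace: production
spec:
  podSelector: {}          # applies to every pod in the namespace
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector: {}      # only pods in this same namespace may connect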

However, thoroughly testing resource limits, performance, security, and reliability is time-consuming, so running production workloads in the same cluster as non-production workloads is not advised. Rather than mixing production and non-production workloads in the same cluster, use separate clusters for development/test/production — you’ll have better isolation and security that way. You should also automate as much as you can for CI/CD and promotion to reduce the chance for human error. Your production environment needs to be as solid as possible.

8. Not using blue/green or canaries for mission-critical deployments

Many modern applications have frequent deployments, ranging from several changes within a month to multiple deployments in a single day. This is certainly achievable with microservice architecture since the different components can be developed, managed, and released on different cycles as long as they work together to perform seamlessly. And of course, keeping applications up 24/7 is obviously important when rolling out updates.

The default rolling update of Kubernetes is not always enough. A common strategy to perform updates is to use the default Kubernetes rolling update feature:

.spec.strategy.type==RollingUpdate

where you can set the maxUnavailable (percentage or number of pods unavailable) and maxSurge fields (optional) to control the rolling update process. When implemented properly, rolling updates allow a gradual update with zero downtime as the pods are incrementally updated. Here’s an example of how one team updated their applications with zero downtime with rolling updates.

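For reference, those knobs live on the Deployment spec; a minimal sketch (the numbers are illustrative):

spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%   # at most one of the four pods may be unavailable during the update
      maxSurge: 1           # at most one extra pod may be created above the desired count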

However, once you’ve updated your deployment to the next version, it’s not always easy to go back. You should have a plan in place to roll it back in case it breaks in production. When your pod is updated to the next version, the deployment creates a new ReplicaSet. Kubernetes will store the previous ReplicaSets (ten by default, but you can change that with spec.revisionHistoryLimit). The ReplicaSets are saved under names such as app-6ff34b8374 with random suffixes, and you won’t find a reference to the ReplicaSets in the deployment app YAML. You can find it with:

ReplicaSet.metadata.annotations

and inspect the revision with:

kubectl get replicaset app-6ff88c4474 -o yaml

to find the revision number. This gets complicated because the rollout history doesn’t keep a change log unless you leave a note in the YAML resource, which you can do with the --record flag:

$ kubectl rollout history deployment/app
REVISION  CHANGE-CAUSE
1         kubectl create --filename=deployment.yaml --record=true
2         kubectl apply --filename=deployment.yaml --record=true

When you have dozens, hundreds, or even thousands of deployments all going through updates simultaneously, it’s difficult to keep track of them all at once. And if your stored revisions all contain the same regression, then your production environment is not going to be in good shape! You can read more in detail about using rolling updates in this article.

Some other problems are:

  • Not all applications are capable of concurrently running multiple versions.
  • Your cluster could run out of resources in the middle of the update, which could break the whole process.

These are all very frustrating and stressful issues to run into when in a production environment.

Alternative ways to more reliably update deployments include:

Blue/green (red/black) deployment: With blue/green, a full set of both the old and new instances exists simultaneously. Blue is the live version, and the new version is deployed to the green replica. When the green environment has passed its tests and verifications, a load balancer simply flips the traffic to green, which becomes the blue environment, and the old version becomes the green one. Since we have two full versions being maintained, performing a rollback is simple: all you need to do is switch the load balancer back.

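One common way to approximate blue/green on Kubernetes is to run the two versions as separate Deployments and flip a Service’s selector between them; a hedged sketch, with made-up labels:

apiVersion: v1
kind: Service
metadata:
  name: storefront
spec:
  selector:
    app: storefront
    track: blue          # change this to "green" to shift all traffic to the new version
  ports:
  - port: 80
    targetPort: 8080

Each Deployment would carry its own track label (blue or green), so rolling back is just another selector change.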

The load balancer flips between blue and green to set the active version. From Continuous Deployment Strategies with Kubernetes.

Additional advantages include:

  • Since we never deploy directly to production, it’s pretty low-stress when we change green to blue.
  • Traffic redirection occurs immediately, so there’s no downtime.
  • There can be extensive testing done to reflect actual production prior to the switch. (As stated before, a development environment is very different from production.)

Kubernetes does not include blue/green deployments as one of its native toolings. You can read more about how to implement blue/green into your CI/CD automation in this tutorial.

Canary releases: Canary releases allow us to test for potential problems and meet key metrics before impacting the entire production system/user base. We “test in production” by deploying directly to the production environment, but only to a small subset of users. You can choose routing to be percentage-based or driven by region/user location, the type of client, and billing properties. Even when deploying to a small subset, it’s important to carefully monitor application performance and measure errors; these metrics define a quality threshold. If the application behaves as expected, we start transferring more of the new version instances to support more traffic.

The load balancer gradually releases the new version into production. From Continuous Deployment Strategies with Kubernetes.

Other advantages include:

  • Observability
  • Ability to test on production traffic (getting a true production-like experience in development is hard)
  • Ability to release a version to a small subset of users and get real feedback before a larger release
  • Fail fast. Since we deploy straight into production, we can fail fast (i.e., revert immediately) if it breaks, and it affects only a subset rather than the whole community.

9. Not having metrics in place to understand if a deployment was successful or not

Your health checks need application support.

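That support usually starts with the application exposing health endpoints for Kubernetes probes to call; a sketch of what the probes might look like (the paths and port are placeholders the application would have to implement):

    spec:
      containers:
      - name: storefront
        image: storefront:v1.4.0
        readinessProbe:               # gate traffic until the app reports it is ready
          httpGet:
            path: /healthz/ready
            port: 8080
          initialDelaySeconds: 5
          periodSeconds: 10
        livenessProbe:                # restart the container if it stops responding
          httpGet:
            path: /healthz/live
            port: 8080
          initialDelaySeconds: 15
          periodSeconds: 20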

You can leverage Kubernetes to accomplish many tasks in container orchestration:

  • Controlling resource consumption by an application or team (namespace, CPU/mem, limits), stopping an app from consuming too many resources
  • Load balancing across different app instances, moving application instances from one host to another if there is a resource shortage or if the host dies
  • Self-healing: restarting containers if they crash
  • Automatically leveraging additional resources if a new host is added to the cluster
  • And more

So sometimes it’s easy to forget about metrics and monitoring. However, a successful deployment is not the end of your ops work. It’s better to be proactive and prepare for unexpected surprises. There are still a lot more layers to monitor, and the dynamic nature of K8s makes it tough to troubleshoot. For example, if you’re not closely watching your available resources, the automatic rescheduling of pods could cause capacity issues, and your app might crash or never deploy. This would be especially unfortunate in production, as you wouldn’t know unless someone filed a bug report or you happened to check on it. Eep!

Monitoring presents its own set of challenges: There are a lot of layers to watch, and there’s a need to “maintain a reasonably low maintenance burden on the engineers.” When an application running on Kubernetes hits a snag, there are many logs, data, and components to investigate, especially when there are multiple microservices involved with the issue versus in traditional monolithic architecture, where everything is output to a few logs.

Insights into your application’s behavior, like how it performs, help you continuously improve. You also need a pretty holistic view of the containers, pods, services, and the cluster as a whole. If you can identify how an application is using its resources, then you can use Kubernetes to better detect and remove bottlenecks. To get a full view of the application, you would need to use an application performance monitoring solution like Prometheus, Grafana, New Relic, or Cisco AppDynamics, among many others.

Whether or not you decide to use a monitoring solution, these are the key metrics that the Kubernetes documentation recommends you track closely:

  • Running pods and their deployments
  • Resource metrics: CPU, memory usage, disk I/O
  • Container-native metrics
  • Application metrics

10. Cloud vendor lock-in: Locking yourself into an IaaS provider’s Kubernetes or serverless computing services

There are multiple types of lock-ins (Martin Fowler wrote a great article, if you want to read more), but vendor lock-in negates the primary value of deploying to the cloud: container flexibility. It’s true that choosing the right cloud provider is not an easy decision. Each provider has its own interfaces, open APIs, and proprietary specifications and standards. Additionally, one provider might suit your needs better than the others only for your business needs to unexpectedly change.

Fortunately, containers are platform-agnostic and portable, and all the major providers have a Kubernetes foundation, which is cloud-agnostic. You don’t have to re-architect or rewrite your application code when you need to move workloads between clouds, so you shouldn’t need to lock yourself into a cloud provider because you can’t “lift and shift.”

Here is a list of things you should consider to ensure you can be flexible to prevent or minimize vendor lock-in.

First, housekeeping: Read the fine print

Negotiate entry and exit strategies. Many vendors make it easy to start — and get you hooked. This might include incentives like free trials or credits, but these costs could rapidly increase as you scale up.

Check for things like auto-renewal and early termination fees, whether the provider will help with deconversion when migrating to another vendor, and the SLAs associated with exit.

Architect/design your applications such that they can run on any cloud

If you’re already developing for the cloud and using cloud-native principles, then most likely your application code should be easy to lift and shift. It’s the things surrounding the code that potentially lock you into a cloud vendor. For example, you could:

  • Check that the services and features (like databases, APIs, etc.) used by your application are portable.

  • Check if your deployment and provisioning scripts are cloud-specific. Some clouds have their own native or recommended automation tools that may not translate easily to other providers. There are many tools that can be used to assist with cloud infrastructure automation and are compatible with many of the major cloud providers, like Puppet, Ansible, and Chef, to name a few. This blog has a handy chart that compares characteristics of common tools.

  • Check if your DevOps environment, which typically includes Git and CI/CD, can run in any cloud. For example, many clouds have their own specific CI/CD tools, like IBM Cloud Continuous Delivery, Azure CI/CD, or AWS Pipelines, that might require extra work to port over to another cloud vendor. Instead, you could use something like Codefresh, a full CI/CD solution that has great support for Docker and Kubernetes and integrates with many other popular tools. There are also myriad other solutions, some CI or CD or both, like GitLab, Bamboo, Jenkins, Travis, etc.

  • Check if your testing process will need to be changed between providers.

You could also choose to follow a multicloud strategy

With a multicloud strategy, you can pick and choose services from different cloud providers that best suit the type of application(s) you are hoping to deliver. When you plan a multicloud deployment, you should keep interoperability under careful consideration.

Summary

Kubernetes is really popular, but it’s hard to get started with, and there are a lot of practices in traditional development that don’t translate to cloud-native development.

In this article, we’ve looked at:

  1. Putting the configuration file inside/alongside the Docker image: Externalize your configuration data. You can use ConfigMaps and Secrets or something similar.

  2. Not using Helm or other kinds of templating: Use Helm or Kustomize to streamline your container orchestration and reduce human error.

  3. Deploying things in a specific order: Applications shouldn’t crash because a dependency isn’t ready. Utilize Kubernetes’s self-healing mechanism and implement retries and circuit breakers.

  4. Deploying pods without set memory and/or CPU limits: You should consider setting memory and CPU limits to reduce the risk of resource contention, especially when sharing the cluster with others.

  5. Pulling the latest tag in containers in production: Never use latest. Always use something meaningful, like v1.4.0 according to the Semantic Versioning Specification, and employ immutable Docker images.

  6. Deploying new updates/fixes by killing pods so they pull the new Docker images during the restart process: Version your code so you can better manage your releases.

  7. Mixing both production and non-production workloads in the same cluster: Run your production and non-production workloads in separate clusters if you can. This reduces risk to your production environment from resource contention and accidental environment cross-over.

  8. Not using blue/green or canaries for mission-critical deployments (the default rolling update of Kubernetes is not always enough): You should consider blue/green deployments or canary releases for less stress in production and more meaningful production results.

  9. Not having metrics in place to understand if a deployment was successful or not (your health checks need application support): You should make sure to monitor your deployments to avoid any surprises. You could use a tool like Prometheus, Grafana, New Relic, or Cisco AppDynamics to help you gain better insights on your deployments.

  10. Cloud vendor lock-in: Locking yourself into an IaaS provider’s Kubernetes or serverless computing services: Your business needs could change at any time. You shouldn’t unintentionally lock yourself into a cloud provider, since you can easily lift and shift cloud-native applications.

Thanks for reading!

Originally published at https://medium.com/better-programming/10-antipatterns-for-kubernetes-deployments-e97ce1199f2d
