k8s: Rolling Update and Health Check
Rolling Update
Below we deploy an application with three replicas using the initial image httpd:2.2, then update it to httpd:2.4.
Write the configuration file for httpd:2.2:
[root@master service]# cat httpd.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    run: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      run: httpd
  template:
    metadata:
      labels:
        run: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2.2
        ports:
        - containerPort: 80
Deploy and inspect:
[root@master service]# kubectl apply -f httpd.yml
deployment.apps/httpd created
[root@master service]# kubectl get deployments.apps -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd 3/3 3 3 19s httpd httpd:2.2 run=httpd
[root@master service]# kubectl get replicasets.apps -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
httpd-cd94c9b6d 3 3 2 69s httpd httpd:2.2 pod-template-hash=cd94c9b6d,run=httpd
Replace httpd:2.2 with httpd:2.4 in the configuration file and apply it again:
[root@master service]# kubectl apply -f httpd.yml
deployment.apps/httpd configured
[root@master service]# kubectl get deployments.apps -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd 3/3 2 3 2m34s httpd httpd:2.4 run=httpd
[root@master service]# kubectl get replicasets.apps -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
httpd-74d5cc74db 2 2 1 34s httpd httpd:2.4 pod-template-hash=74d5cc74db,run=httpd
httpd-cd94c9b6d 2 2 2 2m42s httpd httpd:2.2 pod-template-hash=cd94c9b6d,run=httpd
[root@master service]# kubectl get replicasets.apps -o wide
NAME DESIRED CURRENT READY AGE CONTAINERS IMAGES SELECTOR
httpd-74d5cc74db 3 3 3 52s httpd httpd:2.4 pod-template-hash=74d5cc74db,run=httpd
httpd-cd94c9b6d 0 0 0 3m httpd httpd:2.2 pod-template-hash=cd94c9b6d,run=httpd
The httpd image has been updated to httpd:2.4.
The detailed process can be viewed with kubectl describe deployments.apps httpd:
[root@master service]# kubectl describe deployments.apps httpd
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 3m21s deployment-controller Scaled up replica set httpd-cd94c9b6d to 3
Normal ScalingReplicaSet 73s deployment-controller Scaled up replica set httpd-74d5cc74db to 1
Normal ScalingReplicaSet 55s deployment-controller Scaled down replica set httpd-cd94c9b6d to 2
Normal ScalingReplicaSet 55s deployment-controller Scaled up replica set httpd-74d5cc74db to 2
Normal ScalingReplicaSet 37s deployment-controller Scaled down replica set httpd-cd94c9b6d to 1
Normal ScalingReplicaSet 37s deployment-controller Scaled up replica set httpd-74d5cc74db to 3
Normal ScalingReplicaSet 34s deployment-controller Scaled down replica set httpd-cd94c9b6d to 0
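To follow an update as it happens, instead of repeatedly polling the ReplicaSets, kubectl also provides the rollout status subcommand (standard kubectl; the exact messages vary slightly by version, so the output below is indicative):
[root@master service]# kubectl rollout status deployment httpd
Waiting for deployment "httpd" rollout to finish: 2 out of 3 new replicas have been updated...
deployment "httpd" successfully rolled out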
Rollback
Every time kubectl apply updates an application, Kubernetes records the current configuration and saves it as a revision, making it possible to roll back to a particular revision later.
By default Kubernetes keeps only the most recent few revisions; the number retained can be increased via the revisionHistoryLimit field in the Deployment spec.
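For example, a minimal sketch keeping the 10 most recent revisions (the value 10 is arbitrary, chosen for illustration):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
spec:
  revisionHistoryLimit: 10   # number of old ReplicaSets retained for rollback
  replicas: 3
  selector:
    matchLabels:
      run: httpd
  template:
    metadata:
      labels:
        run: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2.4.37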
Let's now try the rollback feature. Create three configuration files, http-v1.yml, http-v2.yml, and http-v3.yml, corresponding to the httpd images 2.4.37, 2.4.38, and 2.4.39 respectively.
[root@master rollback]# cat http-v1.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    run: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      run: httpd
  template:
    metadata:
      labels:
        run: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2.4.37
        ports:
        - containerPort: 80
[root@master rollback]# cat http-v2.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    run: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      run: httpd
  template:
    metadata:
      labels:
        run: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2.4.38
        ports:
        - containerPort: 80
[root@master rollback]# cat http-v3.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: httpd
  labels:
    run: httpd
spec:
  replicas: 3
  selector:
    matchLabels:
      run: httpd
  template:
    metadata:
      labels:
        run: httpd
    spec:
      containers:
      - name: httpd
        image: httpd:2.4.39
        ports:
        - containerPort: 80
Deploy and update the application:
[root@master rollback]# kubectl apply -f http-v1.yml --record
deployment.apps/httpd created
[root@master rollback]# kubectl get deployments.apps httpd -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd 3/3 3 3 91s httpd httpd:2.4.37 run=httpd
[root@master rollback]# kubectl apply -f http-v2.yml --record
deployment.apps/httpd configured
[root@master rollback]# kubectl get deployments.apps httpd -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd 3/3 3 3 2m13s httpd httpd:2.4.38 run=httpd
[root@master rollback]# kubectl apply -f http-v3.yml --record
deployment.apps/httpd configured
[root@master rollback]# kubectl get deployments.apps httpd -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd 3/3 3 3 4m8s httpd httpd:2.4.39 run=httpd
The --record flag records the command being run in the revision history (note that --record has since been deprecated in newer kubectl releases).
View the revision history:
[root@master rollback]# kubectl rollout history deployment httpd
deployment.apps/httpd
REVISION CHANGE-CAUSE
1 kubectl apply --filename=http-v1.yml --record=true
2 kubectl apply --filename=http-v2.yml --record=true
3 kubectl apply --filename=http-v3.yml --record=true
CHANGE-CAUSE is the result of --record. To roll back to a particular revision, for example revision 1:
[root@master rollback]# kubectl rollout undo deployment httpd --to-revision=1
deployment.apps/httpd rolled back
[root@master rollback]# kubectl get deployments.apps httpd -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
httpd 3/3 3 3 5m42s httpd httpd:2.4.37 run=httpd
At this point the revision history changes as well:
[root@master rollback]# kubectl rollout history deployment httpd
deployment.apps/httpd
REVISION CHANGE-CAUSE
2 kubectl apply --filename=http-v2.yml --record=true
3 kubectl apply --filename=http-v3.yml --record=true
4 kubectl apply --filename=http-v1.yml --record=true
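Note that revision 1 was not deleted: rolling back replayed its template as a new revision 4. And if you simply want to return to the immediately previous revision, --to-revision can be omitted:
[root@master rollback]# kubectl rollout undo deployment httpd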
Health Check
Strong self-healing is a key feature of container orchestration engines such as Kubernetes. The default form of self-healing is to automatically restart containers that fail. Beyond that, users can configure finer-grained health checks with the Liveness and Readiness probe mechanisms to achieve:
1. Zero-downtime deployments
2. Avoiding the rollout of broken images
3. Safer rolling updates
The Default Health Check
Every container runs a process at startup, specified by the Dockerfile's CMD or ENTRYPOINT. If that process exits with a non-zero return code, the container is considered failed, and Kubernetes restarts it according to its restartPolicy.
Let's simulate a container failure:
[root@master exporter]# cat check.yml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: healthcheck1
  name: healthcheck1
spec:
  restartPolicy: OnFailure
  containers:
  - name: healthcheck1
    image: busybox
    args:
    - /bin/sh
    - -c
    - sleep 10; exit 1
The Pod's restartPolicy is set to OnFailure; the default is Always.
sleep 10; exit 1 simulates the container failing 10 seconds after startup.
Check the Pod's status a few minutes later:
[root@master exporter]# kubectl get pod
NAME READY STATUS RESTARTS AGE
healthcheck1 1/1 Running 2 84s
The container has already been restarted twice (kubelet restarts failed containers with an exponentially increasing back-off delay, which is why the restart count grows slowly).
In the example above, the container process returned a non-zero value, so Kubernetes judged the container failed and restarted it. But in many failure scenarios the process does not exit at all: for example, a web server returning 500 Internal Server Error because the system is overloaded or a resource is deadlocked. The httpd process does not exit abnormally, yet restarting the container may be the most direct and effective remedy. Liveness probes let us handle this class of scenarios.
Liveness Probes
A liveness probe lets the user define a custom condition for judging whether a container is healthy; if the probe fails, Kubernetes restarts the container.
Write the configuration file:
[root@master exporter]# cat liveness.yml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness
spec:
  restartPolicy: OnFailure
  containers:
  - name: liveness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10
      periodSeconds: 5
The startup command first creates /tmp/healthy, deletes it 30 seconds later, and then sleeps for another 600 seconds.
The liveness probe runs cat /tmp/healthy to check for that file: probing begins 10 seconds after the container starts (initialDelaySeconds: 10) and repeats every 5 seconds (periodSeconds: 5). After 3 consecutive probe failures, the container is killed and restarted.
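A probe accepts a few more tuning fields besides initialDelaySeconds and periodSeconds. A sketch of the probe stanza with the remaining standard parameters spelled out at their default values:
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10   # wait 10s after container start before the first probe
      periodSeconds: 5          # probe every 5s
      timeoutSeconds: 1         # each probe attempt times out after 1s (default)
      successThreshold: 1       # one success marks the probe healthy (default)
      failureThreshold: 3       # three consecutive failures trigger a restart (default)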
Check the Pod:
[root@master exporter]# kubectl get pod
NAME READY STATUS RESTARTS AGE
liveness 1/1 Running 4 6m53s
The container has already been restarted four times.
[root@master exporter]# kubectl describe pod liveness
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 9m47s default-scheduler Successfully assigned default/liveness to node2
Normal Pulled 6m43s (x3 over 9m34s) kubelet, node2 Successfully pulled image "busybox"
Normal Created 6m43s (x3 over 9m34s) kubelet, node2 Created container liveness
Normal Started 6m43s (x3 over 9m34s) kubelet, node2 Started container liveness
Normal Killing 6m3s (x3 over 8m53s) kubelet, node2 Container liveness failed liveness probe, will be restarted
Normal Pulling 5m33s (x4 over 9m46s) kubelet, node2 Pulling image "busybox"
Warning Unhealthy 4m43s (x10 over 9m3s) kubelet, node2 Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Readiness Probes
Besides liveness probes, the Kubernetes Health Check mechanism also includes readiness probes.
A liveness probe tells Kubernetes when to restart a container to achieve self-healing; a readiness probe tells Kubernetes when a container can be added to a Service's load balancing and serve external traffic.
[root@master exporter]# cat readiness.yml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: readiness
  name: readiness
spec:
  restartPolicy: OnFailure
  containers:
  - name: readiness
    image: busybox
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 10
      periodSeconds: 5
This configuration file simply replaces livenessProbe in the previous example with readinessProbe.
Check the Pod:
[root@master exporter]# kubectl get pod
NAME        READY   STATUS    RESTARTS   AGE
readiness   1/1     Running   0          27s
When the Pod has just been created, READY is unavailable.
15 seconds later, the first readiness probe runs and succeeds, and READY is set to available.
After 30 seconds /tmp/healthy is deleted; after 3 consecutive readiness probe failures, READY is set back to unavailable.
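These transitions can be observed live with a watch (standard kubectl; press Ctrl-C to stop):
[root@master exporter]# kubectl get pod readiness --watch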
Check the event log:
[root@master exporter]# kubectl describe pod readiness
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 8m6s default-scheduler Successfully assigned default/readiness to node2
Normal Pulling 8m5s kubelet, node2 Pulling image "busybox"
Normal Pulled 7m50s kubelet, node2 Successfully pulled image "busybox"
Normal Created 7m50s kubelet, node2 Created container readiness
Normal Started 7m49s kubelet, node2 Started container readiness
Warning Unhealthy 2m35s (x10 over 7m15s) kubelet, node2 Readiness probe failed: cat: can't open '/tmp/healthy': No such file or directory
Differences Between Liveness and Readiness:
Liveness and readiness probes are two Health Check mechanisms. If neither is configured explicitly, Kubernetes applies the same default behavior to both: it judges success by whether the container's startup process returns zero.
The two probes are configured in exactly the same way and support the same parameters. They differ in what happens on failure: a failed liveness probe restarts the container, while a failed readiness probe marks the container unavailable so that it stops receiving requests forwarded by the Service.
Liveness and readiness probes execute independently of each other, with no mutual dependency, so they can be used separately or together: a liveness probe to decide when a container needs restarting for self-healing, and a readiness probe to decide when a container is ready to serve traffic.
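A sketch of a Pod carrying both probes at once, using httpGet (the Pod name httpd-probes is an illustrative assumption; the stock httpd image serves / with a 200, so both probes pass):
apiVersion: v1
kind: Pod
metadata:
  name: httpd-probes
spec:
  containers:
  - name: httpd
    image: httpd
    ports:
    - containerPort: 80
    livenessProbe:             # failure here restarts the container
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5
    readinessProbe:            # failure here removes the Pod from Service load balancing
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 10
      periodSeconds: 5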
Using Health Check in Scale Up
For a multi-replica application, a scale-up adds the new replicas to the Service's load balancing as backends, where they handle client requests alongside the existing replicas. Application startup usually involves a preparation phase, such as loading cached data or connecting to a database, so there is a gap between container startup and the moment it can genuinely serve traffic. We can use a readiness probe to determine whether a container is ready and avoid forwarding requests to backends that are not:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: web
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
      - image: httpd
        name: httpd
        ports:
        - containerPort: 80
        readinessProbe:
          httpGet:
            scheme: HTTP
            path: /health
            port: 80
          initialDelaySeconds: 10
          periodSeconds: 5
---
apiVersion: v1
kind: Service
metadata:
  name: httpd-svc
spec:
  type: NodePort
  selector:
    app: web
  ports:
  - protocol: TCP
    nodePort: 30000
    port: 8080
    targetPort: 80
Multiple resource definitions in a single file are separated by ---.
In the readinessProbe section we use httpGet, a probing method different from exec. Kubernetes judges this probe successful when the HTTP response code is at least 200 and less than 400.
scheme specifies the protocol, HTTP or HTTPS
path specifies the access path
port specifies the port
Probing starts 10 seconds after the container starts. If http://[container_ip]:80/health does not return a code in the 200-399 range, the container is not ready and will not receive requests forwarded by Service httpd-svc.
The probe then repeats every 5 seconds. Once the return code falls in the 200-399 range, the container is ready and is added to httpd-svc's load balancing, where it begins handling client requests.
Probing continues at the 5-second interval; if it fails 3 consecutive times, the container is removed from the load balancing again until a later probe succeeds and it rejoins.
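Whether a Pod is currently behind the Service can be verified through the Service's endpoints; Pods failing the readiness probe drop out of this list:
[root@master service]# kubectl get endpoints httpd-svc
Note that the stock httpd image does not serve /health out of the box, so for this probe to succeed the image would need that path added, for example a static health file in the document root.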
Using Health Check in Rolling Update
Another important application of Health Check is Rolling Update. Consider the following situation:
a multi-replica application is running normally, and we update it (for example, to a newer image). Kubernetes starts new replicas, and then the following happens:
Normally a new replica needs 10 seconds to finish its preparation work and cannot respond to business requests before then.
But due to a human configuration error, the new replicas can never finish preparing (for example, they cannot connect to the backend database).
What happens if no Health Check is configured?
Because the new replicas do not exit abnormally themselves, the default health check considers them ready, so Kubernetes gradually replaces the existing replicas with new ones. The result: once all old replicas have been replaced, the entire application can no longer process requests or serve users. On an important production system the consequences would be severe.
If Health Check is configured correctly, a new replica is added to the Service only after it passes the readiness probe; if it never passes, the existing replicas are not all replaced and the business continues normally.
Let's walk through an example of Health Check in a Rolling Update.
Use the following configuration file, app.v1.yml, to simulate an application with 10 replicas:
[root@master exporter]# cat app.v1.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: busybox
        name: app
        args:
        - /bin/sh
        - -c
        - sleep 10; touch /tmp/healthy; sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
After 10 seconds the replicas can pass the readiness probe:
[root@master exporter]# kubectl get pod
NAME READY STATUS RESTARTS AGE
app-65c75dcb4b-522md 1/1 Running 0 2m5s
app-65c75dcb4b-cjb8c 1/1 Running 0 2m5s
app-65c75dcb4b-fjvpj 1/1 Running 0 2m5s
app-65c75dcb4b-g5mvn 1/1 Running 0 2m5s
app-65c75dcb4b-jbrrq 1/1 Running 0 2m5s
app-65c75dcb4b-qkng9 1/1 Running 0 2m5s
app-65c75dcb4b-r5rt7 1/1 Running 0 2m5s
app-65c75dcb4b-r6rsk 1/1 Running 0 2m5s
app-65c75dcb4b-vvc2k 1/1 Running 0 2m5s
app-65c75dcb4b-zn9f9 1/1 Running 0 2m5s
[root@master exporter]# kubectl get deployments.apps app
NAME READY UP-TO-DATE AVAILABLE AGE
app 10/10 10 10 2m18s
Next, perform a rolling update using configuration file app.v2.yml:
[root@master exporter]# cat app.v2.yml
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: app
  name: app
spec:
  replicas: 10
  selector:
    matchLabels:
      app: app
  template:
    metadata:
      labels:
        app: app
    spec:
      containers:
      - image: busybox
        name: app
        args:
        - /bin/sh
        - -c
        - sleep 30000
        readinessProbe:
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10
          periodSeconds: 5
Because the new replicas never create /tmp/healthy, they cannot pass the readiness probe:
[root@master exporter]# kubectl get deployments.apps
NAME READY UP-TO-DATE AVAILABLE AGE
app 8/10 5 8 6m16s
[root@master exporter]# kubectl get pod
NAME READY STATUS RESTARTS AGE
app-5f554b4fb9-9vt5d 0/1 Running 0 3m26s
app-5f554b4fb9-fv87x 0/1 Running 0 3m26s
app-5f554b4fb9-rrvnq 0/1 Running 0 3m26s
app-5f554b4fb9-szw8x 0/1 Running 0 3m26s
app-5f554b4fb9-xxhfh 0/1 Running 0 3m26s
app-65c75dcb4b-522md 1/1 Running 0 7m42s
app-65c75dcb4b-cjb8c 1/1 Running 0 7m42s
app-65c75dcb4b-fjvpj 1/1 Running 0 7m42s
app-65c75dcb4b-g5mvn 1/1 Running 0 7m42s
app-65c75dcb4b-r5rt7 1/1 Running 0 7m42s
app-65c75dcb4b-r6rsk 1/1 Running 0 7m42s
app-65c75dcb4b-vvc2k 1/1 Running 0 7m42s
app-65c75dcb4b-zn9f9 1/1 Running 0 7m42s
Judging by the Pods' AGE, the first 5 Pods are the new replicas, currently NOT READY.
The old replicas have been reduced from the initial 10 to 8.
Now look again at the kubectl get deployment app output:
READY 8/10 means 10 replicas are desired and 8 of them are currently ready: the 8 old replicas.
UP-TO-DATE 5 means 5 replicas have been updated to the new template: the 5 new replicas.
AVAILABLE 8 means 8 replicas are currently available: the 8 old replicas.
In total there are 13 Pods: 8 old replicas plus 5 new ones. In our setup the new replicas can never pass the readiness probe, so this state will persist indefinitely.
Above we simulated a failed rolling update. Fortunately, Health Check shielded us from the defective replicas while keeping most of the old ones, so the business was unaffected by the failed update.
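At this point the failed update can simply be abandoned by rolling the Deployment back to the previous revision, exactly as in the Rollback section above:
[root@master exporter]# kubectl rollout undo deployment app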
One question remains: why were 5 new replicas created while only 2 old replicas were destroyed?
The answer: a rolling update controls replica replacement through the parameters maxSurge and maxUnavailable.
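Both parameters default to 25% of the desired replica count. maxSurge (rounded up) caps how many Pods may exist above the desired count; maxUnavailable (rounded down) caps how many Pods may be unavailable. With 10 replicas that allows ceil(10 * 25%) = 3 extra Pods (13 total) and floor(10 * 25%) = 2 unavailable Pods (at least 8 available), which matches the 13 Pods and AVAILABLE 8 observed above. Both values can be set explicitly in the Deployment spec; a sketch with illustrative values:
spec:
  replicas: 10
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 35%        # up to ceil(10 * 35%) = 4 extra Pods during the update
      maxUnavailable: 35%  # up to floor(10 * 35%) = 3 Pods may be unavailable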