[Kubernetes] Kubernetes health checks in detail [liveness probe, readiness probe]

/*守护她的笑容 · Published 2021-09-06 18:15:30

Purpose of probes

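A Deployment exists to keep Pods healthy.
When a Pod dies, the Deployment creates a new one.
But if the Pod is still running and something has gone wrong inside it, the Deployment cannot detect that.
This is why probes are needed.
The user defines what condition counts as "something has gone wrong".
When a probe detects that condition, the container is considered unhealthy and the "restart cure" is applied to fix it.
We all know that Kubernetes maintains the state and number of Pods, so if all you want is for a container inside a Pod to be restarted after it fails, there is no need to add a health check; configuring the Pod's restart policy properly is enough. Health checks are a better fit when a container has to be actively killed and restarted based on the result of a check. In addition, some containers need to load data before they can start serving, and a readiness probe can check whether that work has finished.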

Environment setup

First, you need a cluster.

[root@master ~]# kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   43d   v1.21.0
node1    Ready    <none>   43d   v1.21.0
node2    Ready    <none>   43d   v1.21.0
[root@master ~]#

Next, create a directory to hold the test files used later, and create a namespace; all the tests below are run in this namespace.

[root@master ~]# mkdir probe
[root@master ~]# cd probe
[root@master probe]# kubectl create ns probe
namespace/probe created
[root@master probe]# kubens probe
Context "context" modified.
Active namespace is "probe".
[root@master probe]# kubectl get pods
No resources found in probe namespace.
[root@master probe]#

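What are Container Probes

Official documentation (the templates used below can also be found there):
liveness
In the Kubernetes architecture, every node runs a kubelet. Container probes, that is, container health checks, are executed periodically by the kubelet.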

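Health states

The kubelet performs a check by calling a Handler on a container in the Pod. There are three types of Handler.
- ExecAction: runs a specific command inside the container; an exit code of 0 means success
- TCPSocketAction: performs a TCP check against the container's IP address and a specific port; an open port means success
- HTTPGetAction: sends an HTTP request to the container's IP, port and path; a status code of at least 200 and below 400 means success
Each check can return one of three results.
- Success: the health check passed
- Failure: the health check did not pass
- Unknown: the check itself could not be carried out
When creating a Pod, you can use two kinds of probes, liveness and readiness, to examine the containers running in it.
- liveness checks whether the application in the container is alive. If the check fails, the container process is killed; whether the container is then restarted depends on the Pod's restart policy.
- readiness checks whether the application in the container can serve requests normally. If the probe fails, the Endpoint Controller removes the Pod's IP from the Service.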
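
The HTTPGetAction rule above can be sketched as a small shell function. This only illustrates the status-code range check and is not how the kubelet is implemented; the name http_probe_result is made up for this example.

```shell
#!/usr/bin/env bash
# Illustrative only: maps an HTTP status code to a probe result using the
# rule "200 <= code < 400 means success".
# http_probe_result is a hypothetical helper name, not a Kubernetes API.
http_probe_result() {
  local code="$1"
  if [ "$code" -ge 200 ] && [ "$code" -lt 400 ]; then
    echo "Success"
  else
    echo "Failure"
  fi
}

http_probe_result 200   # prints Success
http_probe_result 404   # prints Failure
```

A redirect such as 301 also counts as success under this rule, which is worth keeping in mind when choosing the probe path.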

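YAML configuration parameters

If any parameter in the manifests below is unclear, look it up here; I will not annotate every manifest.
initialDelaySeconds: how many seconds to wait after the container starts before the first probe is run.
periodSeconds: how often the probe is run. Default 10 seconds, minimum 1 second.
timeoutSeconds: probe timeout. Default 1 second, minimum 1 second.
successThreshold: after a failure, the minimum number of consecutive successes needed for the probe to count as successful again. Default 1, minimum 1; must be 1 for liveness.
failureThreshold: the number of times Kubernetes retries when a probe fails after the Pod has started. For a liveness probe, giving up means restarting the container; for a readiness probe, giving up means the Pod is marked as not ready. Default 3, minimum 1.
httpGet attributes
- host: host name or IP
- scheme: connection type, HTTP or HTTPS; default HTTP
- path: request path
- httpHeaders: custom request headers
- port: request port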
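
As a quick way to remember the minimums listed above, here is a minimal sketch of a checker for them. The function name check_probe_params and its argument order are assumptions made for this example; the API server performs its own validation, and this is not a substitute for it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: checks probe settings against the minimums listed above.
# Usage: check_probe_params <liveness|readiness> <period> <timeout> <success> <failure>
check_probe_params() {
  local kind="$1" period="$2" timeout="$3" success="$4" failure="$5"
  [ "$period" -ge 1 ]  || { echo "periodSeconds must be >= 1"; return 1; }
  [ "$timeout" -ge 1 ] || { echo "timeoutSeconds must be >= 1"; return 1; }
  [ "$success" -ge 1 ] || { echo "successThreshold must be >= 1"; return 1; }
  [ "$failure" -ge 1 ] || { echo "failureThreshold must be >= 1"; return 1; }
  # successThreshold may only be 1 for a liveness probe.
  if [ "$kind" = "liveness" ] && [ "$success" -ne 1 ]; then
    echo "successThreshold must be 1 for liveness"
    return 1
  fi
  echo "ok"
}

check_probe_params liveness 5 1 1 3            # accepted
check_probe_params liveness 5 1 2 3 || true    # rejected: successThreshold is 2
```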

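Test before enabling the probe

The manifest below comes from the official documentation linked above, with two lines added (immediate deletion and the image pull policy) and the image name changed.

# The worker node needs to have the busybox image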
[root@node1 ~]# docker images | grep busybox
busybox                                                           latest     69593048aa3a   3 months ago    1.24MB
[root@node1 ~]# 


# Manifest contents on the master
[root@master probe]# cat pod1.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-exec
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    #livenessProbe:
    #  exec:
    #    command:
    #    - cat
    #    - /tmp/healthy
    #  initialDelaySeconds: 5
    #  periodSeconds: 5
[root@master probe]#

Test flow:
- 1. Create /tmp/healthy
- 2. Sleep 30 seconds
- 3. Delete /tmp/healthy
- 4. Sleep 600 seconds
- 5. The container exits and the test ends

[root@master probe]# kubectl apply -f pod1.yaml
pod/liveness-exec created
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME            READY   STATUS              RESTARTS   AGE
liveness-exec   0/1     ContainerCreating   0          4s
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   0          8s
[root@master probe]# kubectl exec -it liveness-exec -- ls /tmp
healthy
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   0          27s
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-exec   1/1     Running   0          31s
[root@master probe]# kubectl exec -it liveness-exec -- ls /tmp
[root@master probe]#

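The point of the test above is that once the file is gone, it stays gone. With the probe defined below, the file is recreated automatically after it disappears (recreated here means the container is restarted and returns to its initial state).
kubectl describe pod pod_name shows the restart process.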

liveness probe

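A liveness probe fixes problems by restarting (the "restart cure"). A restart here means that the kubelet kills the container and starts a new one inside the same Pod; the Pod itself is not deleted and recreated. That is the difference from deleting and recreating the Pod by hand: the AGE does not start over but keeps accumulating, and only the RESTARTS count goes up.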

command

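For the first 30 seconds of the container's life there is a /tmp/healthy file, so during those 30 seconds the command cat /tmp/healthy returns a success code. After 30 seconds, cat /tmp/healthy returns a failure code. The probe runs every 5 seconds, so the first failure is seen at about 35 seconds and the third consecutive failure (failureThreshold defaults to 3) at about 45 seconds; adding the default 30-second termination grace period, the container is restarted at roughly 75 seconds.
So the flow is: the /tmp file exists for about 30 seconds, then it is gone, and at about 75 seconds the container restarts and the file exists again.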
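
The 75-second figure can be reproduced with a little shell arithmetic. This is a rough estimate, not an exact model of kubelet scheduling. It assumes that failureThreshold keeps its default of 3, that the first failing probe lands one period after the file is removed, and that the container does not exit on SIGTERM, so the full default grace period of 30 seconds elapses. The function name restart_eta is hypothetical.

```shell
#!/usr/bin/env bash
# Rough estimate (in seconds since container start) of when the container
# is replaced. restart_eta is a hypothetical helper for illustration.
# Usage: restart_eta <file_removed_at> <periodSeconds> <failureThreshold> <grace>
restart_eta() {
  local file_removed_at="$1" period="$2" failure_threshold="$3" grace="$4"
  # Assumption: the first failing probe happens one period after the removal.
  local first_failure=$(( file_removed_at + period ))
  # The remaining failures needed to reach the threshold, one per period.
  local last_failure=$(( first_failure + (failure_threshold - 1) * period ))
  # Assumption: the whole termination grace period is used before the kill.
  echo $(( last_failure + grace ))
}

restart_eta 30 5 3 30   # prints 75 (35 s first failure, 45 s third failure, +30 s grace)
```

With terminationGracePeriodSeconds: 0 the last term disappears, which is why that setting is used in the other manifests to shorten the wait.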

[root@master probe]# cat pod2.yaml
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: podliveness2
spec:
  #terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: busybox
    imagePullPolicy: IfNotPresent
    args:
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5 # no probing during the first 5 s after the container starts
      periodSeconds: 5  # probe every 5 s
[root@master probe]#

Start the test; pay attention to the comments below.

# Create the Pod and watch its age and the /tmp file; everything is normal
[root@master probe]# kubectl apply -f pod2.yaml 
pod/podliveness2 created
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          2s
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
healthy
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          17s
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
healthy
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          26s
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
healthy
[root@master probe]# 

# After 30 s the /tmp file disappears; at around 75 s the container restarts
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          33s
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          42s
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          54s
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          65s
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   0          73s
[root@master probe]#

# Above, the /tmp file is still missing. At 75 s the RESTARTS count becomes 1: the container has restarted, so the file should exist again
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   1          76s
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
healthy
[root@master probe]# 
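# After this the cycle repeats: the container restarts, the /tmp file exists for 30 s and is then deleted,
# and after about 75 s the container restarts and the file exists again. AGE keeps accumulating; only the restart count increases.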
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   7          10m
[root@master probe]# 
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
healthy
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   7          10m
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
healthy
[root@master probe]# kubectl exec -it podliveness2 -- ls /tmp
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   7          10m
[root@master probe]#
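
To follow the restart counter without reading the table by eye, the RESTARTS column from the transcripts above can be pulled out with awk. This is a minimal sketch: restarts_of is a hypothetical helper name, and it assumes the default kubectl get pods column layout shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the RESTARTS value for one pod, reading
# `kubectl get pods` table output on stdin (4th column of the default layout).
restarts_of() {
  awk -v pod="$1" 'NR > 1 && $1 == pod { print $4 }'
}

# Sample input copied from the transcript above; prints 7.
restarts_of podliveness2 <<'EOF'
NAME           READY   STATUS    RESTARTS   AGE
podliveness2   1/1     Running   7          10m
EOF
```

Against a live cluster the same function would be used as kubectl get pods | restarts_of podliveness2.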

httpGet

The manifest is as follows (using nginx as the example):

[root@master probe]# cat pod3.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: liveness-http
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: nginx
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 3
      httpGet:
        # The / below is not the filesystem root but the nginx document root, /usr/share/nginx/html
        path: /index.html
        port: 80
        scheme: HTTP
      initialDelaySeconds: 10
      periodSeconds: 10
      successThreshold: 1
      timeoutSeconds: 10
[root@master probe]#

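(The following explanation comes from the official site, with modifications.) In the manifest you can see that the Pod has a single container. The periodSeconds field tells the kubelet to perform a liveness probe every 10 seconds. The initialDelaySeconds field tells the kubelet to wait 10 seconds before the first probe. To perform a probe, the kubelet sends an HTTP GET request to the server running in the container and listening on port 80. If the server returns a success code for /index.html, the kubelet considers the container alive and healthy. If it returns a failure code, the kubelet kills the container and restarts it.
- Any code greater than or equal to 200 and less than 400 indicates success. Any other code indicates failure.
- In the official example you can see the server's source code: server.go
  For the first 10 seconds that the container is alive, the /healthz handler returns a status of 200. After that, the handler returns a status of 500.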

// Excerpt: started is assumed to hold the server's start time (time.Now() at startup).
http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
    duration := time.Now().Sub(started)
    if duration.Seconds() > 10 {
        w.WriteHeader(500)
        w.Write([]byte(fmt.Sprintf("error: %v", duration.Seconds())))
    } else {
        w.WriteHeader(200)
        w.Write([]byte("ok"))
    }
})

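In the official example, the kubelet starts performing health checks 3 seconds after the container starts, so the first few health checks succeed. After 10 seconds the health checks fail, and the kubelet kills and restarts the container.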
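
The interplay between the /healthz handler and failureThreshold can be sketched as a tiny simulation. It assumes a 3-second initial delay and a 3-second period for the official example, plus the default failureThreshold of 3, and it ignores timing jitter and the grace period. The function name first_restart_at is hypothetical.

```shell
#!/usr/bin/env bash
# Simulates probes against a handler that is healthy for <healthy_for> seconds
# and failing afterwards; prints the time of the probe that reaches the
# failure threshold. first_restart_at is a hypothetical name for illustration.
# Usage: first_restart_at <initialDelay> <period> <failureThreshold> <healthy_for>
# Note: period must be >= 1, otherwise the loop would never advance.
first_restart_at() {
  local t="$1" period="$2" threshold="$3" healthy_for="$4"
  local failures=0
  while :; do
    if [ "$t" -gt "$healthy_for" ]; then
      failures=$(( failures + 1 ))   # handler returns 500: probe fails
    else
      failures=0                     # handler returns 200: counter resets
    fi
    if [ "$failures" -ge "$threshold" ]; then
      echo "$t"
      return 0
    fi
    t=$(( t + period ))
  done
}

first_restart_at 3 3 3 10   # prints 18: probes at 3, 6, 9 pass; 12, 15, 18 fail
```

The point of the counter is that a single failed probe does not restart the container; only failureThreshold consecutive failures do.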
The test for pod3.yaml goes as follows.

# Create and verify; everything is normal for now
[root@master probe]# kubectl apply -f pod3.yaml 
pod/liveness-http created
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          6s
[root@master probe]# 
[root@master probe]# kubectl exec -it  liveness-http -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
[root@master probe]# 

# Now delete this file
[root@master probe]# kubectl exec -it  liveness-http -- rm /usr/share/nginx/html/index.html
[root@master probe]# 
[root@master probe]# kubectl exec -it  liveness-http -- ls /usr/share/nginx/html/index.html
ls: cannot access '/usr/share/nginx/html/index.html': No such file or directory
command terminated with exit code 2
[root@master probe]# 

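# Now wait for the probe. It runs every 10 s with failureThreshold 3, and the grace period is 0 s, so the container should be restarted roughly 20 to 30 s after the file is removed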
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          30s
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          34s
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          40s

# After the restart, the file is back
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   0          46s
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   1          51s
[root@master probe]# 
[root@master probe]# kubectl exec -it  liveness-http -- ls /usr/share/nginx/html/index.html
/usr/share/nginx/html/index.html
[root@master probe]# 
[root@master probe]#

# The probe keeps checking. The container runs normally until we delete the file again; each deletion causes one restart (because of the probe)
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   1          4m54s
[root@master probe]# kubectl exec -it  liveness-http -- rm /usr/share/nginx/html/index.html
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   1          5m8s
[root@master probe]# kubectl get pods
NAME            READY   STATUS    RESTARTS   AGE
liveness-http   1/1     Running   2          8m45s
[root@master probe]#

Check the Pod events to verify that the liveness probe failed and the container was restarted: kubectl describe pod pod_name

[root@master probe]# kubectl describe pod liveness-http | tail -n 20
Volumes:
  kube-api-access-8h26g:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  3m10s                  default-scheduler  Successfully assigned probe/liveness-http to node2
  Normal   Pulled     2m21s (x2 over 3m10s)  kubelet            Container image "nginx" already present on machine
  Normal   Created    2m21s (x2 over 3m10s)  kubelet            Created container liveness
  Warning  Unhealthy  2m21s (x3 over 2m41s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 404
  Normal   Killing    2m21s                  kubelet            Container liveness failed liveness probe, will be restarted
  Normal   Started    2m20s (x2 over 3m9s)   kubelet            Started container liveness
[root@master probe]#

tcpSocket

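The two methods above are the ones mainly used. The tcpSocket manifest is shown below for those who want to explore it.
This method uses a TCP connection to decide whether the container is alive; a sample Pod manifest follows.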

apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
    app: node
  name: liveness-tcp
spec:
  containers:
  - name: goproxy
    image: docker.io/googlecontainer/goproxy:0.1
    ports:
    - containerPort: 8080
    readinessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
    livenessProbe:
      tcpSocket:
        port: 8080
      initialDelaySeconds: 15
      periodSeconds: 20

readiness probe

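A readiness probe does not restart anything; requests from users are simply no longer forwarded to this Pod (and without a restart, the fault stays until something else fixes it).
readiness is configured in the same way as liveness; just change livenessProbe to readinessProbe.
We can use the kubectl explain command to view the full set of configuration attributes; the main ones are those listed in the parameter section above.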

command

The manifest is as follows (using nginx as the example):

[root@master probe]# cat pod4.yaml 
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: liveness
  name: pod4
spec:
  terminationGracePeriodSeconds: 0
  containers:
  - name: liveness
    image: nginx
    imagePullPolicy: IfNotPresent
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh","-c","touch /tmp/healthy"]
    readinessProbe:
      exec:
        command:
        - cat
        - /tmp/healthy
      initialDelaySeconds: 5
      periodSeconds: 5
[root@master probe]#

Create 3 Pods from the manifest above (note that they all carry the same label).

[root@master probe]# kubectl apply -f pod4.yaml 
pod/pod4 created
[root@master probe]# sed 's/pod4/pod5/' pod4.yaml | kubectl apply -f -
pod/pod5 created
[root@master probe]# sed 's/pod4/pod6/' pod4.yaml | kubectl apply -f -
pod/pod6 created
[root@master probe]# 
[root@master probe]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod4   1/1     Running   0          26s
pod5   1/1     Running   0          16s
pod6   1/1     Running   0          12s
[root@master probe]# kubectl get pods --show-labels 
NAME   READY   STATUS    RESTARTS   AGE   LABELS
pod4   1/1     Running   0          32s   test=liveness
pod5   1/1     Running   0          22s   test=liveness
pod6   1/1     Running   0          18s   test=liveness
[root@master probe]#

Write different content into each Pod so they can be told apart, then create a Service (svc).

[root@master probe]# kubectl exec -it pod4 -- sh -c "echo 111 > /usr/share/nginx/html/index.html"
[root@master probe]# kubectl exec -it pod5 -- sh -c "echo 222 > /usr/share/nginx/html/index.html"
[root@master probe]# kubectl exec -it pod6 -- sh -c "echo 333 > /usr/share/nginx/html/index.html"
[root@master probe]# 
[root@master probe]# kubectl expose --name=svc1 pod pod4 --port=80
service/svc1 exposed
[root@master probe]# 
[root@master probe]# kubectl get svc -o wide
NAME   TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)   AGE   SELECTOR
svc1   ClusterIP   10.96.190.13   <none>        80/TCP    9s    test=liveness
[root@master probe]#

Now access this address in a loop; the content of all 3 containers shows up at random (the IP is the svc IP above).

[root@master ~]# while true ; do curl -s 10.96.190.13; sleep 1; done
333
333
222
111
333
111
333
111
111
111
333
333
222
111
222
333
222
111
^C
[root@master ~]#
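
To see the distribution at a glance instead of scanning the raw output, the responses can be tallied. A minimal sketch; tally_responses is a hypothetical helper name, and the sample input is the 18 responses printed by the loop above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: counts how often each response body appears on stdin
# and prints one "body: count" line per distinct body.
tally_responses() {
  sort | uniq -c | awk '{ print $2 ": " $1 }'
}

# The 18 responses from the loop above, in order.
printf '%s\n' 333 333 222 111 333 111 333 111 111 111 \
              333 333 222 111 222 333 222 111 | tally_responses
# 111: 7
# 222: 4
# 333: 7
```

All three Pods receive traffic, which is what is expected while every readiness probe is passing.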

Now delete the /tmp/healthy file in pod6 (we delete this file because it is what we defined as the health indicator for the container).

[root@master probe]# kubectl exec -it pod6 -- rm /tmp/healthy
[root@master probe]# 

# After the deletion the container is still Running, but READY changes to 0/1
[root@master probe]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod4   1/1     Running   0          11m
pod5   1/1     Running   0          11m
pod6   1/1     Running   0          11m
[root@master probe]# kubectl get pods
NAME   READY   STATUS    RESTARTS   AGE
pod4   1/1     Running   0          12m
pod5   1/1     Running   0          12m
pod6   0/1     Running   0          12m
[root@master probe]# 

# The container can still be entered at this point
[root@master probe]# kubectl exec -it pod6 -- bash
root@pod6:/# sleep 1
root@pod6:/# exit
exit

# The events show that the file is gone
[root@master probe]# kubectl get ev | tail -n 10
15m         Normal    Scheduled   pod/pod6            Successfully assigned probe/pod6 to node2
15m         Normal    Pulled      pod/pod6            Container image "nginx" already present on machine
15m         Normal    Created     pod/pod6            Created container liveness
15m         Normal    Started     pod/pod6            Started container liveness
13m         Normal    Killing     pod/pod6            Stopping container liveness
12m         Normal    Scheduled   pod/pod6            Successfully assigned probe/pod6 to node2
12m         Normal    Pulled      pod/pod6            Container image "nginx" already present on machine
12m         Normal    Created     pod/pod6            Created container liveness
12m         Normal    Started     pod/pod6            Started container liveness
1s          Warning   Unhealthy   pod/pod6            Readiness probe failed: cat: /tmp/healthy: No such file or directory
[root@master probe]#
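
The Pods that have dropped out of rotation can be listed by comparing the two halves of the READY column. A minimal sketch under the same assumption about the default kubectl get pods layout; not_ready_pods is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the names of pods whose READY column shows
# fewer ready containers than total containers (for example 0/1).
not_ready_pods() {
  awk 'NR > 1 { split($2, r, "/"); if (r[1] != r[2]) print $1 }'
}

# Sample input copied from the transcript above; prints pod6.
not_ready_pods <<'EOF'
NAME   READY   STATUS    RESTARTS   AGE
pod4   1/1     Running   0          12m
pod5   1/1     Running   0          12m
pod6   0/1     Running   0          12m
EOF
```

Note that STATUS stays Running for pod6; only the READY column reveals that the readiness probe is failing.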

Accessing the Service again, pod6's 333 no longer appears (its state is no longer ready, so it has been removed from the Service).

[root@master ~]# while true ; do curl -s 10.96.190.13; sleep 1; done
222
111
222
222
111
222
222
111
222
111
222
222
111
111
222
111
222
222
^C
[root@master ~]#

httpGet

Used the same way as with liveness above; refer to the liveness section.

tcpSocket

Used the same way as with liveness above; refer to the liveness section.
