[Kubernetes] Part 13 - Implementing Service Probes


BraveWangDev

Published 2023-02-28 08:53:49

I. Preface

The previous article introduced k8s service probes.

This article shows how each type of probe is implemented.

II. ExecAction

1. Introduction

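The kubelet runs a predefined command inside the container.
If the command exits with status 0, the container is considered healthy; any other exit status means the probe has failed.
Note that the command is executed directly, not through a shell; to use shell features such as pipes, invoke the shell explicitly (for example /bin/sh -c "...").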
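To make the success rule concrete, here is a minimal Node.js sketch that mimics it locally: run a command without a shell and treat exit status 0 as healthy, and anything else (non-zero status, a missing binary, or a timeout) as a failure. The name execProbe is hypothetical, and the 1-second default timeout is an assumption chosen to match the probe's default timeoutSeconds; the real check is performed by the kubelet and the container runtime, not by code like this.

```javascript
const { spawnSync } = require("child_process");

// Hypothetical helper: mimics the exec probe rule "exit status 0 means healthy".
// argv[0] is executed directly (no shell), like the command list in a probe spec.
function execProbe(argv, timeoutMs = 1000) {
  const result = spawnSync(argv[0], argv.slice(1), {
    stdio: "ignore",
    timeout: timeoutMs, // the child is killed if it runs longer than this
  });
  // status is null when the binary could not be started or the child was killed.
  return result.status === 0;
}

// Usage: the same check as the livenessProbe in shell-probe.yaml.
console.log(execProbe(["cat", "/tmp/healthy"]) ? "healthy" : "unhealthy");
```

Using spawnSync with an argument array (instead of a command string) keeps the "no shell" behavior of the real probe.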

2. Implementation

1) Create the configuration file

shell-probe.yaml

vi shell-probe.yaml

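// File contents
apiVersion: v1
kind: Pod    # resource type: Pod
metadata:
  labels:    # labels
    test: shell-probe
  name: shell-probe # Pod name
spec: # specification
  containers: # containers
  - name: shell-probe # container name
    image: registry.aliyuncs.com/google_containers/busybox # image to use
    args: # arguments passed to the container
    - /bin/sh
    - -c
    - touch /tmp/healthy; sleep 30; rm -rf /tmp/healthy; sleep 600 # create the file, sleep 30 s, delete it, then sleep 600 s
    livenessProbe: # liveness probe
      exec: # run a command inside the container
        command: # the command to run
        - cat # print the file; fails if it does not exist
        - /tmp/healthy
      initialDelaySeconds: 5 # seconds to wait after the container starts before the first probe
      periodSeconds: 5 # interval between probes, in seconds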

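This configures a liveness probe for the container: to decide whether the container is alive, the kubelet runs cat /tmp/healthy.

If the file exists, cat exits with 0 and the container is considered alive;
If the file does not exist, cat fails and the container is considered dead.

Notes:

Because the file is created as soon as the container starts, the first probes find it and succeed.

The file is deleted after 30 seconds. The first probe runs 5 seconds after the container starts (initialDelaySeconds) and then every 5 seconds (periodSeconds).

Note: probes are configured per container, so different containers in the same Pod can have different probes.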

2) Apply the configuration

Apply the configuration to start the container:

[root@k8s-master deployment]# kubectl apply -f shell-probe.yaml
pod/shell-probe created

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
shell-probe               1/1     Running   0          25s
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          17h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          17h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          17h

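Notes:

After the container starts, the probe runs every 5 seconds. Once the file is deleted at about 30 seconds, the probe starts failing; after 3 consecutive failures (the default failureThreshold), the kubelet kills and restarts the container, roughly 45 seconds after it started, not every 30 seconds. The cycle then repeats, and each further restart is delayed by a growing back-off, so the interval between restarts gets longer over time (the Pod eventually shows CrashLoopBackOff).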
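The restart timing can be checked with a small model. This Node.js sketch is hypothetical and simplified (probes take no time, restart back-off is ignored): it steps through probe times starting at initialDelaySeconds, every periodSeconds, and returns the time of the probe that reaches failureThreshold consecutive failures.

```javascript
// Hypothetical model of when a liveness probe triggers the first restart.
// Simplifications: probes are instantaneous, and the file counts as present
// up to and including the moment it is deleted (healthyUntil seconds).
function firstRestartSecond({
  initialDelaySeconds,
  periodSeconds,
  failureThreshold = 3, // Kubernetes default
  healthyUntil,
  horizon = 3600, // stop searching after this many seconds
}) {
  let consecutiveFailures = 0;
  for (let t = initialDelaySeconds; t <= horizon; t += periodSeconds) {
    const healthy = t <= healthyUntil;
    consecutiveFailures = healthy ? 0 : consecutiveFailures + 1;
    if (consecutiveFailures >= failureThreshold) {
      return t; // the kubelet would kill and restart the container here
    }
  }
  return null; // no restart within the horizon
}

// shell-probe.yaml: initialDelaySeconds 5, periodSeconds 5, file removed at ~30 s.
console.log(firstRestartSecond({ initialDelaySeconds: 5, periodSeconds: 5, healthyUntil: 30 }));
// → 45
```

This agrees with the kubectl describe output in step 4): the container first started 5m23s ago and the first Killing event happened 4m38s ago, 45 seconds later.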

3) Check the Pod

After a while, the Pod has already restarted 4 times:

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
shell-probe               1/1     Running   4          5m6s
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          17h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          17h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          17h

4) Inspect the container details

[root@k8s-master deployment]# kubectl describe pods shell-probe

Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  5m25s                  default-scheduler  Successfully assigned default/shell-probe to k8s-node
  Normal   Pulled     5m23s                  kubelet            Successfully pulled image "registry.aliyuncs.com/google_containers/busybox" in 1.454670365s
  Normal   Pulled     4m8s                   kubelet            Successfully pulled image "registry.aliyuncs.com/google_containers/busybox" in 465.285971ms
  Normal   Pulled     2m53s                  kubelet            Successfully pulled image "registry.aliyuncs.com/google_containers/busybox" in 510.497057ms
  Normal   Created    2m52s (x3 over 5m23s)  kubelet            Created container shell-probe
  Normal   Started    2m52s (x3 over 5m23s)  kubelet            Started container shell-probe
  Warning  Unhealthy  2m8s (x9 over 4m48s)   kubelet            Liveness probe failed: cat: can't open '/tmp/healthy': No such file or directory
  Normal   Killing    2m8s (x3 over 4m38s)   kubelet            Container shell-probe failed liveness probe, will be restarted
  Normal   Pulling    23s (x5 over 5m24s)    kubelet            Pulling image "registry.aliyuncs.com/google_containers/busybox"

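The Unhealthy event:
The liveness probe failed because /tmp/healthy could not be found;
the Killing event that follows shows the kubelet restarting the container because of that failure.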

III. TCPSocketAction

1. Introduction

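The probe uses a TCP socket: the kubelet tries to open a TCP connection to the specified port of the container.
If the connection can be established (the port is open), the container is considered healthy; otherwise the probe fails.

For example:

For an nginx service, check whether port 80 accepts connections;
For a MySQL service, check whether port 3306 accepts connections.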
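The success rule can be sketched with Node's built-in net module: a connection that opens within the timeout counts as success, and a refused connection or a timeout counts as failure. The helper name tcpProbe and its 1-second default timeout are assumptions for illustration; like the description above, the sketch only opens the connection and closes it again without exchanging data.

```javascript
const net = require("net");

// Hypothetical helper mimicking a TCP socket probe: resolves true if a TCP
// connection to host:port is established within timeoutMs, false otherwise.
function tcpProbe(host, port, timeoutMs = 1000) {
  return new Promise((resolve) => {
    const socket = net.connect({ host, port });
    const finish = (ok) => {
      socket.destroy(); // close the connection; no data is exchanged
      resolve(ok); // later calls are ignored because a Promise settles once
    };
    socket.setTimeout(timeoutMs, () => finish(false));
    socket.once("connect", () => finish(true));
    socket.once("error", () => finish(false)); // e.g. ECONNREFUSED
  });
}

// Usage: check port 80 locally, as the readinessProbe in tcp-probe.yaml does inside the Pod.
tcpProbe("127.0.0.1", 80).then((ok) => console.log(ok ? "ready" : "not ready"));
```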

2. Implementation

1) Create the configuration file

tcp-probe.yaml

vi tcp-probe.yaml

// File contents
apiVersion: v1
kind: Pod
metadata:
  name: tcp-probe
  labels:
    app: tcp-probe
spec:
  containers:
  - name: tcp-probe
    image: nginx
    ports:
    - containerPort: 80
    readinessProbe: # readiness probe
      tcpSocket:
        port: 80
      initialDelaySeconds: 5
      periodSeconds: 5

2) Apply the configuration

[root@k8s-master deployment]# kubectl apply -f tcp-probe.yaml 
pod/tcp-probe created

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS             RESTARTS   AGE
shell-probe               0/1     CrashLoopBackOff   7          17m
tcp-probe                 1/1     Running            0          16s
user-v1-8cc9f4fb5-52hmd   1/1     Running            0          17h
user-v1-8cc9f4fb5-6l7mz   1/1     Running            0          17h
user-v1-8cc9f4fb5-zqj2l   1/1     Running            0          17h

// kubectl get pods | grep tcp-probe

3) Inspect the Pod details

[root@k8s-master deployment]#  kubectl describe pods tcp-probe

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  106s  default-scheduler  Successfully assigned default/tcp-probe to k8s-node
  Normal  Pulling    106s  kubelet            Pulling image "nginx"
  Normal  Pulled     104s  kubelet            Successfully pulled image "nginx" in 1.480773049s
  Normal  Created    104s  kubelet            Created container tcp-probe
  Normal  Started    104s  kubelet            Started container tcp-probe

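Option 1: stop the nginx service

The kubelet now checks every 5 seconds whether port 80 accepts connections.

To test the probe, stop nginx. First open a shell in the container: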

[root@k8s-master deployment]# kubectl exec -it tcp-probe  -- bash

root@tcp-probe:/# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful

root@tcp-probe:/# nginx -s stop
2021/12/25 03:59:44 [notice] 48#48: signal process started
root@tcp-probe:/# command terminated with exit code 137

Inside the tcp-probe container, use apt-get to install procps, which provides the ps tool:

apt-get update
apt-get install procps
ps

// Actual session
root@tcp-probe:/# apt-get update
Get:1 http://security.debian.org/debian-security bullseye-security InRelease [44.1 kB]                 
Get:2 http://deb.debian.org/debian bullseye InRelease [116 kB]                                                 
Get:3 http://security.debian.org/debian-security bullseye-security/main amd64 Packages [102 kB]                
Get:4 http://deb.debian.org/debian bullseye-updates InRelease [39.4 kB]                                        
Get:5 http://deb.debian.org/debian bullseye/main amd64 Packages [8183 kB]                                      
Get:6 http://deb.debian.org/debian bullseye-updates/main amd64 Packages [2592 B]                               
Fetched 8487 kB in 4min 19s (32.8 kB/s)                                                                        
Reading package lists... Done
root@tcp-probe:/# apt-get install procps
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
The following additional packages will be installed:
  libgpm2 libncurses6 libncursesw6 libprocps8 psmisc
Suggested packages:
  gpm
The following NEW packages will be installed:
  libgpm2 libncurses6 libncursesw6 libprocps8 procps psmisc
0 upgraded, 6 newly installed, 0 to remove and 0 not upgraded.
Need to get 1034 kB of archives.
After this operation, 3474 kB of additional disk space will be used.
Do you want to continue? [Y/n] y
Get:1 http://deb.debian.org/debian bullseye/main amd64 libncurses6 amd64 6.2+20201114-2 [102 kB]
Get:2 http://deb.debian.org/debian bullseye/main amd64 libncursesw6 amd64 6.2+20201114-2 [132 kB]
Get:3 http://deb.debian.org/debian bullseye/main amd64 libprocps8 amd64 2:3.3.17-5 [63.9 kB]
Get:4 http://deb.debian.org/debian bullseye/main amd64 procps amd64 2:3.3.17-5 [502 kB]
Get:5 http://deb.debian.org/debian bullseye/main amd64 libgpm2 amd64 1.20.7-8 [35.6 kB]                        
Get:6 http://deb.debian.org/debian bullseye/main amd64 psmisc amd64 23.4-2 [198 kB]                            
Fetched 1034 kB in 20s (52.5 kB/s)                                                                             
debconf: delaying package configuration, since apt-utils is not installed
Selecting previously unselected package libncurses6:amd64.
(Reading database ... 7815 files and directories currently installed.)
Preparing to unpack .../0-libncurses6_6.2+20201114-2_amd64.deb ...
Unpacking libncurses6:amd64 (6.2+20201114-2) ...
Selecting previously unselected package libncursesw6:amd64.
Preparing to unpack .../1-libncursesw6_6.2+20201114-2_amd64.deb ...
Unpacking libncursesw6:amd64 (6.2+20201114-2) ...
Selecting previously unselected package libprocps8:amd64.
Preparing to unpack .../2-libprocps8_2%3a3.3.17-5_amd64.deb ...
Unpacking libprocps8:amd64 (2:3.3.17-5) ...
Selecting previously unselected package procps.
Preparing to unpack .../3-procps_2%3a3.3.17-5_amd64.deb ...
Unpacking procps (2:3.3.17-5) ...
Selecting previously unselected package libgpm2:amd64.
Preparing to unpack .../4-libgpm2_1.20.7-8_amd64.deb ...
Unpacking libgpm2:amd64 (1.20.7-8) ...
Selecting previously unselected package psmisc.
Preparing to unpack .../5-psmisc_23.4-2_amd64.deb ...
Unpacking psmisc (23.4-2) ...
Setting up libgpm2:amd64 (1.20.7-8) ...
Setting up psmisc (23.4-2) ...
Setting up libncurses6:amd64 (6.2+20201114-2) ...
Setting up libncursesw6:amd64 (6.2+20201114-2) ...
Setting up libprocps8:amd64 (2:3.3.17-5) ...
Setting up procps (2:3.3.17-5) ...
Processing triggers for libc-bin (2.31-13+deb11u2) ...
root@tcp-probe:/# ps
  PID TTY          TIME CMD
   39 pts/0    00:00:00 bash
  385 pts/0    00:00:00 ps

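Note: a restarted container starts again from the image's original filesystem, so packages installed with apt-get are lost and must be reinstalled. (The kubelet also pulls the image again on each restart, because the nginx image is used without a tag and imagePullPolicy therefore defaults to Always.)

Stop nginx:

// Before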
[root@k8s-master deployment]# kubectl get pod
NAME                      READY   STATUS             RESTARTS   AGE
shell-probe               0/1     CrashLoopBackOff   13         37m
tcp-probe                 1/1     Running            0          4m
user-v1-8cc9f4fb5-52hmd   1/1     Running            0          17h
user-v1-8cc9f4fb5-6l7mz   1/1     Running            0          17h
user-v1-8cc9f4fb5-zqj2l   1/1     Running            0          17h

// Run
root@tcp-probe:/# pkill nginx
root@tcp-probe:/# command terminated with exit code 137

// After
[root@k8s-master ~]# kubectl get pods
NAME                      READY   STATUS             RESTARTS   AGE
shell-probe               0/1     CrashLoopBackOff   15         47m
tcp-probe                 0/1     CrashLoopBackOff   1          7m
user-v1-8cc9f4fb5-52hmd   1/1     Running            0          17h
user-v1-8cc9f4fb5-6l7mz   1/1     Running            0          17h
user-v1-8cc9f4fb5-zqj2l   1/1     Running            0          17h

4) Inspect the container details

[root@k8s-master ~]# kubectl describe pod tcp-probe

Events:
  Type     Reason     Age                  From               Message
  ----     ------     ----                 ----               -------
  Normal   Scheduled  35m                  default-scheduler  Successfully assigned default/tcp-probe to k8s-node
  Normal   Pulled     35m                  kubelet            Successfully pulled image "nginx" in 1.480773049s
  Normal   Pulled     30m                  kubelet            Successfully pulled image "nginx" in 15.410293911s
  Normal   Pulled     13m                  kubelet            Successfully pulled image "nginx" in 15.345629205s
  Warning  BackOff    5m27s                kubelet            Back-off restarting failed container
  Normal   Pulling    5m13s (x4 over 35m)  kubelet            Pulling image "nginx"
  Normal   Created    4m58s (x4 over 35m)  kubelet            Created container tcp-probe
  Normal   Started    4m58s (x4 over 35m)  kubelet            Started container tcp-probe
  Normal   Pulled     4m58s                kubelet            Successfully pulled image "nginx" in 15.360719175s

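The container did restart: RESTARTS went from 0 to 1, the status is CrashLoopBackOff, and the events show the container being created 4 times. The readiness probe is not the cause, because a failing readiness probe never restarts a container. nginx is the container's main process, so stopping it makes the container exit (which is also why the exec session ended with exit code 137), and the kubelet then restarts the container according to the Pod's restartPolicy (Always by default). "Back-off restarting failed container" means the kubelet is waiting for a growing back-off delay before restarting a container that keeps exiting. To observe the readiness probe itself, the port has to stop accepting connections while the container keeps running, which is what option 2 does.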
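The growing delay behind "Back-off restarting failed container" follows the default schedule described in the Kubernetes Pod lifecycle documentation: 10 s, doubling each time, capped at 5 minutes, and reset once the container has run for 10 minutes without problems. The sketch below only reproduces that documented default sequence; the function name is hypothetical, and newer Kubernetes releases may make this back-off tunable.

```javascript
// Hypothetical helper: delay (in seconds) before the n-th restart of a
// container that keeps failing, per the documented default sequence
// 10 s, 20 s, 40 s, ... capped at 300 s (5 minutes).
function restartBackoffSeconds(n) {
  if (!Number.isInteger(n) || n < 1) {
    throw new RangeError("n must be a positive integer");
  }
  return Math.min(10 * 2 ** (n - 1), 300);
}

for (let n = 1; n <= 7; n++) {
  console.log(`restart #${n}: wait ${restartBackoffSeconds(n)} s`);
}
// restart #1: wait 10 s, #2: 20 s, #3: 40 s, #4: 80 s, #5: 160 s, #6 and later: 300 s
```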

Option 2: change the port nginx listens on inside the container

Recreate the tcp-probe Pod:

[root@k8s-master ~]# kubectl delete pod tcp-probe
pod "tcp-probe" deleted

[root@k8s-master ~]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          5d14h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          5d14h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          5d14h

[root@k8s-master deployment]# kubectl apply -f tcp-probe.yaml 
pod/tcp-probe created

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
tcp-probe                 1/1     Running   0          12m
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          5d14h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          5d14h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          5d14h

Open a shell in the container and change the port nginx listens on from 80 to 9090, so that the probe fails:

[root@k8s-master deployment]# kubectl exec -it tcp-probe  -- /bin/sh
# apt-get update
# apt-get install vim -y

Edit the nginx configuration file:

vi /etc/nginx/conf.d/default.conf

server {
    listen       9090;   # changed from 80 to 9090
    listen  [::]:80;     # unchanged, IPv6 only; the probe dials the IPv4 Pod IP
    server_name  localhost;

    #access_log  /var/log/nginx/host.access.log  main;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    #error_page  404              /404.html;

    # redirect server error pages to the static page /50x.html
    #
    error_page   500 502 503 504  /50x.html;
    location = /50x.html {
        root   /usr/share/nginx/html;
    }

    # proxy the PHP scripts to Apache listening on 127.0.0.1:80

Reload the configuration:

# nginx -s reload
2021/12/30 01:33:41 [notice] 425#425: signal process started
# exit

Check the Pod status:

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
tcp-probe                 0/1     Running   0          22m
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          5d15h

Events reported by kubectl describe pods tcp-probe:

Events:
  Type     Reason     Age                From               Message
  ----     ------     ----               ----               -------
  Normal   Scheduled  22m                default-scheduler  Successfully assigned default/tcp-probe to k8s-node
  Normal   Pulling    22m                kubelet            Pulling image "nginx"
  Normal   Pulled     22m                kubelet            Successfully pulled image "nginx" in 19.889680609s
  Normal   Created    22m                kubelet            Created container tcp-probe
  Normal   Started    22m                kubelet            Started container tcp-probe
  Warning  Unhealthy  5s (x18 over 90s)  kubelet            Readiness probe failed: dial tcp 10.244.1.72:80: connect: connection refused

The readiness probe fails because the connection to 10.244.1.72:80 is refused: nginx no longer listens on IPv4 port 80.

Although the probe is failing, the Pod is still Running; only READY has dropped to 0/1:

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
tcp-probe                 0/1     Running   0          24m
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          5d15h

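So the container is not restarted. A failing readiness probe only marks the Pod as not ready, which removes it from the endpoints of any Service that selects it, so it stops receiving traffic through that Service.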
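The difference seen in this section (a readiness failure only takes the Pod out of service, while a liveness failure restarts the container) can be summarized as a small state machine. This Node.js sketch is a simplified, hypothetical model, not kubelet code: it counts consecutive results against failureThreshold (default 3) and successThreshold (default 1) and reports the action the kubelet would take.

```javascript
// Hypothetical model of how consecutive probe results turn into actions.
// Defaults follow Kubernetes: failureThreshold 3, successThreshold 1.
class ProbeTracker {
  constructor(kind, { failureThreshold = 3, successThreshold = 1 } = {}) {
    this.kind = kind; // "liveness" or "readiness"
    this.failureThreshold = failureThreshold;
    this.successThreshold = successThreshold;
    this.failures = 0;
    this.successes = 0;
    this.ready = false; // readiness is "not ready" until a probe succeeds
  }

  // Record one probe result and return the resulting action.
  record(ok) {
    if (ok) {
      this.successes += 1;
      this.failures = 0;
      if (this.kind === "readiness" && this.successes >= this.successThreshold) {
        this.ready = true;
        return "mark-ready"; // Pod is (re)added to Service endpoints
      }
      return "none";
    }
    this.failures += 1;
    this.successes = 0;
    if (this.failures < this.failureThreshold) return "none";
    if (this.kind === "liveness") {
      this.failures = 0; // the restarted container starts counting afresh
      return "restart-container";
    }
    this.ready = false;
    return "mark-not-ready"; // removed from endpoints; container keeps running
  }
}

const readiness = new ProbeTracker("readiness");
console.log([true, false, false, false].map((ok) => readiness.record(ok)));
// → [ 'mark-ready', 'none', 'none', 'mark-not-ready' ]
```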

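IV. HTTPGetAction

1. Introduction

The kubelet sends an HTTP GET request to the specified path and port of the container.

If the response status code is at least 200 and below 400 (not only 200), the container is considered healthy;
otherwise, or if the request fails or times out, the probe fails.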
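A minimal Node.js sketch of this check, using the built-in http module: send a GET with the configured headers and apply the 200-399 success rule, counting connection errors and timeouts as failures. The helper names httpProbe and isProbeSuccess and the 1-second default timeout are assumptions for illustration, not part of Kubernetes.

```javascript
const http = require("http");

// Kubernetes treats any HTTP status code >= 200 and < 400 as probe success.
function isProbeSuccess(statusCode) {
  return statusCode >= 200 && statusCode < 400;
}

// Hypothetical helper mimicking an HTTP GET probe.
function httpProbe({ host, port, path, headers = {}, timeoutMs = 1000 }) {
  return new Promise((resolve) => {
    const req = http.get(
      // agent: false opens a fresh connection without keep-alive
      { host, port, path, headers, timeout: timeoutMs, agent: false },
      (res) => {
        res.resume(); // drain the body; only the status code matters
        resolve(isProbeSuccess(res.statusCode));
      }
    );
    req.on("timeout", () => {
      req.destroy();
      resolve(false);
    });
    req.on("error", () => resolve(false)); // e.g. connection refused
  });
}

// Usage, mirroring http-probe.yaml (assumes the service is reachable on localhost:3000).
httpProbe({ host: "127.0.0.1", port: 3000, path: "/liveness", headers: { source: "probe" } })
  .then((ok) => console.log(ok ? "healthy" : "unhealthy"));
```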

2. Implementation

1) Create the configuration file

http-probe.yaml

vi http-probe.yaml

// File contents
apiVersion: v1
kind: Pod
metadata:
  labels:
    test: http-probe
  name: http-probe
spec:
  containers:
  - name: http-probe
    image: http-probe:1.0.0
    livenessProbe: # liveness probe
      httpGet:
        path: /liveness
        port: 3000
        httpHeaders:
        - name: source
          value: probe
      initialDelaySeconds: 5
      periodSeconds: 5

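Notes:

Every 5 seconds, after an initial delay of 5 seconds, the kubelet sends GET /liveness to port 3000 on the Pod IP with the request header source: probe.

2) Apply the configuration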

[root@k8s-master deployment]# kubectl apply -f http-probe.yaml 
pod/http-probe created

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS              RESTARTS   AGE
http-probe                0/1     ContainerCreating   0          15s
tcp-probe                 0/1     Running             0          31m
user-v1-8cc9f4fb5-52hmd   1/1     Running             0          5d15h
user-v1-8cc9f4fb5-6l7mz   1/1     Running             0          5d15h
user-v1-8cc9f4fb5-zqj2l   1/1     Running             0          5d15h

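The http-probe:1.0.0 image runs a Node.js service:

For GET /liveness with the header source: probe, it returns 200 during the first 10 seconds after startup and 500 afterwards; every other request gets 200.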

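// app/index.js: /liveness returns 200 for the first 10 s after startup, then 500,
// but only when the request carries the header "source: probe".
const http = require('http');

const start = Date.now();

http.createServer((req, res) => {
  // Node lowercases incoming header names, so req.headers['source'] also matches "Source".
  if (req.url === '/liveness' && req.headers['source'] === 'probe') {
    const duration = Date.now() - start;
    if (duration > 10 * 1000) {
      // More than 10 s since startup: report unhealthy.
      res.statusCode = 500;
      res.end('error');
    } else {
      res.statusCode = 200;
      res.end('success');
    }
  } else {
    // Any other request, including /liveness without the probe header, gets 200.
    res.statusCode = 200;
    res.end('liveness');
  }
}).listen(3000, () => console.log('http server started on 3000'));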

Dockerfile

FROM node
COPY ./app /app
WORKDIR /app
EXPOSE 3000
CMD node index.js

If there is a problem with the configuration, delete the Pod and create it again:

[root@k8s-master deployment]# kubectl delete pod http-probe --force
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "http-probe" force deleted

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
tcp-probe                 0/1     Running   0          38m
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          5d15h

[root@k8s-master deployment]# kubectl apply -f http-probe.yaml 
pod/http-probe created

[root@k8s-master deployment]# kubectl get pods
NAME                      READY   STATUS    RESTARTS   AGE
http-probe                1/1     Running   0          11s
tcp-probe                 0/1     Running   0          38m
user-v1-8cc9f4fb5-52hmd   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-6l7mz   1/1     Running   0          5d15h
user-v1-8cc9f4fb5-zqj2l   1/1     Running   0          5d15h

[root@k8s-master deployment]# kubectl describe pods http-probe

Events:
  Type     Reason     Age               From               Message
  ----     ------     ----              ----               -------
  Normal   Scheduled  24s               default-scheduler  Successfully assigned default/http-probe to k8s-node
  Normal   Pulled     24s               kubelet            Container image "http-probe:1.0.0" already present on machine
  Normal   Created    24s               kubelet            Created container http-probe
  Normal   Started    24s               kubelet            Started container http-probe
  Warning  Unhealthy  0s (x3 over 10s)  kubelet            Liveness probe failed: HTTP probe failed with statuscode: 500
  Normal   Killing    0s                kubelet            Container http-probe failed liveness probe, will be restarted

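After 3 consecutive failures (the default failureThreshold), the kubelet kills the container and restarts it.

V. Conclusion

This article covered how the three probe mechanisms work and how to configure each of them.

Next: building a private image registry.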
