A 3-node test cluster running Kubernetes 1.17 with Docker 19.03.

Each node has two NICs:

  • enp0s3 — bridged to the wireless NIC for Internet access, so its IP is not fixed.
  • enp0s8 — 192.168.56.0/24, used for communication with the other nodes.

One day after booting the cluster, a large number of Pods were suddenly unhealthy: none of the affected Pods had obtained an IP, and Services could not be reached either.

Checking the control plane

In kube-system, the api-server, etcd-admin, scheduler, and controller-manager Pods, as well as the kube-proxy Pods on each of the 3 nodes, were all Running. That tells us the basic cluster health was fine, the node OS and system resources were OK, and Pods were being scheduled onto nodes normally. kube-proxy and calico-node are DaemonSets running with hostNetwork, so their Pod IPs are simply the node IPs.
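Roughly, the checks above boil down to something like this (a sketch; `k` is the kubectl alias used throughout this post, output omitted):

[root@admin ~ ]$k get nodes -o wide
[root@admin ~ ]$k get po -n kube-system -o wide    # check STATUS, RESTARTS and which node each Pod landed on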

CoreDNS was in Completed state:

[root@admin ~ ]$k describe po  coredns-9d85f5447-sjs2j  -n kube-system

    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Thu, 15 Feb 2024 09:25:15 +0800
      Finished:     Thu, 15 Feb 2024 19:58:02 +0800
    Ready:          False
    Restart Count:  36
    Limits:
      memory:  170Mi
    Requests:
      cpu:        100m
      memory:     70Mi
    Liveness:     http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
    Readiness:    http-get http://:8181/ready delay=0s timeout=1s period=10s #success=1 #failure=3
    Environment:  <none>
    Mounts:
      /etc/coredns from config-volume (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from coredns-token-j84s8 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             False 
  ContainersReady   False 
  PodScheduled      True 
---
Events:
  Type    Reason          Age                    From            Message
  ----    ------          ----                   ----            -------
  Normal  SandboxChanged  4m3s (x575 over 129m)  kubelet, admin  Pod sandbox changed, it will be killed and re-created.

The events show the sandbox keeps getting killed and re-created. Check the logs:

[root@admin ~ ]$k logs   coredns-9d85f5447-sjs2j  -n kube-system
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.5
linux/amd64, go1.13.4, c2fd1b2
[INFO] SIGTERM: Shutting down servers then terminating
[INFO] plugin/health: Going into lameduck mode for 5s

The CNI component, calico-node, was in CrashLoopBackOff. Check its events and logs:

Events:
  Type     Reason     Age                    From            Message
  ----     ------     ----                   ----            -------
  Warning  Unhealthy  11m (x304 over 176m)   kubelet, node2  Readiness probe failed: calico/node is not ready: felix is not ready: Get http://localhost:9099/readiness: dial tcp [::1]:9099: connect: connection refused
  Warning  BackOff    105s (x574 over 172m)  kubelet, node2  Back-off restarting failed container

[root@admin ~ ]$k logs  calico-node-7kvkf  -n kube-system
2024-02-16 04:58:08.483 [INFO][8] startup.go 259: Early log level set to info
2024-02-16 04:58:08.483 [INFO][8] startup.go 275: Using NODENAME environment for node name
2024-02-16 04:58:08.483 [INFO][8] startup.go 287: Determined node name: node2
2024-02-16 04:58:08.484 [INFO][8] k8s.go 228: Using Calico IPAM
2024-02-16 04:58:08.484 [INFO][8] startup.go 319: Checking datastore connection
2024-02-16 04:58:08.485 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:09.486 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:10.489 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:11.499 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:12.570 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:13.571 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:14.572 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:15.578 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:16.580 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable
2024-02-16 04:58:17.581 [INFO][8] startup.go 334: Hit error connecting to datastore - retry error=Get https://10.96.0.1:443/api/v1/nodes/foo: dial tcp 10.96.0.1:443: connect: network is unreachable

The calico-kube-controllers logs showed nothing useful:

[root@admin ~ ]$k logs  calico-kube-controllers-7489ff5b7c-6nl5p  -n kube-system
2024-02-15 01:25:31.218 [INFO][1] main.go 87: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2024-02-15 01:25:31.222 [INFO][1] k8s.go 228: Using Calico IPAM
W0215 01:25:31.222664       1 client_config.go:541] Neither --kubeconfig nor --master was specified.  Using the inClusterConfig.  This might not work.
2024-02-15 01:25:31.223 [INFO][1] main.go 108: Ensuring Calico datastore is initialized
2024-02-15 01:25:31.228 [INFO][1] main.go 182: Starting status report routine
2024-02-15 01:25:31.228 [INFO][1] main.go 364: Starting controller ControllerType="Node"
2024-02-15 01:25:31.228 [INFO][1] node_controller.go 133: Starting Node controller
2024-02-15 01:25:31.329 [INFO][1] node_controller.go 146: Node controller is now running
2024-02-15 01:25:31.345 [INFO][1] kdd.go 167: Node and IPAM data is in sync

The calico-node log shows it repeatedly failing to connect to its datastore at tcp 10.96.0.1:443. Is that etcd? Whose IP is this?

Checking the kube-apiserver configuration shows that this is the first address of the service cluster IP range:

[root@admin ~ ]$ps -ef|grep apiserver
root      1121 17490  0 13:34 pts/0    00:00:00 grep --color=auto apiserver
root      2939  2885  1 09:59 ?        00:04:00 kube-apiserver --advertise-address=192.168.56.3 --allow-privileged=true --authorization-mode=Node,RBAC --client-ca-file=/etc/kubernetes/pki/ca.crt --enable-admission-plugins=NodeRestriction --enable-bootstrap-token-auth=true --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key --etcd-servers=https://127.0.0.1:2379 --insecure-port=0 --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key --requestheader-allowed-names=front-proxy-client --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt --requestheader-extra-headers-prefix=X-Remote-Extra- --requestheader-group-headers=X-Remote-Group --requestheader-username-headers=X-Remote-User --secure-port=6443 --service-account-key-file=/etc/kubernetes/pki/sa.pub --service-cluster-ip-range=10.96.0.0/12 --tls-cert-file=/etc/kubernetes/pki/apiserver.crt --tls-private-key-file=/etc/kubernetes/pki/apiserver.key

Checking the Services shows it is the kubernetes Service itself:

[root@admin ~ ]$k describe svc  kubernetes  
Name:              kubernetes
Namespace:         default
Labels:            component=apiserver
                   provider=kubernetes
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                10.96.0.1
Port:              https  443/TCP
TargetPort:        6443/TCP
Endpoints:         192.168.56.3:6443
Session Affinity:  None
Events:            <none> 

It is a ClusterIP Service whose endpoint is 192.168.56.3:6443. So who is exposing 6443?

[root@admin ~ ]$lsof -i:6443
COMMAND    PID USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
kubelet   1355 root   25u  IPv4  37928      0t0  TCP admin:40182->admin:sun-sr-https (ESTABLISHED)
kube-apis 2939 root    5u  IPv6  37530      0t0  TCP *:sun-sr-https (LISTEN)

So 10.96.0.1 is the Service exposed solely for the kube-apiserver-admin Pod. That Pod is a standalone static Pod: it does not belong to any ReplicaSet/DaemonSet/Deployment/StatefulSet (a quick check is sketched below).
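One way to confirm that (a sketch; /etc/kubernetes/manifests is the kubeadm default path and may differ on other setups):

[root@admin ~ ]$ls /etc/kubernetes/manifests/    # static Pod manifests managed directly by kubelet: etcd, kube-apiserver, scheduler, controller-manager
[root@admin ~ ]$k get po kube-apiserver-admin -n kube-system -o jsonpath='{.metadata.ownerReferences}'    # a mirror Pod shows only a Node owner, no ReplicaSet/DaemonSet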

The apiserver Pod itself was healthy; the broken Service path was what prevented calico from reaching the apiserver, and therefore from allocating Pod IPs and configuring iptables rules on each node. But why was the network unreachable? That has to be answered at the routing layer; a quick kernel-side check is sketched below.
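A quick way to ask the kernel how it would route that destination is ip route get; on the broken admin node it would fail roughly like this (expected output, not captured at the time):

[root@admin ~ ]$ip route get 10.96.0.1
RTNETLINK answers: Network is unreachable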

After some digging:

That day the admin host was on a wired connection rather than Wi-Fi, so the bridged wireless NIC never obtained an IP address or a default-gateway route via DHCP. As a result, 10.96.0.1 was network-unreachable from the admin node.
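On the admin node this is easy to confirm by looking at the bridged interface and the default route directly (a sketch, using the interface names from this setup):

[root@admin ~ ]$ip -4 addr show enp0s3     # no IPv4 address here means the DHCP lease was never obtained
[root@admin ~ ]$ip route show default      # empty output means no default gateway is installed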

[root@admin /etc/kubernetes ]$ansible k8s-1  -m shell  -a 'traceroute 10.96.0.1'
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
192.168.56.3 | FAILED | rc=1 >>
traceroute to 10.96.0.1 (10.96.0.1), 30 hops max, 60 byte packets
connect: Network is unreachablenon-zero return code
192.168.56.4 | CHANGED | rc=0 >>
traceroute to 10.96.0.1 (10.96.0.1), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * node1 (192.168.31.226)  3008.015 ms !H
192.168.56.5 | CHANGED | rc=0 >>
traceroute to 10.96.0.1 (10.96.0.1), 30 hops max, 60 byte packets
 1  * * *
 2  * * *
 3  * * *
 4  * * *
 5  * * *
 6  * * *
 7  * * *
 8  * * *
 9  * * *
10  * * *
11  * * node2 (192.168.31.20)  3005.846 ms !H

 [root@admin /etc/kubernetes ]$ansible k8s-1  -m shell  -a 'ip route'
[WARNING]: Invalid characters were found in group names but not replaced, use -vvvv to see details
192.168.56.4 | CHANGED | rc=0 >>
default via 192.168.31.1 dev enp0s3 proto dhcp metric 102
10.10.0.0/26 via 192.168.56.5 dev tunl0 proto bird onlink
10.10.0.128/26 via 192.168.56.3 dev tunl0 proto bird onlink
blackhole 10.10.0.192/26 proto bird
10.10.0.196 dev cali2473e8d3fe5 scope link
10.10.0.199 dev cali931cf856fe5 scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.31.0/24 dev enp0s3 proto kernel scope link src 192.168.31.226 metric 102
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.4 metric 101
192.168.56.3 | CHANGED | rc=0 >>  ## default route is missing!!
10.10.0.0/26 via 192.168.56.5 dev tunl0 proto bird onlink
blackhole 10.10.0.128/26 proto bird
10.10.0.129 dev calib35f38918a6 scope link
10.10.0.130 dev cali3d6a8137e9b scope link
10.10.0.131 dev calief752050065 scope link
10.10.0.189 dev cali9cd0964c823 scope link
10.10.0.192/26 via 192.168.56.4 dev tunl0 proto bird onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.3 metric 101
192.168.56.5 | CHANGED | rc=0 >>
default via 192.168.31.1 dev enp0s3 proto dhcp metric 102
blackhole 10.10.0.0/26 proto bird
10.10.0.37 dev caliae17495c610 scope link
10.10.0.38 dev cali7c21225184f scope link
10.10.0.128/26 via 192.168.56.3 dev tunl0 proto bird onlink
10.10.0.192/26 via 192.168.56.4 dev tunl0 proto bird onlink
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.31.0/24 dev enp0s3 proto kernel scope link src 192.168.31.20 metric 102
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.5 metric 101
[root@admin /etc/kubernetes ]$

Checking the admin node's routing table and iptables nat table at this point shows that the DNAT rule for this Service is already in place. But in the order network traffic is processed, the routing decision comes right after the PREROUTING chain: if nothing in the routing table matches and there is no default gateway either, the packet is simply rejected as unreachable. (For locally generated traffic from a hostNetwork Pod the same route lookup fails at connect() time, which is exactly the "connect: network is unreachable" error in the calico-node log.)

Chain KUBE-SEP-G5V522HWZT6RKRAC (1 references)
 pkts bytes target     prot opt in     out     source               destination         
    0     0 KUBE-MARK-MASQ  all  --  *      *       192.168.56.3         0.0.0.0/0           
    7   420 DNAT       tcp  --  *      *       0.0.0.0/0            0.0.0.0/0            tcp to:192.168.56.3:6443
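For reference, that KUBE-SEP chain is reached by walking kube-proxy's nat chains down from the Service IP (a sketch; the KUBE-SVC-... chain name is cluster-specific and is taken from the first command's output):

[root@admin ~ ]$iptables -t nat -nvL KUBE-SERVICES | grep 10.96.0.1      # shows a jump to a KUBE-SVC-... chain for default/kubernetes:https
[root@admin ~ ]$iptables -t nat -nvL KUBE-SVC-XXXXXXXXXXXXXXXX           # substitute the chain name printed above; it jumps to the KUBE-SEP chain shown here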

Now, my wireless router's address is 192.168.31.1, and it is already the default gateway for node1 and node2. Obviously the router's routing table has no entry for this in-cluster Service IP and has no idea where to forward it. So does getting an IP packet from the admin host to the Pod IP behind 10.96.0.1 actually have anything to do with the gateway address?

Clearly not. All that is needed is one routing entry that matches. Manually pointing the default gateway at any reachable IP lets the traffic keep flowing: after the routing decision the packet goes out via enp0s3, continues through the FORWARD ---> POSTROUTING chains, then comes back in on enp0s8 and travels up the stack (Ethernet --> IP --> socket) into the apiserver Pod.

PS: since hostNetwork is used here, the Pod shares the host's network namespace.
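The temporary fix is exactly that: install some default route on the admin node. A sketch, where the next hop is illustrative (any address reachable from admin will do, e.g. node1 on the host-only network):

[root@admin ~ ]$ip route add default via 192.168.56.4 dev enp0s8    # 192.168.56.4 is node1 here; the route only needs to exist so the lookup for 10.96.0.1 succeeds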

With that, 10.96.0.1 --> DNAT to 192.168.56.3:6443 succeeds.
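A simple verification from the admin node (a sketch; on a default kubeadm cluster /version is readable anonymously, so no token is needed):

[root@admin ~ ]$curl -k https://10.96.0.1:443/version    # returns the apiserver's version JSON once the DNAT path works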

The Pod logs show it too: the instant the default gateway was configured by hand, access returned to normal.

After that, all resources could reach the Kubernetes Services again, the scheduler and the various controllers went back to reconciling toward the desired state, and the cluster recovered.

Of course, this is probably something you only hit in a VM-based test setup; it shouldn't happen in production.

Still, it helps with understanding how the Kubernetes CNI, Services, and iptables fit together, so I'm writing it down.
