Background

In a Kubernetes cluster, as soon as a Service's externalIP is set to the IP of any node in the cluster, components such as calico, kubelet, and kube-proxy can no longer communicate with the apiserver.
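A quick pre-flight check can catch this misconfiguration before it is applied. A minimal sketch, assuming kubectl access and nodes that report an InternalIP:

# List every externalIP in use, then every node InternalIP; any overlap between the two is trouble
kubectl get svc -A -o jsonpath='{range .items[*]}{.spec.externalIPs}{"\n"}{end}' | sort -u
kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address}{"\n"}{end}'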

Environment

Hostname                       IP
k8s-master-1 (k8s v1.20.10)    192.168.0.10
k8s-node-1 (k8s v1.20.10)      192.168.0.11
# Pod CIDR
	10.70.0.0/16

# Service CIDR
	10.0.0.0/16
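
For reference, these two ranges normally come from the control-plane flags; a kubeadm-style sketch with the values assumed from this cluster:

# kube-apiserver
--service-cluster-ip-range=10.0.0.0/16
# kube-controller-manager (and kube-proxy --cluster-cidr)
--cluster-cidr=10.70.0.0/16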

Symptoms

# Test YAML
[root@k8s-master-1 externalip]# cat deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: busybox
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh","-c","mkdir -p /var/www; echo 'this is httpd-v1' > /var/www/index.html; httpd -f -h /var/www"]
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: busybox
spec:
  externalIPs:
  - 192.168.0.11
  type: ClusterIP
  ports:
  - port: 8888
    targetPort: 80
    protocol: TCP
  selector:
    app: httpd
# Check cluster status
[root@k8s-master-1 yaml]# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
kube-system   calico-kube-controllers-5855d94c7d-xz45j   1/1     Running   0          29s
kube-system   calico-node-4ftkk                          1/1     Running   0          28s
kube-system   calico-node-pcsw6                          1/1     Running   0          28s
kube-system   coredns-6f4c9cb7c5-2wsww                   1/1     Running   0          13s
# Deploy the externalIP test resources
[root@k8s-master-1 externalip]# kubectl apply -f deployment.yaml


# A moment later, check the pods again: calico-node can no longer become Ready
[root@k8s-master-1 yaml]# kubectl get pods -A
NAMESPACE     NAME                                       READY   STATUS    RESTARTS   AGE
default       busybox-58984c55cc-44b6c                   0/1     Pending   0          3m21s
kube-system   calico-kube-controllers-5855d94c7d-xz45j   1/1     Running   0          5m13s
kube-system   calico-node-4ftkk                          0/1     Running   0          5m12s
kube-system   calico-node-pcsw6                          1/1     Running   0          5m12s
kube-system   coredns-6f4c9cb7c5-2wsww                   1/1     Running   0          4m57s
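
One way to dig into the unready calico-node is to describe it; the readiness-probe events typically show calico failing to reach its datastore/apiserver (pod name taken from the listing above):

# Inspect the readiness-probe failures of the stuck calico-node pod
kubectl -n kube-system describe pod calico-node-4ftkk | grep -iA3 readiness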



# Check logs on k8s-node-1
==> kube-proxy.INFO <==
I0420 16:32:40.293230    4655 service.go:275] Service default/busybox updated: 1 ports
I0420 16:32:40.293615    4655 service.go:390] Adding new service port "default/busybox" at 10.0.237.220:8888/TCP
I0420 16:32:40.367114    4655 proxier.go:2243] Opened local port "externalIP for default/busybox" (192.168.0.11:8888/tcp)

==> kube-proxy.k8s-node-1.root.log.INFO.20220420-161329.4655 <==
I0420 16:32:40.293230    4655 service.go:275] Service default/busybox updated: 1 ports
I0420 16:32:40.293615    4655 service.go:390] Adding new service port "default/busybox" at 10.0.237.220:8888/TCP
I0420 16:32:40.367114    4655 proxier.go:2243] Opened local port "externalIP for default/busybox" (192.168.0.11:8888/tcp)

==> kubelet.ERROR <==
E0420 16:32:57.962067    4333 controller.go:187] failed to update lease, error: Put "https://192.168.0.10:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-node-1?timeout=10s": context deadline exceeded
E0420 16:32:59.810776    4333 kubelet_node_status.go:470] Error updating node status, will retry: error getting node "k8s-node-1": Get "https://192.168.0.10:6443/api/v1/nodes/k8s-node-1?resourceVersion=0&timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)


, ReportingInstance:""}': 'Post "https://192.168.0.10:6443/api/v1/namespaces/default/events": dial tcp 192.168.0.10:6443: connect: connection refused'(may retry after sleeping)
E0420 13:09:58.810236    6420 kubelet.go:2263] node "k8s-node-1" not found
E0420 13:13:50.096947    8005 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://192.168.0.10:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.0.10:6443: connect: connection refused
E0420 13:14:47.641827    8005 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.0.10:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-node-1?timeout=10s": dial tcp 192.168.0.10:6443: connect: connection refused

# Test the apiserver port from k8s-node-1: it is unreachable now as well
[root@k8s-node-1 kubernetes]# telnet 192.168.0.10 6443
Trying 192.168.0.10...
^C
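
If telnet is unavailable, bash's /dev/tcp pseudo-device gives the same signal:

# Succeeds only if the TCP handshake to the apiserver completes within 3 seconds
timeout 3 bash -c '</dev/tcp/192.168.0.10/6443' && echo open || echo closed/filtered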

Analysis

Service info
# Check the SVC (the externalIP has since been changed to an IP owned by neither node)
[root@k8s-master-1 ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP    PORT(S)    AGE
busybox      ClusterIP   10.0.238.86   192.168.0.15   8888/TCP   4h16m
kubernetes   ClusterIP   10.0.0.1      <none>         443/TCP    186d

# Check the pods
[root@k8s-master-1 ~]# kubectl get pods -A -o wide
NAMESPACE     NAME                                       READY   IP             NODE       
default       busybox-58984c55cc-2jgmv                   1/1     10.70.2.65     k8s-master-1  
kube-system   calico-kube-controllers-5855d94c7d-lzskg   1/1     192.168.0.10   k8s-master-1  
kube-system   calico-node-djj49                          1/1     192.168.0.11   k8s-node-1     
kube-system   calico-node-hr9vf                          1/1     192.168.0.10   k8s-master-1   
kube-system   coredns-6f4c9cb7c5-vrbgw                   1/1     10.70.2.71     k8s-master-1
NIC info
# k8s-master-1 NIC info
[root@k8s-master-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:34:ce:c5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.10/24 brd 192.168.0.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::624c:c1db:e3b4:9165/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.70.2.64/32 scope global tunl0
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:2c:39:4d:d5 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether c6:d8:18:d4:90:5a brd ff:ff:ff:ff:ff:ff
6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 4a:f3:f8:f2:a6:aa brd ff:ff:ff:ff:ff:ff # The IPs on kube-ipvs0 are the cluster's Service IPs; every node carries them
    inet 10.0.238.86/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.168.0.15/32 scope global kube-ipvs0   # This is the externalIP; pointing it at one of the two node IPs breaks IPVS forwarding
       valid_lft forever preferred_lft forever
    inet 10.0.0.1/32 scope global kube-ipvs0       # Traffic to this IP is forwarded to the apiserver
       valid_lft forever preferred_lft forever
    inet 10.0.0.10/32 scope global kube-ipvs0	   # DNS
       valid_lft forever preferred_lft forever
9: calia2fcccbef15@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
10: califb8bd460169@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::ecee:eeff:feee:eeee/64 scope link 
       valid_lft forever preferred_lft forever
 
# k8s-node-1 NIC info
[root@k8s-node-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:25:c5:0b brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/24 brd 192.168.0.255 scope global ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:fe25:c50b/64 scope link 
       valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
    link/ipip 0.0.0.0 brd 0.0.0.0
    inet 10.70.2.0/32 scope global tunl0
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default 
    link/ether 02:42:fd:f2:6b:91 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 7e:07:bb:db:5f:a6 brd ff:ff:ff:ff:ff:ff
6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default 
    link/ether 5e:ea:10:20:21:f9 brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.10/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.0.238.86/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 192.168.0.15/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
    inet 10.0.0.1/32 scope global kube-ipvs0
       valid_lft forever preferred_lft forever
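
An address on a DOWN, dummy-style interface like kube-ipvs0 still hijacks traffic because the kernel adds a "local" route for every locally configured IP, and the local routing table is consulted before the main one. A quick way to see those routes (run on either node):

# Every IP on kube-ipvs0 has a host-scope entry in the local table, so traffic to it never leaves the host
ip route show table local | grep kube-ipvs0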
IPVS info
# Check IPVS on k8s-master-1
[root@k8s-master-1 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.15:8888 rr		# Forwards 192.168.0.15:8888 -> 10.70.2.65 (pod IP)
  -> 10.70.2.65:80                Masq    1      0          0         
TCP  10.0.0.1:443 rr
  -> 192.168.0.10:6443            Masq    1      2          0         
TCP  10.0.0.10:53 rr
  -> 10.70.2.71:53                Masq    1      0          0         
TCP  10.0.0.10:9153 rr
  -> 10.70.2.71:9153              Masq    1      0          0         
TCP  10.0.238.86:8888 rr
  -> 10.70.2.65:80                Masq    1      0          0         
UDP  10.0.0.10:53 rr
  -> 10.70.2.71:53                Masq    1      0          0 
  
# Check IPVS on k8s-node-1
[root@k8s-node-1 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.15:8888 rr
  -> 10.70.2.65:80                Masq    1      0          0         
TCP  10.0.0.1:443 rr
  -> 192.168.0.10:6443            Masq    1      0          0         
TCP  10.0.0.10:53 rr
  -> 10.70.2.71:53                Masq    1      0          0         
TCP  10.0.0.10:9153 rr
  -> 10.70.2.71:9153              Masq    1      0          0         
TCP  10.0.238.86:8888 rr
  -> 10.70.2.65:80                Masq    1      0          0         
UDP  10.0.0.10:53 rr
  -> 10.70.2.71:53                Masq    1      0          0  
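
To inspect just the externalIP virtual service instead of the full table, ipvsadm can list a single TCP service:

# Show only 192.168.0.15:8888 and its real servers
ipvsadm -ln -t 192.168.0.15:8888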
Reproducing the kube-ipvs0 behavior
# Check NIC info
[root@boy ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:ec:1c:2d brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.10/24 brd 192.168.0.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::624c:c1db:e3b4:9165/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: ens36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:ec:1c:37 brd ff:ff:ff:ff:ff:ff
    inet 10.70.2.199/24 brd 10.70.2.255 scope global noprefixroute ens36
       valid_lft forever preferred_lft forever
    inet6 fe80::20c:29ff:feec:1c37/64 scope link 
       valid_lft forever preferred_lft forever

# Bring ens36 down, then assign it an IP
[root@boy ~]# ip link set ens36 down
[root@boy ~]# ip addr add 192.168.0.11/32 dev ens36

# Check the network configuration
[root@boy ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 00:0c:29:ec:1c:2d brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.10/24 brd 192.168.0.255 scope global noprefixroute ens33
       valid_lft forever preferred_lft forever
    inet6 fe80::624c:c1db:e3b4:9165/64 scope link noprefixroute 
       valid_lft forever preferred_lft forever
3: ens36: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
    link/ether 00:0c:29:ec:1c:37 brd ff:ff:ff:ff:ff:ff
    inet 192.168.0.11/32 scope global ens36
       valid_lft forever preferred_lft forever


# Network-test 192.168.0.11 (ens36 now behaves much like kube-ipvs0)
[root@boy ~]# ping 192.168.0.11
PING 192.168.0.11 (192.168.0.11) 56(84) bytes of data.
64 bytes from 192.168.0.11: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 192.168.0.11: icmp_seq=2 ttl=64 time=0.069 ms

# Capture on the loopback interface: because 192.168.0.11 is a local address (even though its NIC is down), all traffic to 192.168.0.11 enters the loopback interface
[root@boy ~]# tcpdump -i lo icmp -Nnvv
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
11:59:59.871594 IP (tos 0x0, ttl 64, id 37520, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.11 > 192.168.0.11: ICMP echo request, id 1752, seq 8, length 64
11:59:59.871622 IP (tos 0x0, ttl 64, id 37521, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.0.11 > 192.168.0.11: ICMP echo reply, id 1752, seq 8, length 64
12:00:00.871450 IP (tos 0x0, ttl 64, id 37555, offset 0, flags [DF], proto ICMP (1), length 84)
    192.168.0.11 > 192.168.0.11: ICMP echo request, id 1752, seq 9, length 64
12:00:00.871478 IP (tos 0x0, ttl 64, id 37556, offset 0, flags [none], proto ICMP (1), length 84)
    192.168.0.11 > 192.168.0.11: ICMP echo reply, id 1752, seq 9, length 64
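
To undo the experiment and restore ens36 to its previous state:

[root@boy ~]# ip addr del 192.168.0.11/32 dev ens36
[root@boy ~]# ip addr add 10.70.2.199/24 dev ens36
[root@boy ~]# ip link set ens36 up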

Root Cause

# Change the externalIP to the master's IP
[root@k8s-master-1 ~]# kubectl get svc
NAME         TYPE        CLUSTER-IP    EXTERNAL-IP    PORT(S)    AGE
busybox      ClusterIP   10.0.238.86   192.168.0.10   8888/TCP   5h30m
kubernetes   ClusterIP   10.0.0.1      <none>         443/TCP    186d


# k8s-master-1 ping k8s-node-1
[root@k8s-master-1 ~]# ping 192.168.0.11
PING 192.168.0.11 (192.168.0.11) 56(84) bytes of data.
From 192.168.0.10 icmp_seq=1 Destination Host Unreachable
From 192.168.0.10 icmp_seq=2 Destination Host Unreachable
From 192.168.0.10 icmp_seq=3 Destination Host Unreachable
From 192.168.0.10 icmp_seq=4 Destination Host Unreachable
From 192.168.0.10 icmp_seq=5 Destination Host Unreachable
From 192.168.0.10 icmp_seq=6 Destination Host Unreachable


# Packet capture on k8s-master-1
[root@k8s-master-1 ~]# tcpdump -i any arp -Nvvn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
23:23:59.687475 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:00.711622 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:01.736505 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:02.758823 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:03.783078 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:04.806981 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:05.831077 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:06.855043 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:07.878912 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:08.903272 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28


# Packet capture on k8s-node-1
[root@k8s-node-1 ~]# tcpdump -i any arp -Nvn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
23:24:02.732899 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:03.756971 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:04.780764 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:05.804609 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:06.828534 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:07.852242 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
  • From the captures above, k8s-master-1 sends its ARP request and k8s-node-1 does receive it. But because k8s-node-1 also owns the externalIP (192.168.0.10, the ARP sender address), it hands the packet to lo, lo passes it up the protocol stack, and the stack offers it to applications; in reality no program on k8s-node-1 wants this packet. The net result is that k8s-master-1 never receives an ARP reply (it is absorbed by k8s-node-1 itself), cannot learn k8s-node-1's MAC address, and the cluster breaks.
  • With kube-proxy in IPVS mode, every Service IP in the cluster is added to the kube-ipvs0 interface of every node. On each node, ARP, ping, and other traffic to such an IP is delivered to the local lo interface; only traffic to a specific port of the IP is forwarded by IPVS to a port on a backend Pod.
  • Why the components on k8s-node-1 failed: 192.168.0.10 also exists on k8s-node-1 (on kube-ipvs0), so when kubelet and kube-proxy on k8s-node-1 talk to the apiserver on k8s-master-1, the traffic is delivered to port 6443 of k8s-node-1 itself; IPVS has no rule for that port, so k8s-node-1's components can no longer reach k8s-master-1 (see the remediation sketch below).
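
The fix is to stop using a node IP as the externalIP. A remediation sketch (assuming the Service is default/busybox and kube-proxy runs in IPVS mode):

# 1. Drop the colliding externalIP from the Service (or point it at a free, routable IP)
kubectl patch svc busybox --type=json -p='[{"op":"remove","path":"/spec/externalIPs"}]'
# 2. kube-proxy should then remove the address from kube-ipvs0 on every node; verify with
ip addr show kube-ipvs0
# 3. If a stale address lingers, it can be deleted by hand
ip addr del 192.168.0.10/32 dev kube-ipvs0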