A brief analysis of node failures caused by externalIP in k8s IPVS mode
Background
In an IPVS-mode k8s cluster, as soon as a Service's externalIP is set to the IP of any node in the cluster, components such as calico, kubelet and kube-proxy can no longer communicate with the apiserver.
Environment
Hostname | IP |
---|---|
k8s-master-1 (k8s v1.20.10) | 192.168.0.10 |
k8s-node-1 (k8s v1.20.10) | 192.168.0.11 |
# Pod CIDR
10.70.0.0/16
# Service CIDR
10.0.0.0/16
Symptom
# Test YAML
[root@k8s-master-1 externalip]# cat deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: httpd
  template:
    metadata:
      labels:
        app: httpd
    spec:
      containers:
      - name: busybox
        image: busybox:1.28
        imagePullPolicy: IfNotPresent
        command: ["/bin/sh","-c","mkdir -p /var/www; echo 'this is httpd-v1' > /var/www/index.html; httpd -f -h /var/www"]
        ports:
        - containerPort: 80
---
apiVersion: v1
kind: Service
metadata:
  name: busybox
spec:
  externalIPs:
  - 192.168.0.11
  type: ClusterIP
  ports:
  - port: 8888
    targetPort: 80
    protocol: TCP
  selector:
    app: httpd
# Check the cluster status
[root@k8s-master-1 yaml]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-5855d94c7d-xz45j 1/1 Running 0 29s
kube-system calico-node-4ftkk 1/1 Running 0 28s
kube-system calico-node-pcsw6 1/1 Running 0 28s
kube-system coredns-6f4c9cb7c5-2wsww 1/1 Running 0 13s
# Deploy the Deployment and Service with the externalIP
[root@k8s-master-1 externalip]# kubectl apply -f deployment.yaml
# A little later, check the pods again; calico-node can no longer become ready
[root@k8s-master-1 yaml]# kubectl get pods -A
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox-58984c55cc-44b6c 0/1 Pending 0 3m21s
kube-system calico-kube-controllers-5855d94c7d-xz45j 1/1 Running 0 5m13s
kube-system calico-node-4ftkk 0/1 Running 0 5m12s
kube-system calico-node-pcsw6 1/1 Running 0 5m12s
kube-system coredns-6f4c9cb7c5-2wsww 1/1 Running 0 4m57s
# Check the logs on k8s-node-1
==> kube-proxy.INFO <==
I0420 16:32:40.293230 4655 service.go:275] Service default/busybox updated: 1 ports
I0420 16:32:40.293615 4655 service.go:390] Adding new service port "default/busybox" at 10.0.237.220:8888/TCP
I0420 16:32:40.367114 4655 proxier.go:2243] Opened local port "externalIP for default/busybox" (192.168.0.11:8888/tcp)
==> kubelet.ERROR <==
E0420 16:32:57.962067 4333 controller.go:187] failed to update lease, error: Put "https://192.168.0.10:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-node-1?timeout=10s": context deadline exceeded
E0420 16:32:59.810776 4333 kubelet_node_status.go:470] Error updating node status, will retry: error getting node "k8s-node-1": Get "https://192.168.0.10:6443/api/v1/nodes/k8s-node-1?resourceVersion=0&timeout=10s": net/http: request canceled (Client.Timeout exceeded while awaiting headers)
, ReportingInstance:""}': 'Post "https://192.168.0.10:6443/api/v1/namespaces/default/events": dial tcp 192.168.0.10:6443: connect: connection refused'(may retry after sleeping)
E0420 13:09:58.810236 6420 kubelet.go:2263] node "k8s-node-1" not found
E0420 13:13:50.096947 8005 reflector.go:138] k8s.io/client-go/informers/factory.go:134: Failed to watch *v1.CSIDriver: failed to list *v1.CSIDriver: Get "https://192.168.0.10:6443/apis/storage.k8s.io/v1/csidrivers?limit=500&resourceVersion=0": dial tcp 192.168.0.10:6443: connect: connection refused
E0420 13:14:47.641827 8005 controller.go:144] failed to ensure lease exists, will retry in 7s, error: Get "https://192.168.0.10:6443/apis/coordination.k8s.io/v1/namespaces/kube-node-lease/leases/k8s-node-1?timeout=10s": dial tcp 192.168.0.10:6443: connect: connection refused
# Test the apiserver port from k8s-node-1; the port is unreachable as well
[root@k8s-node-1 kubernetes]# telnet 192.168.0.10 6443
Trying 192.168.0.10...
^C
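At this point a quick routing check already hints at the problem. The following is a diagnostic sketch rather than output captured from the cluster above; it assumes the same environment, where the externalIP 192.168.0.11 is expected to have been programmed onto kube-ipvs0 on every node, including k8s-master-1.
# On k8s-master-1: list the addresses kube-proxy has put on the dummy interface
ip -4 addr show dev kube-ipvs0
# If 192.168.0.11 shows up there, the route to k8s-node-1 should resolve to a
# "local" route via lo, i.e. replies to the node never leave the master
ip route get 192.168.0.11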
Analysis
Service information
# Check the SVC (the externalIP has been changed to 192.168.0.15, which belongs to neither node, so the cluster is healthy again)
[root@k8s-master-1 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
busybox ClusterIP 10.0.238.86 192.168.0.15 8888/TCP 4h16m
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 186d
# Check the Pods
[root@k8s-master-1 ~]# kubectl get pods -A -o wide
NAMESPACE NAME READY IP NODE
default busybox-58984c55cc-2jgmv 1/1 10.70.2.65 k8s-master-1
kube-system calico-kube-controllers-5855d94c7d-lzskg 1/1 192.168.0.10 k8s-master-1
kube-system calico-node-djj49 1/1 192.168.0.11 k8s-node-1
kube-system calico-node-hr9vf 1/1 192.168.0.10 k8s-master-1
kube-system coredns-6f4c9cb7c5-vrbgw 1/1 10.70.2.71 k8s-master-1
Network interface information
# k8s-master-1 network interfaces
[root@k8s-master-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:34:ce:c5 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.10/24 brd 192.168.0.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::624c:c1db:e3b4:9165/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.70.2.64/32 scope global tunl0
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:2c:39:4d:d5 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether c6:d8:18:d4:90:5a brd ff:ff:ff:ff:ff:ff
6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 4a:f3:f8:f2:a6:aa brd ff:ff:ff:ff:ff:ff # the IPs on kube-ipvs0 are the cluster's Service IPs, and every node has them
inet 10.0.238.86/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 192.168.0.15/32 scope global kube-ipvs0 # this is the externalIP; setting it to either node's IP breaks the forwarding
valid_lft forever preferred_lft forever
inet 10.0.0.1/32 scope global kube-ipvs0 # traffic to this IP is forwarded to the apiserver
valid_lft forever preferred_lft forever
inet 10.0.0.10/32 scope global kube-ipvs0 # DNS
valid_lft forever preferred_lft forever
9: calia2fcccbef15@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
10: califb8bd460169@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
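To double-check that the addresses on kube-ipvs0 are exactly the Service IPs (plus any externalIPs) known to the apiserver, the two views can be compared. A small sketch, assuming kubectl access from the node:
# All ClusterIPs and externalIPs (if any) defined in the cluster
kubectl get svc -A -o jsonpath='{range .items[*]}{.spec.clusterIP}{" "}{.spec.externalIPs}{"\n"}{end}'
# All addresses kube-proxy has bound to the dummy interface; the two lists should match
ip -4 addr show dev kube-ipvs0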
# k8s-node-1 network interfaces
[root@k8s-node-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:25:c5:0b brd ff:ff:ff:ff:ff:ff
inet 192.168.0.11/24 brd 192.168.0.255 scope global ens33
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe25:c50b/64 scope link
valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 10.70.2.0/32 scope global tunl0
valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:fd:f2:6b:91 brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
5: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether 7e:07:bb:db:5f:a6 brd ff:ff:ff:ff:ff:ff
6: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 5e:ea:10:20:21:f9 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.238.86/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 192.168.0.15/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.0.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
IPVS information
# IPVS rules on k8s-master-1
[root@k8s-master-1 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.15:8888 rr # 192.168.0.15:8888 is forwarded to 10.70.2.65 (the pod IP)
-> 10.70.2.65:80 Masq 1 0 0
TCP 10.0.0.1:443 rr
-> 192.168.0.10:6443 Masq 1 2 0
TCP 10.0.0.10:53 rr
-> 10.70.2.71:53 Masq 1 0 0
TCP 10.0.0.10:9153 rr
-> 10.70.2.71:9153 Masq 1 0 0
TCP 10.0.238.86:8888 rr
-> 10.70.2.65:80 Masq 1 0 0
UDP 10.0.0.10:53 rr
-> 10.70.2.71:53 Masq 1 0 0
# IPVS rules on k8s-node-1
[root@k8s-node-1 ~]# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 192.168.0.15:8888 rr
-> 10.70.2.65:80 Masq 1 0 0
TCP 10.0.0.1:443 rr
-> 192.168.0.10:6443 Masq 1 0 0
TCP 10.0.0.10:53 rr
-> 10.70.2.71:53 Masq 1 0 0
TCP 10.0.0.10:9153 rr
-> 10.70.2.71:9153 Masq 1 0 0
TCP 10.0.238.86:8888 rr
-> 10.70.2.65:80 Masq 1 0 0
UDP 10.0.0.10:53 rr
-> 10.70.2.71:53 Masq 1 0 0
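The reason traffic to these addresses never reaches the wire is not IPVS itself but the kernel's local routing table: every IP assigned to kube-ipvs0 gets a "local" route, even though the interface is DOWN. A quick check, assuming the nodes above:
# Each Service IP / externalIP appears as a local route, so packets to it are
# delivered to the host itself regardless of link state
ip route show table local | grep kube-ipvs0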
Reproducing the kube-ipvs0 behaviour
# Check the network interfaces
[root@boy ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:ec:1c:2d brd ff:ff:ff:ff:ff:ff
inet 192.168.0.10/24 brd 192.168.0.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::624c:c1db:e3b4:9165/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens36: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:ec:1c:37 brd ff:ff:ff:ff:ff:ff
inet 10.70.2.199/24 brd 10.70.2.255 scope global noprefixroute ens36
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feec:1c37/64 scope link
valid_lft forever preferred_lft forever
# Bring ens36 down, then add an IP to it
[root@boy ~]# ip link set ens36 down
[root@boy ~]# ip addr add 192.168.0.11/32 dev ens36
# Check the network interfaces again
[root@boy ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens33: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:ec:1c:2d brd ff:ff:ff:ff:ff:ff
inet 192.168.0.10/24 brd 192.168.0.255 scope global noprefixroute ens33
valid_lft forever preferred_lft forever
inet6 fe80::624c:c1db:e3b4:9165/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: ens36: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast state DOWN group default qlen 1000
link/ether 00:0c:29:ec:1c:37 brd ff:ff:ff:ff:ff:ff
inet 192.168.0.11/32 scope global ens36
valid_lft forever preferred_lft forever
# Network test against 192.168.0.11 (ens36 now behaves roughly like kube-ipvs0)
[root@boy ~]# ping 192.168.0.11
PING 192.168.0.11 (192.168.0.11) 56(84) bytes of data.
64 bytes from 192.168.0.11: icmp_seq=1 ttl=64 time=0.039 ms
64 bytes from 192.168.0.11: icmp_seq=2 ttl=64 time=0.069 ms
# Capture ICMP on the loopback interface. Because 192.168.0.11 is a local IP (even though the interface carrying it is down), all traffic to 192.168.0.11 is delivered through lo
[root@boy ~]# tcpdump -i lo icmp -Nnvv
tcpdump: listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes
11:59:59.871594 IP (tos 0x0, ttl 64, id 37520, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.11 > 192.168.0.11: ICMP echo request, id 1752, seq 8, length 64
11:59:59.871622 IP (tos 0x0, ttl 64, id 37521, offset 0, flags [none], proto ICMP (1), length 84)
192.168.0.11 > 192.168.0.11: ICMP echo reply, id 1752, seq 8, length 64
12:00:00.871450 IP (tos 0x0, ttl 64, id 37555, offset 0, flags [DF], proto ICMP (1), length 84)
192.168.0.11 > 192.168.0.11: ICMP echo request, id 1752, seq 9, length 64
12:00:00.871478 IP (tos 0x0, ttl 64, id 37556, offset 0, flags [none], proto ICMP (1), length 84)
192.168.0.11 > 192.168.0.11: ICMP echo reply, id 1752, seq 9, length 64
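The capture above is explained by the same local-route behaviour. A short sketch to confirm it and to undo the test setup (it assumes the ens36 experiment on this host):
# Even with ens36 DOWN, 192.168.0.11 is a local address, so the routing decision points at lo
ip route get 192.168.0.11
# Clean up the experiment afterwards
ip addr del 192.168.0.11/32 dev ens36
ip link set ens36 up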
Root cause
# Change the externalIP to the master IP (192.168.0.10)
[root@k8s-master-1 ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
busybox ClusterIP 10.0.238.86 192.168.0.10 8888/TCP 5h30m
kubernetes ClusterIP 10.0.0.1 <none> 443/TCP 186d
# ping k8s-node-1 from k8s-master-1
[root@k8s-master-1 ~]# ping 192.168.0.11
PING 192.168.0.11 (192.168.0.11) 56(84) bytes of data.
From 192.168.0.10 icmp_seq=1 Destination Host Unreachable
From 192.168.0.10 icmp_seq=2 Destination Host Unreachable
From 192.168.0.10 icmp_seq=3 Destination Host Unreachable
From 192.168.0.10 icmp_seq=4 Destination Host Unreachable
From 192.168.0.10 icmp_seq=5 Destination Host Unreachable
From 192.168.0.10 icmp_seq=6 Destination Host Unreachable
# Packet capture on k8s-master-1
[root@k8s-master-1 ~]# tcpdump -i any arp -Nvvn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
23:23:59.687475 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:00.711622 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:01.736505 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:02.758823 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:03.783078 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:04.806981 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:05.831077 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:06.855043 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:07.878912 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
23:24:08.903272 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 28
# Packet capture on k8s-node-1
[root@k8s-node-1 ~]# tcpdump -i any arp -Nvn
tcpdump: listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
23:24:02.732899 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:03.756971 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:04.780764 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:05.804609 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:06.828534 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
23:24:07.852242 ARP, Ethernet (len 6), IPv4 (len 4), Request who-has 192.168.0.11 tell 192.168.0.10, length 46
From the captures above, k8s-master-1 sends the ARP request for 192.168.0.11 and k8s-node-1 does receive it. However, the externalIP (192.168.0.10, the sender's address) is also configured locally on k8s-node-1's kube-ipvs0, so the kernel treats the request as if it came from the local host: the packet is handed to lo and up the local protocol stack (where no program actually wants it) instead of producing a normal ARP reply back to k8s-master-1. k8s-master-1 therefore never learns k8s-node-1's MAC address, cannot reach the node at all, and the cluster becomes abnormal.
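One hedged way to confirm the missing reply from the master's side is the neighbour table (a sketch, assuming the scenario above):
# On k8s-master-1 the entry for k8s-node-1 is expected to stay INCOMPLETE/FAILED,
# because no ARP reply ever arrives
ip neigh | grep 192.168.0.11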
- Every Service IP in k8s (ClusterIPs as well as externalIPs) is added to the kube-ipvs0 interface on every node in the cluster. On each node, ARP, ping and other traffic to these IPs is therefore delivered to the local lo interface; only traffic to a specific Service port is forwarded by IPVS to a backend Pod port.
- Why the components on k8s-node-1 break: with the externalIP set to 192.168.0.10, that address also exists on k8s-node-1's kube-ipvs0. When kubelet and kube-proxy on k8s-node-1 talk to the apiserver at 192.168.0.10:6443, the kernel treats the destination as a local address and delivers the traffic to the node itself; IPVS only has a rule for 192.168.0.10:8888, not for 6443, so the connection ends up on local port 6443 where nothing is listening, and the components can no longer reach k8s-master-1 (see the sketch below). The same mechanism explains the original symptom with the externalIP set to the node IP: there it is k8s-master-1 that routes everything destined for 192.168.0.11 into its own loopback, so the apiserver's replies never make it back to k8s-node-1.
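A small verification sketch for this last point, assuming the externalIP is 192.168.0.10 and the commands are run on k8s-node-1:
# 192.168.0.10 is now a local address on k8s-node-1, so traffic towards the apiserver never leaves the node
ip route get 192.168.0.10
# IPVS only has a virtual server for 192.168.0.10:8888 (the Service port), not for :6443
ipvsadm -ln | grep 192.168.0.10
# Nothing on k8s-node-1 listens on 6443, hence the "connection refused" in the kubelet log
ss -lntp | grep 6443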