K8s Calico IP-in-IP Network Mode Communication Analysis
Calico IP-in-IP Communication Analysis
IP-in-IP Network Model
How to Enable IP-in-IP
# Enabling IP-in-IP mode: the environment variable CALICO_IPV4POOL_IPIP controls whether IP-in-IP is enabled. Set it to Always to enable IPIP, or to Never to disable it.
- name: CALICO_IPV4POOL_IPIP
  value: "Always"
Test container YAML
Host | IP |
---|---|
k8s-master-1 | 192.168.0.11/24 |
k8s-node-1 | 192.168.0.12/24 |
apiVersion: v1
kind: Service
metadata:
  name: busybox
  namespace: devops
spec:
  selector:
    app: busybox
  type: NodePort
  ports:
  - name: http
    port: 8888
    protocol: TCP
    targetPort: 80
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox
  namespace: devops
spec:
  replicas: 2
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: busybox
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      name: busybox
      labels:
        app: busybox
    spec:
      affinity: # keep the two busybox replicas off the same node
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - topologyKey: kubernetes.io/hostname
            labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - busybox
      restartPolicy: Always
      containers:
      - command: ["/bin/sh","-c","mkdir -p /var/lib/www && httpd -f -v -p 80 -h /var/lib/www"]
        name: busybox
        image: docker.io/library/busybox:latest
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 80
IP-in-IP Communication Analysis (Cross-Host)
- View the pod information
╰─ kubectl get pods -n devops -o custom-columns=NAME:.metadata.name,IP:.status.podIP,HOST:.spec.nodeName
NAME IP HOST
busybox-77649b9c55-7d27b 172.16.109.65 k8s-node-1
busybox-77649b9c55-r6bx9 172.16.196.1 k8s-master-1
- Enter the container busybox-77649b9c55-r6bx9 on k8s-master-1 and view its routing table
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 169.254.1.1 0.0.0.0 UG 0 0 0 eth0
169.254.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
# View the container's network interfaces
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue # the peer of this container's veth pair is interface index 7 on the host
link/ether 86:9c:03:9e:db:9f brd ff:ff:ff:ff:ff:ff
inet 172.16.196.1/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::849c:3ff:fe9e:db9f/64 scope link
valid_lft forever preferred_lft forever
From the output above we can see that the busybox-77649b9c55-r6bx9 container on k8s-master-1 has a default gateway of 169.254.1.1, yet no interface anywhere in the network carries this address.
- The routing table tells us 169.254.1.1 is the container's default gateway, but no NIC actually owns that IP. When a packet's destination is not the local host, the kernel consults the routing table, finds the gateway, obtains the gateway's MAC address via an ARP broadcast, and rewrites the destination MAC of the outgoing frame to that MAC. The gateway's IP address never appears in any packet header. In other words, nobody cares what that IP really is; all that matters is that something answers the ARP request with a MAC address.
- In a Kubernetes Calico network, when a packet's destination is outside the local network, the container first broadcasts an ARP request. The gateway, i.e. 169.254.1.1, answers with its own MAC address, and all subsequent traffic is carried over the veth pair. Calico uses proxy ARP as a deliberate form of ARP spoofing: it suppresses ARP broadcast attacks, and proxy ARP also enables cross-network access.
- Looking at the MAC address, it must have been planted there by Calico, and it still answers ARP. Normally the kernel would broadcast an ARP request asking who on the layer-2 network owns 169.254.1.1, and the owning device would reply with its MAC address. Here the situation is awkward: neither the container nor the host owns that IP, and even the host's calixxx interface carries the dummy MAC ee:ee:ee:ee:ee:ee.
- In fact Calico relies on the NIC's proxy ARP feature. Proxy ARP is a variant of the ARP protocol: when an ARP request targets an address on another subnet, the gateway device receiving the request replies with its own MAC address. For example, if a PC sends an ARP request for the MAC of a server at 8.8.8.8, the router (gateway) sees that 8.8.8.8 is not on the local subnet and returns its own interface MAC to the PC; the PC then addresses all subsequent traffic to the server using the router's MAC.
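That 169.254.1.1 is safe to use as a dummy gateway follows from it being an IPv4 link-local address (RFC 3927), which is never forwarded on a real network, so it can never clash with anything routable. A quick check in Python:

```python
import ipaddress

# 169.254.1.1 lies in the IPv4 link-local range 169.254.0.0/16 (RFC 3927),
# so it is never routed anywhere -- which is exactly why Calico can plant it
# as the default gateway inside every container without conflict.
gw = ipaddress.ip_address("169.254.1.1")
print(gw.is_link_local)                               # True
print(gw in ipaddress.ip_network("169.254.0.0/16"))   # True
```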
- NIC information on the k8s-master-1 host node
/ # ip neigh
169.254.1.1 dev eth0 lladdr ee:ee:ee:ee:ee:ee used 0/0/0 probes 1 STALE
# View k8s-master-1's own NICs
[root@k8s-master-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:b1:02:f0 brd ff:ff:ff:ff:ff:ff
altname enp2s0
inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feb1:2f0/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 172.16.196.0/32 scope global tunl0
valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether fa:89:72:a0:30:aa brd ff:ff:ff:ff:ff:ff
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.192.105/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
7: cali12242800409@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default # peer device of the busybox container
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-d0ea7fe4-4514-1aed-cfd6-fcaf904b837a
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
8: calif57336c1ec9@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default # peer device of another container (coredns)
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-dd44e8c7-309b-f336-2e19-5c0f0b0f83ab
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
# Check the proxy-ARP parameter of the calif57336c1ec9@if4 interface
[root@k8s-master-1 ~]# cat /proc/sys/net/ipv4/conf/calif57336c1ec9/proxy_arp
1
- Frames from the container traverse the veth pair to the calixxx peer on the host. Because the calixxx interface has proxy ARP enabled, it answers all ARP requests from the container, so every packet the container sends lands on calixxx, i.e. in the host network stack, which then uses the host routing table to forward it to the next hop. You can verify this with cat /proc/sys/net/ipv4/conf/calixxx/proxy_arp; the output is 1.
- Calico uses this clever trick to steer all workload traffic toward the special gateway 169.254.1.1 and thus onto the host's calixxx network device, effectively turning all layer-2 and layer-3 traffic into pure layer-3 forwarding.
- Enabling proxy ARP on the host confines ARP broadcasts to the host itself, which suppresses broadcast storms and avoids ARP-table bloat.
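The proxy-ARP behavior described above can be sketched as follows. This is a minimal Python illustration of the decision, not Calico or kernel code; the MAC value is the one observed on the cali interfaces:

```python
# Minimal sketch of the proxy-ARP decision on a calixxx interface: with
# proxy_arp=1 the host answers ARP requests for ANY address the container
# asks about, always with the interface's own MAC ee:ee:ee:ee:ee:ee.
CALI_MAC = "ee:ee:ee:ee:ee:ee"

def proxy_arp_reply(target_ip, proxy_arp_enabled):
    """Return the MAC to answer with, or None if no reply is sent."""
    if not proxy_arp_enabled:
        return None   # normal ARP: only the owner of target_ip would reply
    return CALI_MAC   # proxy ARP: the cali interface answers on behalf of everyone

# The container asks for its "gateway" 169.254.1.1 -- or any other IP:
print(proxy_arp_reply("169.254.1.1", True))    # ee:ee:ee:ee:ee:ee
print(proxy_arp_reply("172.16.109.65", True))  # ee:ee:ee:ee:ee:ee
```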
busybox-77649b9c55-r6bx9 on k8s-master-1 pings busybox-77649b9c55-7d27b on k8s-node-1
# View the ARP cache of the busybox container on k8s-master-1 (empty)
/ # ip neigh show
# busybox on k8s-master-1 pings busybox on k8s-node-1
/ # ping -c 1 172.16.109.65
PING 172.16.109.65 (172.16.109.65): 56 data bytes
64 bytes from 172.16.109.65: seq=0 ttl=62 time=1.603 ms
--- 172.16.109.65 ping statistics ---
1 packets transmitted, 1 packets received, 0% packet loss
round-trip min/avg/max = 1.603/1.603/1.603 ms
# View the ARP cache
/ # arp -n
? (169.254.1.1) at ee:ee:ee:ee:ee:ee [ether] on eth0
# View the current NIC/IP of busybox on k8s-master-1
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue
link/ether 86:9c:03:9e:db:9f brd ff:ff:ff:ff:ff:ff
inet 172.16.196.1/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::849c:3ff:fe9e:db9f/64 scope link
valid_lft forever preferred_lft forever
# View the routing table on k8s-master-1
[root@k8s-master-1 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.2 0.0.0.0 UG 100 0 0 ens160
172.16.109.64 192.168.0.12 255.255.255.192 UG 0 0 0 tunl0
172.16.196.0 0.0.0.0 255.255.255.192 U 0 0 0 * # route suppression: the pod CIDR block route is blackholed on this device
172.16.196.1 0.0.0.0 255.255.255.255 UH 0 0 0 cali12242800409
172.16.196.2 0.0.0.0 255.255.255.255 UH 0 0 0 calif57336c1ec9
192.168.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens160
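The route selection the host performs against the table above can be illustrated with a toy longest-prefix match in Python. This is an illustration only; the kernel FIB is far more elaborate. The entries mirror the route -n output (255.255.255.192 is /26):

```python
import ipaddress

# Simplified copy of the k8s-master-1 routing table: (prefix, gateway, device)
ROUTES = [
    ("0.0.0.0/0",        "192.168.0.2",  "ens160"),
    ("172.16.109.64/26", "192.168.0.12", "tunl0"),
    ("172.16.196.1/32",  None,           "cali12242800409"),
    ("172.16.196.2/32",  None,           "calif57336c1ec9"),
    ("192.168.0.0/24",   None,           "ens160"),
]

def lookup(dst):
    """Pick the matching route with the longest prefix, like the kernel does."""
    dst = ipaddress.ip_address(dst)
    return max(
        (r for r in ROUTES if dst in ipaddress.ip_network(r[0])),
        key=lambda r: ipaddress.ip_network(r[0]).prefixlen,
    )

print(lookup("172.16.109.65"))  # remote pod -> via 192.168.0.12 on tunl0
print(lookup("172.16.196.2"))   # local pod  -> directly to calif57336c1ec9
```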
busybox-77649b9c55-r6bx9 on k8s-master-1 pings busybox-77649b9c55-7d27b on k8s-node-1. The overall packet flow is as follows:
- 172.16.109.65 and the local 172.16.196.1 are in different subnets, so the destination MAC must be that of the gateway 169.254.1.1. When the container ARPs for the gateway's MAC, the request travels over the veth pair from eth0 (container) to cali12242800409 (host). Because the host's cali12242800409 interface has proxy ARP enabled (ARP spoofing), it returns the MAC ee:ee:ee:ee:ee:ee to the container.
- With the MAC resolved, the container builds the packet: src: 172.16.196.1, dst: 172.16.109.65, src_mac: 86:9c:03:9e:db:9f, dst_mac: ee:ee:ee:ee:ee:ee. The container's route lookup hits the default gateway route, so the frame is handed to eth0 and, via the veth-pair device characteristics, arrives at the host's cali12242800409 interface
[root@k8s-master-1 ~]# tcpdump -i cali12242800409 icmp -e -Nnnvl
dropped privs to tcpdump
tcpdump: listening on cali12242800409, link-type EN10MB (Ethernet), snapshot length 262144 bytes
19:07:34.159492 86:9c:03:9e:db:9f > ee:ee:ee:ee:ee:ee, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 63022, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.196.1 > 172.16.109.65: ICMP echo request, id 29, seq 0, length 64
19:07:34.161751 ee:ee:ee:ee:ee:ee > 86:9c:03:9e:db:9f, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 62, id 35773, offset 0, flags [none], proto ICMP (1), length 84)
172.16.109.65 > 172.16.196.1: ICMP echo reply, id 29, seq 0, length 64
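The frame in the capture above can be reconstructed byte-for-byte. The following Python sketch builds just the 14-byte Ethernet header with the MACs from the capture (illustration only, not how the kernel assembles frames):

```python
import struct

# Ethernet header the container emits, per the tcpdump line above:
# dst MAC ee:ee:ee:ee:ee:ee (the proxy-ARP answer), src MAC 86:9c:03:9e:db:9f
# (the container's eth0), EtherType 0x0800 = IPv4.
def mac_bytes(mac):
    return bytes(int(b, 16) for b in mac.split(":"))

eth_hdr = (
    mac_bytes("ee:ee:ee:ee:ee:ee")   # destination MAC
    + mac_bytes("86:9c:03:9e:db:9f") # source MAC
    + struct.pack("!H", 0x0800)      # EtherType: IPv4
)

print(len(eth_hdr))   # 14 -- the standard Ethernet header length
```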
- When the packet reaches cali12242800409 on k8s-master-1, a route lookup matches the rule 172.16.109.64 192.168.0.12 255.255.255.192 UG 0 0 0 tunl0, so the packet is handed to the tunl0 interface. tunl0 is an IP tunnel device: when an IP packet enters it, the Linux ipip driver wraps the packet directly inside another IP packet addressed on the host network and sends it to the k8s-node-1 host. Capturing on k8s-master-1's tunl0 while busybox on k8s-master-1 pings busybox on k8s-node-1:
[root@k8s-master-1 ~]# tcpdump -i tunl0 -eNnnvl
dropped privs to tcpdump
tcpdump: listening on tunl0, link-type RAW (Raw IP), snapshot length 262144 bytes
19:17:08.330978 ip: (tos 0x0, ttl 63, id 12220, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.196.1 > 172.16.109.65: ICMP echo request, id 32, seq 0, length 64
19:17:08.332651 ip: (tos 0x0, ttl 63, id 62324, offset 0, flags [none], proto ICMP (1), length 84)
172.16.109.65 > 172.16.196.1: ICMP echo reply, id 32, seq 0, length 64
- After tunl0 processes the packet, it is encapsulated with an additional outer IP (network-layer) header. The packet is then sent out the host's ens160 physical NIC. Capturing on ens160 of the k8s-master-1 host:
[root@k8s-master-1 ~]# tcpdump -eni ens160 | grep -i icmp
# Alternatively, capture the IPIP packets this way
[root@k8s-master-1 ~]# tcpdump -i ens160 "ip proto 4" -ennvv
dropped privs to tcpdump
tcpdump: listening on ens160, link-type EN10MB (Ethernet), snapshot length 262144 bytes
20:03:02.812252 00:0c:29:b1:02:f0 > 00:0c:29:90:fa:e2, ethertype IPv4 (0x0800), length 118: (tos 0x0, ttl 63, id 55134, offset 0, flags [DF], proto IPIP (4), length 104)
192.168.0.11 > 192.168.0.12: (tos 0x0, ttl 63, id 24637, offset 0, flags [DF], proto ICMP (1), length 84)
172.16.196.1 > 172.16.109.65: ICMP echo request, id 43, seq 0, length 64
20:03:02.815592 00:0c:29:90:fa:e2 > 00:0c:29:b1:02:f0, ethertype IPv4 (0x0800), length 118: (tos 0x0, ttl 63, id 16578, offset 0, flags [none], proto IPIP (4), length 104)
192.168.0.12 > 192.168.0.11: (tos 0x0, ttl 63, id 383, offset 0, flags [none], proto ICMP (1), length 84)
172.16.109.65 > 172.16.196.1: ICMP echo reply, id 43, seq 0, length 64
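What tunl0 did between the two captures, wrapping the 84-byte ICMP packet in a 20-byte outer IPv4 header with protocol number 4 (IPIP), can be sketched in Python. This is a simplification (checksum omitted, fixed TTL), not the actual ipip driver code:

```python
import struct

def ipip_encap(inner, src, dst):
    """Prepend a 20-byte outer IPv4 header (protocol 4 = IPIP) to `inner`."""
    total = 20 + len(inner)
    outer = struct.pack(
        "!BBHHHBBH4s4s",
        0x45, 0, total,   # version 4 / IHL 5, TOS, total length
        0, 0x4000,        # identification, flags (DF)
        63, 4, 0,         # TTL, protocol 4 = IPIP, checksum (omitted here)
        bytes(map(int, src.split("."))),   # outer src: this host
        bytes(map(int, dst.split("."))),   # outer dst: the peer host
    )
    return outer + inner

inner = b"\x45" + b"\x00" * 83   # stand-in for the 84-byte inner ICMP packet
pkt = ipip_encap(inner, "192.168.0.11", "192.168.0.12")
print(len(pkt))   # 104 -- matches the "length 104" in the capture above
print(pkt[9])     # 4   -- the outer protocol field says IPIP
```

Decapsulation on the receiving side is just the inverse: check that byte 9 of the outer header is 4, strip the first 20 bytes, and hand the inner packet back to the IP stack.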
- ens160 on k8s-master-1 sends the packet to ens160 on k8s-node-1, which decapsulates the IPIP packet, hands the inner packet to tunl0, and tunl0 forwards it on to cali1b0be572c83
# View the physical NIC information on k8s-node-1
[root@k8s-node-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:90:fa:e2 brd ff:ff:ff:ff:ff:ff
altname enp2s0
inet 192.168.0.12/24 brd 192.168.0.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe90:fae2/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 172.16.109.64/32 scope global tunl0
valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether c6:db:ff:2e:c2:2b brd ff:ff:ff:ff:ff:ff
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.192.105/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
7: cali1b0be572c83@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default # peer device of the busybox container (172.16.109.65)
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-e85bba11-5d8e-ec3a-6e51-c68d1d27cb9f
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
# View the routing table
[root@k8s-node-1 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.2 0.0.0.0 UG 100 0 0 ens160
172.16.109.64 0.0.0.0 255.255.255.192 U 0 0 0 * # route suppression: the pod CIDR block route is blackholed on this device
172.16.109.65 0.0.0.0 255.255.255.255 UH 0 0 0 cali1b0be572c83
172.16.196.0 192.168.0.11 255.255.255.192 UG 0 0 0 tunl0
192.168.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens160
Based on the analysis above, the communication flow is:
- busybox(k8s-master-1) -> calixxx -> tunl0 -> ens160(k8s-master-1) <----> ens160(k8s-node-1) -> tunl0 -> calixxx -> busybox(k8s-node-1)
- Following the next hop in the k8s-master-1 routing table, the tunl0 device carries the IP packet to the k8s-node-1 host
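The mtu 1480 seen on tunl0 and on the pods' eth0 in the listings above follows directly from this encapsulation: the outer IPv4 header of the IPIP tunnel costs 20 bytes, so the tunnel MTU is lowered from the physical NIC's 1500 to avoid fragmentation.

```python
# IPIP adds exactly one outer IPv4 header (20 bytes, no options), so the
# tunnel and pod interfaces are sized 20 bytes below the physical MTU.
PHYS_MTU = 1500        # ens160
OUTER_IPV4_HDR = 20    # the extra header tunl0 prepends

tunnel_mtu = PHYS_MTU - OUTER_IPV4_HDR
print(tunnel_mtu)      # 1480 -- matches tunl0 and the containers' eth0
```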
IP-in-IP Communication Analysis (Same-Host)
View the network information on k8s-master-1. Calico allocates a small block of the pod network to each node, and creates a host ip route rule for every pod
# View the pods running on k8s-master-1
╰─ kubectl get pods -A --field-selector spec.nodeName="k8s-master-1" -o custom-columns=NAME:.metadata.name,IP:.status.podIP,HOST:.spec.nodeName
NAME IP HOST
busybox-77649b9c55-r6bx9 172.16.196.1 k8s-master-1
calico-node-q6cv6 192.168.0.11 k8s-master-1
coredns-7c445fd599-glfl5 172.16.196.2 k8s-master-1
# View the NIC information on k8s-master-1
[root@k8s-master-1 ~]# ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 00:0c:29:b1:02:f0 brd ff:ff:ff:ff:ff:ff
altname enp2s0
inet 192.168.0.11/24 brd 192.168.0.255 scope global noprefixroute ens160
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:feb1:2f0/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1480 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 172.16.196.0/32 scope global tunl0
valid_lft forever preferred_lft forever
4: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether fa:89:72:a0:30:aa brd ff:ff:ff:ff:ff:ff
inet 10.96.0.1/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.192.105/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.96.0.10/32 scope global kube-ipvs0
valid_lft forever preferred_lft forever
7: cali12242800409@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default # busybox's peer NIC
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-d0ea7fe4-4514-1aed-cfd6-fcaf904b837a
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
8: calif57336c1ec9@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default # coredns's peer NIC
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-dd44e8c7-309b-f336-2e19-5c0f0b0f83ab
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
# View the NIC information of busybox on k8s-master-1
/ # ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue # corresponds to the cali12242800409 interface on k8s-master-1
link/ether 86:9c:03:9e:db:9f brd ff:ff:ff:ff:ff:ff
inet 172.16.196.1/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::849c:3ff:fe9e:db9f/64 scope link
valid_lft forever preferred_lft forever
# View the routing table of busybox on k8s-master-1
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 169.254.1.1 0.0.0.0 UG 0 0 0 eth0
169.254.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
# View the NIC information of coredns on k8s-master-1
[root@k8s-master-1 ~]# crictl inspect 9e93a905b7d87 | grep -i pid
"pid": 23210,
"pid": 1
"type": "pid"
[root@k8s-master-1 ~]# nsenter -t 23210 -n ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1480 qdisc noqueue state UP group default # corresponds to the calif57336c1ec9 interface on k8s-master-1
link/ether 8e:0a:84:0d:c2:e4 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 172.16.196.2/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::8c0a:84ff:fe0d:c2e4/64 scope link
valid_lft forever preferred_lft forever
# View the routing table on k8s-master-1
[root@k8s-master-1 ~]# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.0.2 0.0.0.0 UG 100 0 0 ens160
172.16.109.64 192.168.0.12 255.255.255.192 UG 0 0 0 tunl0 # next hop for the remote pod network is k8s-node-1's physical address, via tunl0
172.16.196.0 0.0.0.0 255.255.255.192 U 0 0 0 * # route suppression: the pod CIDR block route is blackholed on this device
172.16.196.1 0.0.0.0 255.255.255.255 UH 0 0 0 cali12242800409 # host route for the busybox pod
172.16.196.2 0.0.0.0 255.255.255.255 UH 0 0 0 calif57336c1ec9 # host route for the coredns pod
192.168.0.0 0.0.0.0 255.255.255.0 U 100 0 0 ens160
When busybox (172.16.196.1) on k8s-master-1 pings coredns (172.16.196.2) on the same node, the main flow is:
- 172.16.196.1 and 172.16.196.2 are in the same subnet, but the busybox routing table contains no route for the 172.16.196.0 subnet, so the container still ARPs the default gateway 169.254.1.1 for a MAC address. Since busybox's peer interface cali12242800409 has proxy ARP enabled, it answers with ee:ee:ee:ee:ee:ee. The container then builds the packet: src_addr: 172.16.196.1, dst_addr: 172.16.196.2, src_mac: 86:9c:03:9e:db:9f, dst_mac: ee:ee:ee:ee:ee:ee, and the frame is delivered to cali12242800409 on the host.
- When the host's cali12242800409 interface receives the packet, a route lookup matches 172.16.196.2 0.0.0.0 255.255.255.255 UH 0 0 0 calif57336c1ec9, and the packet is handed to the calif57336c1ec9 interface (the peer of the coredns container's eth0).
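Note that both pods really do sit in the node's /26 block (taken from the host route table above), yet the container has no route for it, so even same-host pod traffic detours through the host stack: eth0 -> cali... -> cali... A quick membership check:

```python
import ipaddress

# The node's pod CIDR block from the host routing table (255.255.255.192 = /26).
block = ipaddress.ip_network("172.16.196.0/26")

# Both pods are inside the block, but the container's routing table only has
# the 169.254.1.1 default route -- so the host always mediates.
print(ipaddress.ip_address("172.16.196.1") in block)  # True (busybox)
print(ipaddress.ip_address("172.16.196.2") in block)  # True (coredns)
```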