Learning Kubernetes (3): Deploying a Calico-based k8s test environment without IPIP
The previous post, "Learning Kubernetes (2): Deploying a Calico-based k8s test environment with kubeasz", followed the default steps, which deploy Calico in IPIP mode: cross-node traffic is encapsulated as IP-in-IP through the tunl0 interface, which is essentially quite similar to Flannel and does not expose the container (docker) IPs to the underlying network. Below is a record of the deployment and verification process with IPIP turned off.
Environment preparation
- Follow the previous post
- Before running the final ansible playbook, be sure to change the IPIP setting from Always to off
[root@k8s-master ansible]# cat /etc/ansible/roles/calico/defaults/main.yml | grep IPIP
# Setting CALICO_IPV4POOL_IPIP="off" can improve network performance; see docs/setup/calico.md for the constraints
CALICO_IPV4POOL_IPIP: "off"
[root@k8s-master ansible]#
Deployment
- Run all the playbooks
- After the deployment finishes, check the node status
[root@k8s-master ansible]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.122.135 Ready,SchedulingDisabled master 4m49s v1.17.2
192.168.122.143 Ready node 3m19s v1.17.2
192.168.122.198 Ready node 3m19s v1.17.2
- Fix the SchedulingDisabled state on the master
[root@k8s-master ansible]# kubectl uncordon 192.168.122.135
node/192.168.122.135 uncordoned
[root@k8s-master ansible]#
[root@k8s-master ansible]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
192.168.122.135 Ready master 5m19s v1.17.2
192.168.122.143 Ready node 3m49s v1.17.2
192.168.122.198 Ready node 3m49s v1.17.2
[root@k8s-master ansible]#
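- If the master is instead meant to stay free of ordinary workloads, the reverse operations are cordon and, optionally, drain; a generic sketch, not run in this lab:
# mark the master unschedulable again
kubectl cordon 192.168.122.135
# evict the pods already running there (DaemonSet pods are skipped)
kubectl drain 192.168.122.135 --ignore-daemonsets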
- Check the BGP peer status. The cluster runs a node-to-node full mesh, i.e. every pair of the three nodes establishes a BGP session with each other, so each node should list the other two as Established peers
[root@k8s-node-1 ~]# ln -s /opt/kube/bin/calicoctl /usr/local/sbin/calicoctl
[root@k8s-node-1 ~]# calicoctl node status
Calico process is running.
IPv4 BGP status
+-----------------+-------------------+-------+----------+-------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+-----------------+-------------------+-------+----------+-------------+
| 192.168.122.135 | node-to-node mesh | up | 09:40:23 | Established |
| 192.168.122.198 | node-to-node mesh | up | 09:40:24 | Established |
+-----------------+-------------------+-------+----------+-------------+
IPv6 BGP status
No IPv6 peers found.
[root@k8s-node-1 ~]#
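- With calicoctl available, the effective IPIP setting can also be confirmed on the IP pool itself; a minimal sketch, assuming the usual default pool name default-ipv4-ippool (it may differ in a given install):
# ipipMode should show Never when CALICO_IPV4POOL_IPIP is set to off
calicoctl get ippool default-ipv4-ippool -o yaml
# or simply grep the mode across all pools
calicoctl get ippool -o yaml | grep ipipMode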
- Start four busybox pods to verify cross-node communication
[root@k8s-master ansible]# kubectl run test --image=busybox --replicas=4 sleep 30000
kubectl run --generator=deployment/apps.v1 is DEPRECATED and will be removed in a future version. Use kubectl run --generator=run-pod/v1 or kubectl create instead.
deployment.apps/test created
[root@k8s-master ansible]#
[root@k8s-master ansible]# kubectl get pod -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
test-7b48b75784-2bz2g 1/1 Running 0 14s 172.20.140.66 192.168.122.198 <none> <none>
test-7b48b75784-7cxls 1/1 Running 0 14s 172.20.235.193 192.168.122.135 <none> <none>
test-7b48b75784-hdx2m 1/1 Running 0 14s 172.20.109.67 192.168.122.143 <none> <none>
test-7b48b75784-qnc6m 1/1 Running 0 14s 172.20.235.192 192.168.122.135 <none> <none>
[root@k8s-master ansible]#
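- The warning above shows that kubectl run with a Deployment generator is deprecated; a rough equivalent that creates the Deployment explicitly, assuming a newer kubectl (1.19+) that accepts a command after --:
# create a busybox deployment and scale it to 4 replicas without the deprecated run flags
kubectl create deployment test --image=busybox -- sleep 30000
kubectl scale deployment test --replicas=4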
Taking a look
- There is no longer a tunl0 interface on the node
[root@k8s-node-1 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 52:54:00:28:0f:64 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.143/24 brd 192.168.122.255 scope global noprefixroute dynamic eth0
valid_lft 2927sec preferred_lft 2927sec
inet6 fe80::bd31:baa6:a345:4bf1/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::b2b6:fc3a:4364:85a9/64 scope link tentative noprefixroute dadfailed
valid_lft forever preferred_lft forever
inet6 fe80::e489:9d94:404b:2b9a/64 scope link noprefixroute
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:3f:d0:36:9b brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: dummy0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default qlen 1000
link/ether b2:30:6c:38:be:b6 brd ff:ff:ff:ff:ff:ff
5: kube-ipvs0: <BROADCAST,NOARP> mtu 1500 qdisc noop state DOWN group default
link/ether 12:75:4a:0b:d9:4e brd ff:ff:ff:ff:ff:ff
inet 10.68.0.1/32 brd 10.68.0.1 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.68.0.2/32 brd 10.68.0.2 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.68.173.81/32 brd 10.68.173.81 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.68.172.108/32 brd 10.68.172.108 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.68.145.230/32 brd 10.68.145.230 scope global kube-ipvs0
valid_lft forever preferred_lft forever
inet 10.68.142.144/32 brd 10.68.142.144 scope global kube-ipvs0
valid_lft forever preferred_lft forever
6: cali3720ab0e5c8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
7: cali58c377b0851@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
8: cali9682b7d9b6c@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
9: calidd63611da70@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
[root@k8s-node-1 ~]#
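- To map each cali* interface back to the pod behind it, calicoctl can list the workload endpoints on the node; a sketch (the column layout varies slightly between calicoctl versions):
# show workload endpoints with their cali* interface names and assigned IPs
calicoctl get workloadendpoints -o wide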
- The busybox pod IPs appear directly in the node's routing table
- In the routing table:
- the host (/32) address of each local pod is reachable through its calixxx veth pair
- pods on other nodes are reached with the remote node's IP as the next hop, the destination being the address block Calico allocated to that node
[root@k8s-master ~]# ETCDCTL_API=3 etcdctl --endpoints="http://127.0.0.1:2379" get --prefix /calico/ipam/v2/host
/calico/ipam/v2/host/k8s-master/ipv4/block/172.20.235.192-26
{"state":"confirmed"}
/calico/ipam/v2/host/k8s-node-1/ipv4/block/172.20.109.64-26
{"state":"confirmed"}
/calico/ipam/v2/host/k8s-node-2/ipv4/block/172.20.140.64-26
{"state":"confirmed"}
[root@k8s-master ~]#
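- The same per-node block allocation can also be read through calicoctl's IPAM commands instead of querying etcd directly; a sketch, assuming a calicoctl recent enough to support --show-blocks:
# summarize pool usage and the /26 blocks assigned to each node
calicoctl ipam show --show-blocks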
[root@k8s-node-1 ~]# ip route
default via 192.168.122.1 dev eth0 proto dhcp metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.20.109.64 dev cali3720ab0e5c8 scope link
blackhole 172.20.109.64/26 proto bird
172.20.109.65 dev cali58c377b0851 scope link
172.20.109.66 dev cali9682b7d9b6c scope link
172.20.109.67 dev calidd63611da70 scope link
172.20.140.64/26 via 192.168.122.198 dev eth0 proto bird
172.20.235.192/26 via 192.168.122.135 dev eth0 proto bird
192.168.122.0/24 dev eth0 proto kernel scope link src 192.168.122.143 metric 100
[root@k8s-node-1 ~]#
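- The forwarding decision can be replayed with ip route get: asking node-1 for the route to the busybox on node-2 should report 192.168.122.198 as the next hop, matching the 172.20.140.64/26 route above
# ask the kernel which route node-1 would use to reach the pod on node-2
ip route get 172.20.140.66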
Verifying connectivity
- Ping the busybox on node-2 from the busybox on the master, and capture packets on node-2's eth0
# the busybox on the master starts the ping
[root@k8s-master ~]# kubectl exec -it test-7b48b75784-7cxls ping 172.20.140.66
PING 172.20.140.66 (172.20.140.66): 56 data bytes
64 bytes from 172.20.140.66: seq=0 ttl=62 time=0.846 ms
64 bytes from 172.20.140.66: seq=1 ttl=62 time=0.496 ms
64 bytes from 172.20.140.66: seq=2 ttl=62 time=0.517 ms
# capture on node-2: these are plain ICMP packets, with no extra encapsulation
[root@k8s-node-2 ~]# tcpdump -i eth0 -ennX host 172.20.235.193
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:45:07.927559 52:54:00:b9:d6:16 > 52:54:00:c2:6f:1e, ethertype IPv4 (0x0800), length 98: 172.20.235.193 > 172.20.140.66: ICMP echo request, id 2816, seq 0, length 64
0x0000: 4500 0054 c3f5 4000 3f01 a786 ac14 ebc1 E..T..@.?.......
0x0010: ac14 8c42 0800 b452 0b00 0000 6de8 cac4 ...B...R....m...
0x0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0050: 0000 0000 ....
09:45:07.927765 52:54:00:c2:6f:1e > 52:54:00:b9:d6:16, ethertype IPv4 (0x0800), length 98: 172.20.140.66 > 172.20.235.193: ICMP echo reply, id 2816, seq 0, length 64
0x0000: 4500 0054 80f6 0000 3f01 2a86 ac14 8c42 E..T....?.*....B
0x0010: ac14 ebc1 0000 bc52 0b00 0000 6de8 cac4 .......R....m...
0x0020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0030: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0040: 0000 0000 0000 0000 0000 0000 0000 0000 ................
0x0050: 0000 0000
- Trace the path: container (master) -> eth0 (master) -> eth0 (node-2) -> container (node-2)
[root@k8s-master ~]# kubectl exec -it test-7b48b75784-7cxls traceroute 172.20.140.66
traceroute to 172.20.140.66 (172.20.140.66), 30 hops max, 46 byte packets
1 192-168-122-135.kubernetes.default.svc.cluster.local (192.168.122.135) 0.018 ms 0.006 ms 0.004 ms
2 192.168.122.198 (192.168.122.198) 0.302 ms 0.398 ms 0.229 ms
3 172.20.140.66 (172.20.140.66) 0.274 ms 0.488 ms 0.339 ms
[root@k8s-master ~]#
Summary
Copy-pasted directly from the article linked at the end:
When a container is created, Calico generates a veth pair for it: one end is moved into the container's network namespace as the container NIC and given an IP and netmask, while the other end is left exposed on the host. Routing rules are then installed so that the container IP is published into the host's routing table. At the same time, Calico allocates a subnet to each host as the pool of assignable container IPs, which makes it possible to generate fairly static routing rules per host from each subnet's CIDR.
When containers need to talk across hosts, the traffic goes through these simple steps:
1) Container traffic reaches the host's network namespace through the veth pair.
2) Based on the subnet CIDR that contains the destination IP and the routing rules on the host, the next-hop host IP is determined.
3) Once the traffic arrives at the next-hop host, that host's routing rules deliver it straight to the host-side end of the destination container's veth pair, and from there into the container.
As this path shows, cross-host communication uses neither NAT nor UDP encapsulation, so the performance overhead is quite low. But precisely because Calico works purely at layer 3, the mechanism also has some drawbacks, for example:
1) Calico currently supports only TCP, UDP, ICMP and ICMPv6; for other protocols (e.g. NetBIOS) an overlay network such as weave or a native overlay is recommended.
2) Since communication happens at layer 3 with no encryption or encapsulation at layer 2, it should only be used on a private, trusted network.
Original article: https://blog.csdn.net/ccy19910925/article/details/82423452
Open questions
- During deployment I hit an issue: the ASN value shows as unknown
[root@k8s-master ansible]# calicoctl get node -o wide
NAME ASN IPV4 IPV6
k8s-master (unknown) 192.168.122.27/24
k8s-node-1 (unknown) 192.168.122.212/24
k8s-node-2 (unknown) 192.168.122.141/24
- Reportedly this is because the default AS number is in use, hence the (unknown) display; it does not prevent the BGP sessions from being established
- I also noticed that there is no default bgpconfig
[root@k8s-master ansible]# calicoctl get bgpconfig default
Failed to get resources: resource does not exist: BGPConfiguration(default) with error: <nil>
[root@k8s-master ansible]#
[root@k8s-master templates]# calicoctl get bgpPeer -o yaml
apiVersion: projectcalico.org/v3
items: []
kind: BGPPeerList
metadata:
  resourceVersion: "156403"
[root@k8s-master templates]#
- Following a document found online, I created one by hand
[root@k8s-master ansible]# cat << EOF | calicoctl create -f -
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: false
  asNumber: 63400
EOF
Successfully created 1 'BGPConfiguration' resource(s)
[root@k8s-master ansible]#
- That ended badly: the BGP peers all disappeared, routes were no longer synchronized, and cross-node pings between containers stopped working
[root@k8s-master ansible]# calicoctl node status
Calico process is running.
IPv4 BGP status
No IPv4 peers found.
IPv6 BGP status
No IPv6 peers found.
[root@k8s-master ansible]#
- I did not find a good way to recover and ended up reinstalling and redeploying from scratch
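- In hindsight the likely culprit is nodeToNodeMeshEnabled: false in the BGPConfiguration created above, which turns off the automatic full mesh without defining any explicit peers to replace it. A possible recovery, untested in this lab, would have been to delete the configuration or re-enable the mesh:
# option 1: delete the hand-made default BGPConfiguration so calico falls back to the built-in full mesh
calicoctl delete bgpconfig default
# option 2: keep the resource but switch the node-to-node mesh back on
cat << EOF | calicoctl apply -f -
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  logSeverityScreen: Info
  nodeToNodeMeshEnabled: true
  asNumber: 63400
EOF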