Author: Zhang Hua (张华)  Published: 2021-01-25
Copyright notice: This article may be reposted freely; when reposting, please include a hyperlink to the original source together with the author information and this copyright notice.

OpenStack's Security Groups, for example, can be used to implement tier-to-tier network access rules in a traditional service architecture. In containerized scenarios, however, the granularity is much finer (one application per container) and both the number of host nodes and their IP addresses change rapidly, so the tiered-firewall approach is no longer workable. Calico is a tool for exactly this situation: it implements routing and access control for container traffic by modifying iptables rules and routes on every node, and it coordinates node configuration through etcd. K8s NetworkPolicy can use Calico as the CNI to implement isolation: only traffic matching the rules may enter a pod, and likewise only traffic matching the rules may leave it.
Canal is the combination of Flannel and Calico: the access-control part is implemented by Calico (calico-node -felix), while the networking part is still handled by Flannel (Calico offers BGP routing and IPIP tunnels; Flannel uses VXLAN tunnels).
For how Calico's network rules are used, see https://www.open-open.com/news/view/1a7c496. k8s, by integrating Canal through CNI, can use the network-rule part implemented by Calico, but how is it actually used (see also https://www.jianshu.com/p/331235d8bcbb):

  • Create a NetworkPolicy resource via the kubectl client (a minimal sketch follows this list)
  • Calico's policy controller (calico-kube-controllers, which embeds calicoctl for writing policy) watches NetworkPolicy resources and writes any it receives into Calico's etcd database
  • calico-felix on each node fetches the policy from the etcd database and invokes iptables to apply the corresponding configuration.
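
As a minimal sketch of the first step, a NetworkPolicy like the one below could be created with kubectl (the policy name and the label app=ubuntu-debug are hypothetical, used only for illustration); it admits ingress to the selected pods only from pods carrying the same label:

$ cat << EOF | kubectl apply -f -
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-same-app
  namespace: default
spec:
  podSelector:
    matchLabels:
      app: ubuntu-debug
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          app: ubuntu-debug
EOF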

Calico policy architecture

The iptables rules ultimately generated on the node are shown in the figure below:
[figure: final iptables rules generated on the node]

Test environment

The IPs of kubernetes-master/0, kubernetes-worker/0 and kubernetes-worker/1 are as follows:

kubernetes-master/0*      active    idle   3        10.5.2.205      6443/tcp        Kubernetes master running.
  canal/2*                active    idle            10.5.2.205                      Flannel subnet 10.1.49.1/24
  containerd/2*           active    idle            10.5.2.205                      Container runtime available
kubernetes-worker/0*      active    idle   4        10.5.3.197      80/tcp,443/tcp  Kubernetes worker running.
  canal/0                 active    idle            10.5.3.197                      Flannel subnet 10.1.94.1/24
  containerd/0            active    idle            10.5.3.197                      Container runtime available
kubernetes-worker/1       active    idle   5        10.5.4.4        80/tcp,443/tcp  Kubernetes worker running.
  canal/1                 active    idle            10.5.4.4                        Flannel subnet 10.1.3.1/24
  containerd/1            active    idle            10.5.4.4                        Container runtime available

$ kubectl get pods -o wide
NAME                            READY   STATUS    RESTARTS   AGE    IP          NODE                NOMINATED NODE   READINESS GATES
ubuntu-debug-794979f648-4dj8w   1/1     Running   1          118m   10.1.3.14   juju-11ff05-k8s-5   <none>           <none>
ubuntu-debug-794979f648-fwdmv   1/1     Running   1          118m   10.1.94.7   juju-11ff05-k8s-4   <none>           <none>

Flannel in canal provides pod-to-pod communication

The tunnel network inside flannel is 10.1.0.0/16, of type vxlan:

# etcdctl --cert-file /etc/ssl/flannel/client-cert.pem --key-file /etc/ssl/flannel/client-key.pem --ca-file /etc/ssl/flannel/client-ca.pem -endpoints=https://10.5.1.44:2379 get /coreos.com/network/config
{"Network": "10.1.0.0/16", "Backend": {"Type": "vxlan"}}

The subnets flannel allocated to kubernetes-master/0, kubernetes-worker/0 and kubernetes-worker/1 are:

root@juju-11ff05-k8s-4:~# etcdctl --cert-file /etc/ssl/flannel/client-cert.pem --key-file /etc/ssl/flannel/client-key.pem --ca-file /etc/ssl/flannel/client-ca.pem -endpoints=https://10.5.1.44:2379 ls /coreos.com/network/subnets/
/coreos.com/network/subnets/10.1.94.0-24 (kubernetes-worker/0)
/coreos.com/network/subnets/10.1.3.0-24  (kubernetes-worker/1)
/coreos.com/network/subnets/10.1.49.0-24 (kubernetes-master/0)
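
Each subnet key stores the owning node's public IP and VXLAN VTEP MAC, which is how the other nodes build their tunnel endpoints. The value looks roughly like this (a sketch; the MAC below is hypothetical):

# etcdctl --cert-file /etc/ssl/flannel/client-cert.pem --key-file /etc/ssl/flannel/client-key.pem --ca-file /etc/ssl/flannel/client-ca.pem -endpoints=https://10.5.1.44:2379 get /coreos.com/network/subnets/10.1.94.0-24
{"PublicIP":"10.5.3.197","BackendType":"vxlan","BackendData":{"VtepMAC":"aa:bb:cc:dd:ee:ff"}}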

Flannel records the allocated subnet in subnet.env; kubernetes-master/0, kubernetes-worker/0 and kubernetes-worker/1 should all have the configuration below. Only the copy from kubernetes-worker/0 is pasted here:

root@juju-11ff05-k8s-4:~# cat /run/flannel/subnet.env 
FLANNEL_NETWORK=10.1.0.0/16
FLANNEL_SUBNET=10.1.94.1/24
FLANNEL_MTU=8908
FLANNEL_IPMASQ=true

root@juju-11ff05-k8s-4:~# ip addr show flannel.1 |grep global
    inet 10.1.94.0/32 scope global flannel.1

The firewall should allow traffic from this subnet in:

#on kubernetes-worker/0, it should ALLOW 10.1.94.0/24
root@juju-11ff05-k8s-4:~# iptables-save |grep 10.1.94 |grep POST
-A POSTROUTING ! -s 10.1.0.0/16 -d 10.1.94.0/24 -j RETURN
# iptables -t nat -nvL |grep 'Chain POSTROUTING' -A9
Chain POSTROUTING (policy ACCEPT 1447 packets, 89820 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 1885  119K KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
    8   460 fan-egress  all  --  *      *       252.0.0.0/8          0.0.0.0/0           
  183 11249 RETURN     all  --  *      *       10.1.0.0/16          10.1.0.0/16         
    6   360 MASQUERADE  all  --  *      *       10.1.0.0/16         !224.0.0.0/4         
    0     0 RETURN     all  --  *      *      !10.1.0.0/16          10.1.94.0/24        
    0     0 MASQUERADE  all  --  *      *      !10.1.0.0/16          10.1.0.0/16 
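
To confirm that pod-to-pod traffic is not masqueraded (so the destination pod sees the real source pod IP), the RETURN rule counters can be watched while pinging between the two test pods (a sketch; the pod names and IPs are taken from the kubectl output above, and it assumes ping is available in the image):

$ kubectl exec ubuntu-debug-794979f648-fwdmv -- ping -c3 10.1.3.14
# iptables -t nat -nvL POSTROUTING |grep RETURN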

In addition, the underlying docker or containerd should also pick up the subnet configured for each node in etcd:

  • With docker, traffic from a container first reaches docker0 and is then routed to flannel.1 (see the sketch after this list).
  • With containerd, the subnet is handed to flannel via the /etc/cni/net.d/10-canal.conflist file shown further below.
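
For the docker case, flannel's conventional integration turns subnet.env into docker daemon options so that docker0 is addressed out of the node's flannel subnet (a sketch of the standard setup, not taken from this environment):

# source /run/flannel/subnet.env
# dockerd --bip=${FLANNEL_SUBNET} --mtu=${FLANNEL_MTU} --ip-masq=false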

Calico/Felix in canal provides network access rules

Normally Calico's felix (run as calico-node -felix) runs on every node and is responsible for configuring both routes and network access rules. In canal the routing part is left to Flannel, so felix is mainly used for its access rules. Felix implements those rules with a set of dedicated iptables chains (see the Calico network model - https://www.cnblogs.com/menkeyi/p/11364977.html), though it is not clear whether canal makes any further changes.
[figure: Felix iptables chains]

root@juju-11ff05-k8s-4:~# route -n |grep flannel
10.1.3.0        10.1.3.0        255.255.255.0   UG    0      0        0 flannel.1
10.1.49.0       10.1.49.0       255.255.255.0   UG    0      0        0 flannel.1

Traffic is then encapsulated in the vxlan tunnel and sent to the remote endpoint.
How Calico manages the network policy here has not been fully worked out yet (my guess is that the ALLOW firewall rules in the previous section are configured by this part, but that is only a guess).
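
Both the flannel routes above and the vxlan encapsulation can be verified on the node (a sketch; 8472 is flannel's default VXLAN UDP port, and the FDB entries will differ per deployment):

# bridge fdb show dev flannel.1
# tcpdump -i ens3 -nn udp port 8472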

root@juju-11ff05-k8s-4:~# ip addr show |grep cali -A1
6: cali8215eb6fd4a@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-262d6f97-ad60-b48b-85e1-ee180b620c30
--
7: cali105686d7bac@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netns cni-09304b68-97b3-c72b-2bd6-b47d9f626890

root@juju-11ff05-k8s-4:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         10.5.0.1        0.0.0.0         UG    100    0        0 ens3
10.1.3.0        10.1.3.0        255.255.255.0   UG    0      0        0 flannel.1
10.1.49.0       10.1.49.0       255.255.255.0   UG    0      0        0 flannel.1
10.1.94.6       0.0.0.0         255.255.255.255 UH    0      0        0 cali105686d7bac
10.1.94.7       0.0.0.0         255.255.255.255 UH    0      0        0 cali8215eb6fd4a
10.5.0.0        0.0.0.0         255.255.0.0     U     0      0        0 ens3
169.254.169.254 10.5.0.1        255.255.255.255 UGH   100    0        0 ens3
252.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 fan-252
root@juju-11ff05-k8s-4:~# ip netns exec cni-09304b68-97b3-c72b-2bd6-b47d9f626890 route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
169.254.1.1     0.0.0.0         255.255.255.255 UH    0      0        0 eth0
root@juju-11ff05-k8s-4:~# ip netns exec cni-262d6f97-ad60-b48b-85e1-ee180b620c30 route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         169.254.1.1     0.0.0.0         UG    0      0        0 eth0
169.254.1.1     0.0.0.0         255.255.255.255 UH    0      0        0 eth0
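
Inside the pods the default route points at 169.254.1.1, which does not exist on any real interface; Calico answers the pod's ARP request for it via proxy ARP on the host-side cali interface. This can be checked like so (a sketch; the interface name is taken from the ip addr output above, and the sysctl prints 1 when proxy ARP is enabled):

# cat /proc/sys/net/ipv4/conf/cali8215eb6fd4a/proxy_arp
1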


root@juju-11ff05-k8s-4:~# ctr c ls
CONTAINER      IMAGE                                              RUNTIME                  
calico-node    rocks.canonical.com:443/cdk/calico/node:v3.10.1    io.containerd.runc.v2    

root@juju-11ff05-k8s-4:~# ip netns
cni-09304b68-97b3-c72b-2bd6-b47d9f626890 (id: 1)
cni-262d6f97-ad60-b48b-85e1-ee180b620c30 (id: 0)

root@juju-11ff05-k8s-4:~# ip netns exec cni-09304b68-97b3-c72b-2bd6-b47d9f626890 ip addr show |grep eth0
3: eth0@if7: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    inet 10.1.94.6/32 scope global eth0
root@juju-11ff05-k8s-4:~# ip netns exec cni-262d6f97-ad60-b48b-85e1-ee180b620c30 ip addr show |grep eth0
3: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    inet 10.1.94.7/32 scope global eth0

root@juju-11ff05-k8s-4:~# cat /etc/cni/net.d/10-canal.conflist 
{
  "name": "cdk-canal",
  "cniVersion": "0.3.0",
  "plugins": [
    {
      "type": "calico",
      "etcd_endpoints": "https://10.5.1.44:2379",
      "etcd_key_file": "/opt/calicoctl/etcd-key",
      "etcd_cert_file": "/opt/calicoctl/etcd-cert",
      "etcd_ca_cert_file": "/opt/calicoctl/etcd-ca",
      "log_level": "info",
      "ipam": {
        "type": "host-local",
        "subnet": "10.1.94.1/24"
      },
      "policy": {
        "type": "k8s"
      },
      "kubernetes": {
        "kubeconfig": "/root/cdk/kubeconfig"
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true},
      "snat": true
    }
  ]
}
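
Note that the host-local IPAM section carries this node's own flannel subnet, which is what ties the two plugins together. host-local records its allocations on disk under the network name, so the assignments can be cross-checked (a sketch based on host-local's default data directory; the IP-named files should match the pod IPs seen earlier):

# ls /var/lib/cni/networks/cdk-canal/
10.1.94.6  10.1.94.7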

iptables provides Service-to-Pod communication

Take the dashboard as an example:

$ kubectl get services -A |grep kubernetes-dashboard
kubernetes-dashboard              dashboard-metrics-scraper                ClusterIP   10.152.183.92    <none>        8000/TCP                 3d22h
kubernetes-dashboard              kubernetes-dashboard                     ClusterIP   10.152.183.89    <none>        443/TCP                  3d22h
$ kubectl describe --namespace kubernetes-dashboard service kubernetes-dashboard |grep -E 'IPs|Endpoints|Port'
IPs:               10.152.183.89
Port:              <unset>  443/TCP
TargetPort:        8443/TCP
Endpoints:         10.1.3.16:8443
$ kubectl get pods -A -o wide |grep dashboard
kubernetes-dashboard              dashboard-metrics-scraper-74757fb5b7-9rqxd                1/1     Running   1          3d22h   10.1.3.17    juju-11ff05-k8s-5   <none>           <none>
kubernetes-dashboard              kubernetes-dashboard-64f87676d4-s26bs                     1/1     Running   1          3d22h   10.1.3.16    juju-11ff05-k8s-5   <none>           <none>

# With multiple endpoints there would be multiple KUBE-SEP-xxx chains implementing load balancing; KUBE-MARK-MASQ is used to set the masquerade mark
root@juju-11ff05-k8s-4:~# iptables-save |grep kubernetes-dashboard
-A KUBE-SERVICES ! -s 10.1.0.0/16 -d 10.152.183.89/32 -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard cluster IP" -m tcp --dport 443 -j KUBE-MARK-MASQ #egress
-A KUBE-SERVICES -d 10.152.183.89/32 -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard cluster IP" -m tcp --dport 443 -j KUBE-SVC-CEZPIJSAUFW5MYPQ       #ingress
-A KUBE-SVC-CEZPIJSAUFW5MYPQ -m comment --comment "kubernetes-dashboard/kubernetes-dashboard" -j KUBE-SEP-RMJCCMAWKQ3DCGZP
-A KUBE-SEP-RMJCCMAWKQ3DCGZP -s 10.1.3.16/32 -m comment --comment "kubernetes-dashboard/kubernetes-dashboard" -j KUBE-MARK-MASQ                      #egress
-A KUBE-SEP-RMJCCMAWKQ3DCGZP -p tcp -m comment --comment "kubernetes-dashboard/kubernetes-dashboard" -m tcp -j DNAT --to-destination 10.1.3.16:8443  #ingress
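
The chain can be walked by hand to confirm the DNAT target (the KUBE-SVC/KUBE-SEP hashes are specific to this cluster):

# iptables -t nat -nvL KUBE-SVC-CEZPIJSAUFW5MYPQ
# iptables -t nat -nvL KUBE-SEP-RMJCCMAWKQ3DCGZP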

Related services

systemctl status flannel
systemctl status calico-node.service
systemctl status snap.kubelet.daemon.service

Appendix - a customer issue

A customer running k8s over vSphere managed by juju reported that goldpinger showed pod-to-pod networking was broken.
First they hit https://bugs.launchpad.net/juju/+bug/1831244, a username/password problem between the juju controllers and vSphere, which left the controller in a suspended state in "juju status --format yaml" (note: this is only visible with --format yaml).
After 1831244 was resolved, new nodes could be created normally. The failing keycloak pod was migrated to a new node, but the problem persisted; 'kubectl logs' showed keycloak failing with: 'Caused by: java.net.UnknownHostException: postgres'
Clearly keycloak depends on the postgres service, so this was a DNS problem.
coredns had not been moved from the old node to the new one, and the old node had a problem: the subnet used by canal's flannel (/run/flannel/subnet.env) did not match the subnet used by canal's calico (/etc/cni/net.d/10-canal.conflist). As a result, the second-to-last line of the iptables output below was wrong on the old node and did not allow flannel's packets through.

grep -A9 "Chain POSTROUTING" sos_commands/networking/iptables_-t_nat_-nvL
Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
1411K 88M cali-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* cali:O3lYWMrLQYEMJtB5 */
1378K 86M CNI-HOSTPORT-MASQ all -- * * 0.0.0.0/0 0.0.0.0/0 /* CNI portfwd requiring masquerade */
1376K 86M KUBE-POSTROUTING all -- * * 0.0.0.0/0 0.0.0.0/0 /* kubernetes postrouting rules */
800K 50M RETURN all -- * * 172.17.240.0/20 172.17.240.0/20
0 0 MASQUERADE all -- * * 172.17.240.0/20 !224.0.0.0/4
0 0 RETURN all -- * * !172.17.240.0/20 172.17.247.0/24
521K 31M MASQUERADE all -- * * !172.17.240.0/20 172.17.240.0/20
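
The mismatch itself can be confirmed quickly by comparing the two files on the affected node (a sketch):

# grep FLANNEL_SUBNET /run/flannel/subnet.env
# grep '"subnet"' /etc/cni/net.d/10-canal.conflist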

Each node's subnet configuration in etcd can be queried with:

juju run -u kubernetes-worker/<N> etcdctl --cert-file /etc/ssl/flannel/client-cert.pem --key-file /etc/ssl/flannel/client-key.pem --ca-file /etc/ssl/flannel/client-ca.pem -endpoints=https://10.1.234.195:2379,https://10.1.234.196:2379,https://10.1.234.203:2379 ls /coreos.com/network/subnets/

The stale subnet must first be deleted from etcd - rm /coreos.com/network/subnets/172.17.247.0-24
Then restart flannel with systemctl restart flannel.service, update the subnet in /run/flannel/subnet.env to the correct one, and confirm it with "ip a s flannel.1". Finally:
systemctl restart calico-node.service
systemctl restart snap.kubelet.daemon.service

calicoctl test

juju ssh kubernetes-worker/0 -- sudo -s
#https://docs.projectcalico.org/getting-started/clis/calicoctl/install
curl -O -L  https://github.com/projectcalico/calicoctl/releases/download/v3.17.1/calicoctl
cp calicoctl /usr/bin/ && chmod +x /usr/bin/calicoctl
#copy env variable like ETCD_ENDPOINTS from the output of 'ps -ef |grep calicoctl'
ETCD_ENDPOINTS=https://10.5.0.131:2379 ETCD_CA_CERT_FILE=/opt/calicoctl/etcd-ca ETCD_CERT_FILE=/opt/calicoctl/etcd-cert ETCD_KEY_FILE=/opt/calicoctl/etcd-key calicoctl get networkPolicy
ETCD_ENDPOINTS=https://10.5.0.131:2379 ETCD_CA_CERT_FILE=/opt/calicoctl/etcd-ca ETCD_CERT_FILE=/opt/calicoctl/etcd-cert ETCD_KEY_FILE=/opt/calicoctl/etcd-key calicoctl get globalNetworkPolicy

k8s networkpolicy - https://docs.projectcalico.org/security/kubernetes-network-policy
calico networkpolicy - https://docs.projectcalico.org/security/calico-network-policy
cat << EOF | sudo tee calicoPolicyTest.yaml
apiVersion: projectcalico.org/v3
kind: GlobalNetworkPolicy
metadata:
  name: deny-circle-blue
spec:
  selector: k8s-app == 'kubernetes-dashboard'
  ingress:
  - action: Deny
    protocol: TCP
    destination:
      ports:
      - 22
EOF
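The policy then has to be applied before it will show up in the query below (same etcd environment variables as before):
ETCD_ENDPOINTS=https://10.5.0.131:2379 ETCD_CA_CERT_FILE=/opt/calicoctl/etcd-ca ETCD_CERT_FILE=/opt/calicoctl/etcd-cert ETCD_KEY_FILE=/opt/calicoctl/etcd-key calicoctl apply -f calicoPolicyTest.yaml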
ETCD_ENDPOINTS=https://10.5.0.131:2379 ETCD_CA_CERT_FILE=/opt/calicoctl/etcd-ca ETCD_CERT_FILE=/opt/calicoctl/etcd-cert ETCD_KEY_FILE=/opt/calicoctl/etcd-key calicoctl get globalNetworkPolicy
#go to the worker node that the dashboard pod is on (kubectl get pod -A -o wide |grep dash)
iptables-save |grep -w 22


