Field notes: simulating 5,000 nodes with kubemark

This post records how the kubemark tool was used to simulate 5,000 Kubernetes nodes.

1. Environment preparation

The kubemark cluster (a converged container appliance environment) is the cluster under test and will have 5,000 simulated Kubernetes nodes. An external cluster with 20 dedicated nodes is prepared to run the pods that simulate the hollow nodes.

The node count was sized based on the following considerations (the rough math is sketched after the list):

  1. Per-pod resource demand: the official recommendation is 0.1 CPU core and 220 MB of memory per hollow-node pod. From actual testing, the real demand is smaller, roughly half of that.

  2. Each node is given one /24 (class C) pod subnet at initialization; excluding the DaemonSet pods, plan for about 250 hollow-node pods per node.

  3. Set each node's schedulable pod limit to 500 (in theory anything above the 253 assignable IP addresses is enough).
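
As a rough sanity check of that sizing, using the halved figures from point 1 (about 0.05 CPU and 110 MB per hollow-node pod):

# Rough capacity math for 5000 hollow-node pods (assumed per-pod cost: ~0.05 CPU, ~110 MB)
#   5000 x 0.05 CPU = 250 cores      vs. 20 nodes x 16C = 320 cores
#   5000 x 110 MB   ~ 550 GB memory  vs. 20 nodes x 32G = 640 GB
#   5000 / 20 nodes = 250 pods/node  -> fits within one /24 (253 assignable IPs)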

Prepare an external Kubernetes cluster running the same version as the kubemark cluster (the cluster under test); it will host the kubemark pods that simulate the hollow nodes:

# kubemark cluster: three physical servers with relatively high specs
[root@cluster54 ~]# kubectl get node -o wide
cluster54      Ready    control-plane,master   12d   v1.27.6   192.168.101.54   <none>        openEuler 22.03 (LTS-SP1)     5.10.0-136.12.0.86.4.hl202.x86_64   containerd://1.7.7-u2
cluster55      Ready    control-plane,master   12d   v1.27.6   192.168.101.55   <none>        openEuler 22.03 (LTS-SP1)     5.10.0-136.12.0.86.4.hl202.x86_64   containerd://1.6.14
cluster56      Ready    control-plane,master   12d   v1.27.6   192.168.101.56   <none>        openEuler 22.03 (LTS-SP1)     5.10.0-136.12.0.86.4.hl202.x86_64   containerd://1.6.14

# Cluster info
[root@cluster54 ~]# kubectl cluster-info
Kubernetes control plane is running at https://apiserver.cluster.local:6443
CoreDNS is running at https://apiserver.cluster.local:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy

To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.

# hosts entries
[root@cluster54 ~]# cat /etc/hosts
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
10.210.10.54 cluster54
10.210.10.55 cluster55
10.210.10.56 cluster56

192.168.101.54 cluster54
192.168.101.55 cluster55
192.168.101.56 cluster56

192.168.101.200 vip.cluster.local
192.168.101.54 apiserver.cluster.local
192.168.101.200 vip.harbor.cloudos

# external cluster: four 16C/32G virtual machines
[root@k8s-master1 ~]# kubectl get node
NAME            STATUS                     ROLES           AGE   VERSION
k8s-master1     Ready   control-plane   60d   v1.27.6
k8s-node1       Ready   <none>          60d   v1.27.6
k8s-node2       Ready   <none>          60d   v1.27.6
k8s-node3       Ready   <none>          56d   v1.27.6

Scale the external cluster above out by 20 nodes (16C/32G each) dedicated to simulating hollow nodes:

# Cordon the original nodes so the hollow-node pods do not land on them
[root@k8s-master1 ~]# kubectl cordon k8s-node1
[root@k8s-master1 ~]# kubectl cordon k8s-node2
[root@k8s-master1 ~]# kubectl cordon k8s-node3
[root@k8s-master1 ~]# kubectl cordon k8s-master1

# After scaling out by 20 hollownode nodes
[root@k8s-master1 ~]# kubectl get node
NAME            STATUS                     ROLES           AGE   VERSION
hollownode174   Ready                      <none>          4d    v1.27.6
hollownode175   Ready                      <none>          4d    v1.27.6
hollownode176   Ready                      <none>          4d    v1.27.6
hollownode177   Ready                      <none>          4d    v1.27.6
hollownode178   Ready                      <none>          4d    v1.27.6
hollownode179   Ready                      <none>          4d    v1.27.6
hollownode180   Ready                      <none>          4d    v1.27.6
hollownode181   Ready                      <none>          4d    v1.27.6
hollownode182   Ready                      <none>          4d    v1.27.6
hollownode183   Ready                      <none>          4d    v1.27.6
hollownode184   Ready                      <none>          4d    v1.27.6
hollownode185   Ready                      <none>          4d    v1.27.6
hollownode186   Ready                      <none>          4d    v1.27.6
hollownode187   Ready                      <none>          4d    v1.27.6
hollownode188   Ready                      <none>          4d    v1.27.6
hollownode189   Ready                      <none>          4d    v1.27.6
hollownode190   Ready                      <none>          4d    v1.27.6
hollownode191   Ready                      <none>          4d    v1.27.6
hollownode192   Ready                      <none>          4d    v1.27.6
hollownode193   Ready                      <none>          4d    v1.27.6
k8s-master1     Ready,SchedulingDisabled   control-plane   60d   v1.27.6
k8s-node1       Ready,SchedulingDisabled   <none>          60d   v1.27.6
k8s-node2       Ready,SchedulingDisabled   <none>          60d   v1.27.6
k8s-node3       Ready,SchedulingDisabled   <none>          56d   v1.27.6

# Label the new nodes in one batch
[root@k8s-master1 ~]# for i in {174..193}; do kubectl label node hollownode$i name=hollow-node; done
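
A quick check that the label landed on all 20 nodes:

# Should list exactly the 20 hollownode nodes
kubectl get node -l name=hollow-node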

2. Creating the hollow-node pods

Pods created in the external cluster each run a hollow kubelet that registers with the kubemark cluster through a kubeconfig file, which is how the nodes are simulated.

2.1 External cluster

  1. Create the namespace

[root@k8s-master ~]# kubectl create ns kubemark

  2. Create the configmap and secret

# Create the configmap in the external cluster
[root@k8s-master ~]# kubectl create configmap node-configmap -n kubemark --from-literal=content.type="test-cluster"

# Create the secret in the external cluster; kubeconfig.kubemark is the kubeconfig file of the kubemark cluster (the cluster under test)
[root@k8s-master ~]# kubectl create secret generic kubeconfig --type=Opaque --namespace=kubemark --from-file=kubelet.kubeconfig=kubeconfig.kubemark --from-file=kubeproxy.kubeconfig=kubeconfig.kubemark
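
Before moving on, a quick sanity check that both objects exist:

# Optional check
kubectl get configmap node-configmap -n kubemark
kubectl get secret kubeconfig -n kubemark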

  3. Create the hollow-node pods

Prepare the following manifest as hollow-node.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: hollow-node
  namespace: kubemark
  labels:
    name: hollow-node
spec:
  replicas: 3
  selector:
    matchLabels:
      name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      hostAliases:
      - ip: "192.168.101.54"    # 如果是高可用,则填写集群域名对应的IP地址
        hostnames:
        - "apiserver.cluster.local" # kubemark集群域名
      nodeSelector:
        name: hollow-node
      initContainers:
      - name: init-inotify-limit
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=524288']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: containerd
        hostPath:
          path: /run/containerd
      - name: logs-volume
        hostPath:
          path: /var/log
      containers:
      - name: hollow-kubelet
        image: staging-k8s.gcr.io/kubemark:v1.27.6
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=kubelet --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubelet.kubeconfig  --v=2
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        - name: containerd
          mountPath: /run/containerd
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: staging-k8s.gcr.io/kubemark:v1.27.6
        imagePullPolicy: IfNotPresent
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=proxy --name=$(NODE_NAME) --use-real-proxier=false --kubeconfig=/kubeconfig/kubeproxy.kubeconfig  --v=2
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        - name: containerd
          mountPath: /run/containerd
      tolerations:
        - key: key
          value: value
          effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: name
                operator: In
                values:
                - hollow-node

Create the Deployment with kubectl create -f hollow-node.yaml, then confirm that 3 hollow-node nodes have registered on the kubemark cluster side:

[root@cluster54 ~]# kubectl get node
NAME                           STATUS   ROLES                  AGE   VERSION
cluster54                      Ready    control-plane,master   12d   v1.27.6
cluster55                      Ready    control-plane,master   12d   v1.27.6
cluster56                      Ready    control-plane,master   12d   v1.27.6
hollow-node-7f499b849f-222rm   Ready    <none>                 15h   v1.27.6
hollow-node-7f499b849f-2247k   Ready    <none>                 15h   v1.27.6
hollow-node-7f499b849f-2264p   Ready    <none>                 15h   v1.27.6

If no nodes register at this point, troubleshoot and fix the problem before continuing with the steps below.
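
Registration failures usually show up in the hollow-kubelet container logs on the external cluster (kubeconfig or certificate errors, unreachable apiserver, and so on); a couple of commands to start with:

# Are the hollow-node pods themselves running?
kubectl get pod -n kubemark -o wide

# Registration errors appear in the hollow-kubelet container logs
kubectl logs -n kubemark deploy/hollow-node -c hollow-kubelet --tail=50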

  4. Scale up

Once registration works, increase the replica count. Here I scaled straight to 5000, but you can also step up gradually, e.g. 1000 --> 3000 --> 5000 (see the sketch after the command below).

kubectl scale -n kubemark deployment.apps/hollow-node --replicas=5000
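
If you prefer the stepwise approach, a small loop like the following does it in batches (a sketch; the step sizes and the 5-minute pause are arbitrary):

# Scale up in batches on the external cluster (illustrative values)
for n in 1000 3000 5000; do
  kubectl scale -n kubemark deployment.apps/hollow-node --replicas=$n
  # wait for the batch to register; watch the node count on the kubemark
  # cluster side as described in section 2.2 before moving on
  sleep 300
done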

Resource usage at this point:

[root@k8s-master1 ~]# kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
hollownode174   4210m        26%    19804Mi         62%
hollownode175   4432m        27%    19800Mi         62%
hollownode176   4455m        27%    19757Mi         61%
hollownode177   4137m        25%    19712Mi         61%
hollownode178   4185m        26%    19836Mi         62%
hollownode179   3901m        24%    20661Mi         64%
hollownode180   4218m        26%    19757Mi         61%
hollownode181   3653m        22%    19915Mi         61%
hollownode182   3636m        22%    19957Mi         62%
hollownode183   4152m        25%    19796Mi         62%
hollownode184   3620m        22%    19864Mi         61%
hollownode185   4237m        26%    19796Mi         62%
hollownode186   4288m        26%    19827Mi         62%
hollownode187   4321m        27%    19798Mi         62%
hollownode188   4230m        26%    19801Mi         62%
hollownode189   4456m        27%    19690Mi         61%
hollownode190   4308m        26%    19830Mi         62%
hollownode191   4415m        27%    19923Mi         62%
hollownode192   4413m        27%    19775Mi         62%
hollownode193   4083m        25%    19797Mi         62%
k8s-master1     310m         3%     4446Mi          13%
k8s-node1       85m          0%     1449Mi          4%
k8s-node2       69m          0%     1421Mi          4%

2.2 kubemark cluster

Watch the node count grow on the kubemark cluster: watch -n 2 "kubectl get node | grep hollow-node | wc -l"

Confirm that all 5,000 nodes have registered:

[root@cluster54 ~]# kubectl get node | grep hollow-node | wc -l
5000

In addition, the kubemark cluster schedules a large number of pods onto the hollow-node nodes (mostly DaemonSet pods; they sit in ErrImagePull because hollow nodes do not actually run containers):

[root@cluster54 ~]# kubectl get pod -A -o wide | grep hollow-node-7f499b849f-zzr8j
kube-system                    kube-ovn-cni-6kblg       0/1     Init:ErrImagePull          0           16h     192.168.192.168   hollow-node-7f499b849f-zzr8j   <none>       <none>
kube-system                    kube-ovn-pinger-nsf4r    0/1     ErrImagePull               0           16h     192.168.192.168   hollow-node-7f499b849f-zzr8j   <none>       <none>
kube-system                    kube-proxy-clj7h         0/1     ErrImagePull               0           16h     192.168.192.168   hollow-node-7f499b849f-zzr8j   <none>       <none>
kube-system                    ovs-ovn-59t7r            0/1     ErrImagePull               0           16h     192.168.192.168   hollow-node-7f499b849f-zzr8j   <none>       <none>
rook-ceph                      csi-cephfsplugin-nhwp7   1/2     ErrImagePull               0           16h     192.168.192.168   hollow-node-7f499b849f-zzr8j   <none>       <none>
rook-ceph                      csi-rbdplugin-n22jx      1/2     ErrImagePull               0           16h     192.168.192.168   hollow-node-7f499b849f-zzr8j   <none>       <none>

# Total number of pods in the kubemark cluster
[root@cluster54 ~]# kubectl get pod -A | wc -l
60931

2.3 Resource usage

After all 5,000 simulated nodes are up, check the resource usage of the environment:

Resource usage of the 20 external nodes:

[root@k8s-master1 ~]# kubectl top node
NAME            CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
hollownode174   4282m        26%    22146Mi         69%
hollownode175   4422m        27%    22152Mi         69%
hollownode176   4329m        27%    22120Mi         69%
hollownode177   4377m        27%    22145Mi         69%
hollownode178   4274m        26%    22188Mi         69%
hollownode179   3790m        23%    21987Mi         68%
hollownode180   4298m        26%    22174Mi         69%
hollownode181   3651m        22%    22204Mi         69%
hollownode182   3754m        23%    22214Mi         69%
hollownode183   4201m        26%    22172Mi         69%
hollownode184   3690m        23%    22154Mi         68%
hollownode185   4375m        27%    22155Mi         69%
hollownode186   4402m        27%    22211Mi         69%
hollownode187   4389m        27%    22149Mi         69%
hollownode188   4258m        26%    22161Mi         69%
hollownode189   4435m        27%    22087Mi         69%
hollownode190   4492m        28%    22158Mi         69%
hollownode191   4508m        28%    22209Mi         69%
hollownode192   4406m        27%    22117Mi         69%
hollownode193   4257m        26%    22205Mi         69%

Monitoring on the virtualization platform side (screenshot omitted).

3. Cleaning up

Clean up the test resources after the test is complete.

3.1 kubemark cluster

  1. Remove the hollow-node nodes from the kubemark cluster: kubectl get node | grep hollow-node | awk '{print $1}' | xargs kubectl delete node
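
Deleting 5,000 node objects takes a while; the count used earlier should eventually drop back to 0:

kubectl get node | grep hollow-node | wc -l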

3.2 External cluster

  1. Delete the hollow-node pods

On the external cluster, scale the pods away: kubectl scale -n kubemark deployment.apps/hollow-node --replicas=0

This only removes the pods temporarily; you can scale back up later to run the test again.

  2. Full cleanup

Delete the Deployment: kubectl delete -f hollow-node.yaml

Delete the configmap: kubectl delete configmap node-configmap -n kubemark

Delete the secret: kubectl delete secret kubeconfig -n kubemark

Delete the namespace: kubectl delete ns kubemark

4. Common problems

4.1 External cluster: scheduling problem, a large number of pods stuck in Pending

Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  3m1s  default-scheduler  0/24 nodes are available: 20 Too many pods, 4 node(s) were unschedulable. preemption: 0/24 nodes are available: 20 No preemption victims found for incoming pod, 4 Preemption is not helpful for schedul

Cause: the pod count exceeded the nodes' configured maximum of 110 pods.

[root@k8s-master1 ~]# kubectl describe node hollownode174 | grep pods
  pods:               110

Fix:

Configure the external cluster so each node allows up to 500 pods. Taking hollownode174 as an example:

[root@hollownode174 ~]#  systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2024-08-18 17:28:27 UTC; 21min ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1213 (kubelet)
    Tasks: 25
   Memory: 196.1M
   CGroup: /system.slice/kubelet.service
           └─1213 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-...
...
# Edit the kubelet config file
[root@hollownode174 ~]# vi /var/lib/kubelet/config.yaml

# Append the following line at the end
maxPods: 500

# Restart kubelet for the change to take effect
systemctl restart kubelet

# Confirm the new pod capacity (Capacity and Allocatable both show 500)
[root@k8s-master1 ~]# kubectl describe node hollownode174 | grep pods
  pods:               500
  pods:               500
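
Since the same change is needed on all 20 hollownode hosts, a small loop saves the manual work (a sketch; it assumes passwordless root ssh to the hosts and that maxPods is not already set in config.yaml):

# Apply maxPods: 500 on every hollownode host and restart kubelet
for i in {174..193}; do
  ssh hollownode$i "echo 'maxPods: 500' >> /var/lib/kubelet/config.yaml && systemctl restart kubelet"
done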

4.2 External cluster: pods stuck in the init phase (Init:0/1); the events show that the node's pod IP range is exhausted

  Warning  FailedCreatePodSandBox  2m2s               kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dc19ee36059688135f684121995c1aaf7c69bb3b9e215aef6f3446f7ed78c925": plugin type="flannel" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 210.244.7.1-210.244.7.254
  Warning  FailedCreatePodSandBox  83s                kubelet            Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b0324f5d8d8a4518b5258c9db6a1a32e48da9e7e535fa7c4a27f65b2c75731ef": plugin type="flannel" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 210.244.7.1-210.244.7.254
  Warning  FailedCreatePodSandBox  50s                kubelet            (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0def0ce3e2f9c0e54b919e138816d1a24442fd82d2cd621d371b58fc5b3dbc4e": plugin type="flannel" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 210.244.7.1-210.244.7.254

Fix:

  1. When approaching 5,000 simulated nodes (4,700+), a large number of pods (more than 250) had piled up on hollownode179, exhausting its assignable IPs and causing the errors above. Delete the stuck pods so they get rescheduled onto other nodes.
# The subnet flannel assigned to this node; a /24 supports at most 253 pods
[root@hollownode179 ~]# cat /var/run/flannel/subnet.env
FLANNEL_NETWORK=210.244.0.0/16
FLANNEL_SUBNET=210.244.7.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true

kubectl get pod -n kubemark -o wide | grep hollownode179 | grep -v NAME | grep -v Run | awk '{print $1}' | xargs kubectl delete pod -n kubemark

Besides the approach above, a couple of other options are worth considering:

  1. Add more external nodes.

  2. Change the CNI plugin's per-node subnet allocation so that each node gets more assignable IPs than a single /24 provides (see the sketch below).
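
For the second option, flannel's net-conf.json supports a SubnetLen field, so giving each node a /23 instead of the default /24 roughly doubles the per-node IP budget. A sketch of the idea (the ConfigMap and DaemonSet names are the upstream defaults and may differ in your deployment; this was not tried in this test):

# Edit flannel's network config to hand out /23 subnets per node
kubectl -n kube-flannel edit configmap kube-flannel-cfg
#   net-conf.json: |
#     {
#       "Network": "210.244.0.0/16",
#       "SubnetLen": 23,
#       "Backend": { "Type": "vxlan" }
#     }
# Restart flannel so nodes pick up the new subnet leases
kubectl -n kube-flannel rollout restart daemonset kube-flannel-ds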

4.3 Pod subnet conflicting with the kubemark cluster network

If the hollow-node pod subnet conflicts with the kubemark cluster's networks, adjust the CNI plugin's subnet plan as needed. I use flannel here and moved the subnet to an uncommon range to avoid conflicts:

[root@hollownode193 ~]# cat /var/run/flannel/subnet.env
FLANNEL_NETWORK=210.244.0.0/16
FLANNEL_SUBNET=210.244.18.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
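
The values above come from flannel's network config; to inspect or change the range, look at the flannel ConfigMap (name and namespace depend on how flannel was deployed, typically kube-flannel-cfg in kube-flannel or kube-system with the upstream manifests) and restart the flannel DaemonSet afterwards:

# Show flannel's net-conf.json (adjust namespace/name to your deployment)
kubectl -n kube-flannel get configmap kube-flannel-cfg -o jsonpath='{.data.net-conf\.json}'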

4.4 Network topology issues

In this test environment, the cluster under test (the kubemark cluster) has both a management network (10.210.0.0/16) and a cluster network (192.168.101.0/24), and the apiserver listens on the 192.168.101.0/24 network. The external cluster was originally built with only the management network (10.210.0.0/16), so its pods could not reach the kubemark cluster.

Fix:

Add a second NIC to the external cluster nodes and configure the cluster network so it can reach the kubemark cluster:

# An external cluster node with the cluster network added on eth1
[root@hollownode192 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:da:41:1d:83:3c brd ff:ff:ff:ff:ff:ff
    inet 10.210.10.193/16 brd 10.210.255.255 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::eda:41ff:fe1d:833c/64 scope link
       valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    link/ether 0c:da:41:1d:d1:0c brd ff:ff:ff:ff:ff:ff
    inet 192.168.101.193/24 brd 192.168.101.255 scope global eth1
       valid_lft forever preferred_lft forever
    inet6 fe80::eda:41ff:fe1d:d10c/64 scope link
       valid_lft forever preferred_lft forever

# From inside a pod, connectivity and name resolution to the kubemark cluster work
[root@k8s-master1 ~]# kubectl exec -itn kubemark hollow-node-7f499b849f-zrqfs sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "hollow-kubelet" out of: hollow-kubelet, hollow-proxy, init-inotify-limit (init)

sh-4.2# ping apiserver.cluster.local
PING apiserver.cluster.local (192.168.101.54) 56(84) bytes of data.
64 bytes from apiserver.cluster.local (192.168.101.54): icmp_seq=1 ttl=63 time=0.550 ms

5. Related material

  1. https://blog.csdn.net/codelearning/article/details/140254088

  2. https://blog.csdn.net/codelearning/article/details/139933346
