Field notes: simulating 5,000 nodes with kubemark
These notes record using the kubemark tool to simulate 5,000 Kubernetes nodes.
1. Environment preparation
The kubemark cluster (a converged container appliance environment) is the cluster under test, in which 5,000 Kubernetes nodes will be simulated. In the external cluster, 20 nodes are prepared for running the hollow-node pods.
The node count was sized from the following considerations (see the sizing sketch after this list):
- Resource requirements per hollow-node pod: the upstream recommendation is 0.1 CPU core and 220 MB of memory per pod. In practice the actual usage is lower, roughly half of that.
- Each node is initialized with one /24 (class C) pod address range, so apart from a few DaemonSet pods, plan for about 250 hollow pods per node.
- Set the maximum schedulable pods per node to 500 (in theory anything larger than the 253 allocatable IP addresses is enough).
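A quick back-of-the-envelope check of that sizing (a sketch; the per-pod figures are simply half of the upstream 0.1 CPU / 220 MB recommendation mentioned above):
# Sizing sketch for 5000 hollow nodes (assumed figures: ~0.05 CPU and ~110 MiB per hollow pod)
PODS=5000
PODS_PER_NODE=250                                            # bounded by one /24 pod subnet per node
echo "external nodes needed: $((PODS / PODS_PER_NODE))"      # -> 20
echo "total CPU estimate:    $((PODS * 5 / 100)) cores"      # -> 250 cores (~12.5 per node)
echo "total memory estimate: $((PODS * 110 / 1024)) GiB"     # -> ~537 GiB (~27 GiB per node)
Twenty 16C/32G nodes provide 320 cores and 640 GB in total, which leaves headroom over these estimates.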
Prepare an external Kubernetes cluster running the same version as the kubemark cluster (the cluster under test); the kubemark pods that simulate the hollow nodes are deployed there:
# kubemark cluster: three physical servers with relatively high specs
[root@cluster54 ~]# kubectl get node -o wide
cluster54 Ready control-plane,master 12d v1.27.6 192.168.101.54 <none> openEuler 22.03 (LTS-SP1) 5.10.0-136.12.0.86.4.hl202.x86_64 containerd://1.7.7-u2
cluster55 Ready control-plane,master 12d v1.27.6 192.168.101.55 <none> openEuler 22.03 (LTS-SP1) 5.10.0-136.12.0.86.4.hl202.x86_64 containerd://1.6.14
cluster56 Ready control-plane,master 12d v1.27.6 192.168.101.56 <none> openEuler 22.03 (LTS-SP1) 5.10.0-136.12.0.86.4.hl202.x86_64 containerd://1.6.14
# Cluster info
[root@cluster54 ~]# kubectl cluster-info
Kubernetes control plane is running at https://apiserver.cluster.local:6443
CoreDNS is running at https://apiserver.cluster.local:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
# hosts entries
[root@cluster54 ~]# cat /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
10.210.10.54 cluster54
10.210.10.55 cluster55
10.210.10.56 cluster56
192.168.101.54 cluster54
192.168.101.55 cluster55
192.168.101.56 cluster56
192.168.101.200 vip.cluster.local
192.168.101.54 apiserver.cluster.local
192.168.101.200 vip.harbor.cloudos
# external cluster info: made up of four 16C/32G virtual machines
[root@k8s-master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master1 Ready control-plane 60d v1.27.6
k8s-node1 Ready <none> 60d v1.27.6
k8s-node2 Ready <none> 60d v1.27.6
k8s-node3 Ready <none> 56d v1.27.6
Scale the above cluster out with 20 additional nodes (16C/32G) dedicated to the hollow-node pods:
# Cordon the original nodes so nothing else is scheduled onto them
[root@k8s-master1 ~]# kubectl cordon k8s-node1
[root@k8s-master1 ~]# kubectl cordon k8s-node2
[root@k8s-master1 ~]# kubectl cordon k8s-node3
[root@k8s-master1 ~]# kubectl cordon k8s-master1
# After scaling out with 20 hollownode nodes
[root@k8s-master1 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
hollownode174 Ready <none> 4d v1.27.6
hollownode175 Ready <none> 4d v1.27.6
hollownode176 Ready <none> 4d v1.27.6
hollownode177 Ready <none> 4d v1.27.6
hollownode178 Ready <none> 4d v1.27.6
hollownode179 Ready <none> 4d v1.27.6
hollownode180 Ready <none> 4d v1.27.6
hollownode181 Ready <none> 4d v1.27.6
hollownode182 Ready <none> 4d v1.27.6
hollownode183 Ready <none> 4d v1.27.6
hollownode184 Ready <none> 4d v1.27.6
hollownode185 Ready <none> 4d v1.27.6
hollownode186 Ready <none> 4d v1.27.6
hollownode187 Ready <none> 4d v1.27.6
hollownode188 Ready <none> 4d v1.27.6
hollownode189 Ready <none> 4d v1.27.6
hollownode190 Ready <none> 4d v1.27.6
hollownode191 Ready <none> 4d v1.27.6
hollownode192 Ready <none> 4d v1.27.6
hollownode193 Ready <none> 4d v1.27.6
k8s-master1 Ready,SchedulingDisabled control-plane 60d v1.27.6
k8s-node1 Ready,SchedulingDisabled <none> 60d v1.27.6
k8s-node2 Ready,SchedulingDisabled <none> 60d v1.27.6
k8s-node3 Ready,SchedulingDisabled <none> 56d v1.27.6
# Label the new nodes in batch
[root@k8s-master1 ~]# for i in {174..193}; do kubectl label node hollownode$i name=hollow-node; done
2. Creating hollow-node pods
Pods are created in the external cluster; inside each pod a hollow kubelet registers itself with the kubemark cluster through a kubeconfig file, which completes the node simulation.
2.1 External cluster
- Create the namespace
[root@k8s-master ~]# kubectl create ns kubemark
- Create the ConfigMap and Secret
# Create the ConfigMap in the external cluster
[root@k8s-master ~]# kubectl create configmap node-configmap -n kubemark --from-literal=content.type="test-cluster"
# Create the Secret in the external cluster; kubeconfig.kubemark is the kubeconfig file of the kubemark cluster (the cluster under test)
[root@k8s-master ~]# kubectl create secret generic kubeconfig --type=Opaque --namespace=kubemark --from-file=kubelet.kubeconfig=kubeconfig.kubemark --from-file=kubeproxy.kubeconfig=kubeconfig.kubemark
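The kubeconfig.kubemark file referenced above is a copy of the kubemark cluster's kubeconfig. One way to fetch it (a sketch, assuming a kubeadm-style cluster where the admin kubeconfig sits at /etc/kubernetes/admin.conf):
# Copy the kubemark cluster's admin kubeconfig over to where the secret is created
scp root@192.168.101.54:/etc/kubernetes/admin.conf ./kubeconfig.kubemark
# The server address inside it must be reachable from the hollow-node pods
grep server ./kubeconfig.kubemark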
- Create the hollow-node pods
Prepare the following manifest as hollow-node.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hollow-node
  namespace: kubemark
  labels:
    name: hollow-node
spec:
  replicas: 3
  selector:
    matchLabels:
      name: hollow-node
  template:
    metadata:
      labels:
        name: hollow-node
    spec:
      hostAliases:
      - ip: "192.168.101.54"           # for an HA setup, use the IP address behind the cluster domain name
        hostnames:
        - "apiserver.cluster.local"    # kubemark cluster domain name
      nodeSelector:
        name: hollow-node
      initContainers:
      - name: init-inotify-limit
        image: busybox
        imagePullPolicy: IfNotPresent
        command: ['sysctl', '-w', 'fs.inotify.max_user_instances=524288']
        securityContext:
          privileged: true
      volumes:
      - name: kubeconfig-volume
        secret:
          secretName: kubeconfig
      - name: containerd
        hostPath:
          path: /run/containerd
      - name: logs-volume
        hostPath:
          path: /var/log
      containers:
      - name: hollow-kubelet
        image: staging-k8s.gcr.io/kubemark:v1.27.6
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 4194
        - containerPort: 10250
        - containerPort: 10255
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=kubelet --name=$(NODE_NAME) --kubeconfig=/kubeconfig/kubelet.kubeconfig --v=2
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        - name: containerd
          mountPath: /run/containerd
        securityContext:
          privileged: true
      - name: hollow-proxy
        image: staging-k8s.gcr.io/kubemark:v1.27.6
        imagePullPolicy: IfNotPresent
        env:
        - name: CONTENT_TYPE
          valueFrom:
            configMapKeyRef:
              name: node-configmap
              key: content.type
        - name: NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        command:
        - /bin/sh
        - -c
        - /kubemark --morph=proxy --name=$(NODE_NAME) --use-real-proxier=false --kubeconfig=/kubeconfig/kubeproxy.kubeconfig --v=2
        volumeMounts:
        - name: kubeconfig-volume
          mountPath: /kubeconfig
          readOnly: true
        - name: logs-volume
          mountPath: /var/log
        - name: containerd
          mountPath: /run/containerd
      tolerations:
      - key: key
        value: value
        effect: NoSchedule
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: name
                operator: In
                values:
                - hollow-node
Create the Deployment with kubectl create -f hollow-node.yaml, then confirm on the kubemark cluster side that 3 hollow-node nodes have registered:
[root@cluster54 ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
cluster54 Ready control-plane,master 12d v1.27.6
cluster55 Ready control-plane,master 12d v1.27.6
cluster56 Ready control-plane,master 12d v1.27.6
hollow-node-7f499b849f-222rm Ready <none> 15h v1.27.6
hollow-node-7f499b849f-2247k Ready <none> 15h v1.27.6
hollow-node-7f499b849f-2264p Ready <none> 15h v1.27.6
If no nodes register at this point, troubleshoot the problem first and only continue with the steps below once registration works.
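If that happens, a few commands usually narrow it down quickly (a sketch):
# Are the pods themselves running in the external cluster?
kubectl get pod -n kubemark -o wide
# Registration errors show up in the hollow-kubelet container logs
kubectl logs -n kubemark deploy/hollow-node -c hollow-kubelet --tail=50
# Check that the kubeconfig in the secret points at the kubemark apiserver address
kubectl get secret -n kubemark kubeconfig -o jsonpath='{.data.kubelet\.kubeconfig}' | base64 -d | grep server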
- Scale up
Once registration works, increase the replica count. Here I scaled straight to 5000; you can also step up gradually, e.g. 1000 --> 3000 --> 5000 (a stepping sketch follows the command below).
kubectl scale -n kubemark deployment.apps/hollow-node --replicas=5000
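For the gradual variant mentioned above, a minimal sketch that steps the replica count and waits for the kubemark cluster to catch up (assumes the kubemark cluster's kubeconfig is available locally as kubeconfig.kubemark):
for replicas in 1000 3000 5000; do
  kubectl scale -n kubemark deployment/hollow-node --replicas=$replicas
  # wait until the kubemark cluster reports at least that many hollow nodes
  while [ "$(kubectl --kubeconfig=kubeconfig.kubemark get node | grep -c hollow-node)" -lt "$replicas" ]; do
    sleep 30
  done
done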
Resource usage at this point:
[root@k8s-master1 ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
hollownode174 4210m 26% 19804Mi 62%
hollownode175 4432m 27% 19800Mi 62%
hollownode176 4455m 27% 19757Mi 61%
hollownode177 4137m 25% 19712Mi 61%
hollownode178 4185m 26% 19836Mi 62%
hollownode179 3901m 24% 20661Mi 64%
hollownode180 4218m 26% 19757Mi 61%
hollownode181 3653m 22% 19915Mi 61%
hollownode182 3636m 22% 19957Mi 62%
hollownode183 4152m 25% 19796Mi 62%
hollownode184 3620m 22% 19864Mi 61%
hollownode185 4237m 26% 19796Mi 62%
hollownode186 4288m 26% 19827Mi 62%
hollownode187 4321m 27% 19798Mi 62%
hollownode188 4230m 26% 19801Mi 62%
hollownode189 4456m 27% 19690Mi 61%
hollownode190 4308m 26% 19830Mi 62%
hollownode191 4415m 27% 19923Mi 62%
hollownode192 4413m 27% 19775Mi 62%
hollownode193 4083m 25% 19797Mi 62%
k8s-master1 310m 3% 4446Mi 13%
k8s-node1 85m 0% 1449Mi 4%
k8s-node2 69m 0% 1421Mi 4%
2.2 Kubemark cluster
Watch the node count grow on the kubemark cluster: watch -n 2 "kubectl get node | grep hollow-node | wc -l"
Confirm that all 5,000 nodes have registered:
[root@cluster54 ~]# kubectl get node | grep hollow-node | wc -l
5000
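Besides the raw count, it is worth confirming that no hollow node is stuck in NotReady; a quick check:
# Count hollow nodes that are not in the Ready state
kubectl get node | grep hollow-node | grep -v " Ready" | wc -l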
Besides the nodes themselves, the kubemark cluster also creates a large number of pods tied to the hollow-node nodes (largely DaemonSet pods, as in the sample below):
[root@cluster54 ~]# kubectl get pod -A -o wide | grep hollow-node-7f499b849f-zzr8j
kube-system kube-ovn-cni-6kblg 0/1 Init:ErrImagePull 0 16h 192.168.192.168 hollow-node-7f499b849f-zzr8j <none> <none>
kube-system kube-ovn-pinger-nsf4r 0/1 ErrImagePull 0 16h 192.168.192.168 hollow-node-7f499b849f-zzr8j <none> <none>
kube-system kube-proxy-clj7h 0/1 ErrImagePull 0 16h 192.168.192.168 hollow-node-7f499b849f-zzr8j <none> <none>
kube-system ovs-ovn-59t7r 0/1 ErrImagePull 0 16h 192.168.192.168 hollow-node-7f499b849f-zzr8j <none> <none>
rook-ceph csi-cephfsplugin-nhwp7 1/2 ErrImagePull 0 16h 192.168.192.168 hollow-node-7f499b849f-zzr8j <none> <none>
rook-ceph csi-rbdplugin-n22jx 1/2 ErrImagePull 0 16h 192.168.192.168 hollow-node-7f499b849f-zzr8j <none> <none>
# Total pod count in the kubemark cluster
[root@cluster54 ~]# kubectl get pod -A | wc -l
60931
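Since the DaemonSet pods on hollow nodes end up in ErrImagePull (as shown above), a per-status breakdown shows how much of that total they account for (a sketch):
# Pods per status across the whole kubemark cluster
kubectl get pod -A --no-headers | awk '{print $4}' | sort | uniq -c | sort -rn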
2.3 Resource usage
After all 5,000 simulated nodes are up, check the resource usage of the environment:
Resource usage across the 20 external nodes:
[root@k8s-master1 ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
hollownode174 4282m 26% 22146Mi 69%
hollownode175 4422m 27% 22152Mi 69%
hollownode176 4329m 27% 22120Mi 69%
hollownode177 4377m 27% 22145Mi 69%
hollownode178 4274m 26% 22188Mi 69%
hollownode179 3790m 23% 21987Mi 68%
hollownode180 4298m 26% 22174Mi 69%
hollownode181 3651m 22% 22204Mi 69%
hollownode182 3754m 23% 22214Mi 69%
hollownode183 4201m 26% 22172Mi 69%
hollownode184 3690m 23% 22154Mi 68%
hollownode185 4375m 27% 22155Mi 69%
hollownode186 4402m 27% 22211Mi 69%
hollownode187 4389m 27% 22149Mi 69%
hollownode188 4258m 26% 22161Mi 69%
hollownode189 4435m 27% 22087Mi 69%
hollownode190 4492m 28% 22158Mi 69%
hollownode191 4508m 28% 22209Mi 69%
hollownode192 4406m 27% 22117Mi 69%
hollownode193 4257m 26% 22205Mi 69%
Monitoring was also observed from the virtualization platform side.
3. Cleaning up
After testing is done, clean up the test resources.
3.1 Kubemark cluster
- Remove the hollow-node nodes from the kubemark cluster:
kubectl get node | grep hollow-node | awk '{print $1}' | xargs kubectl delete node
3.2 External cluster
- Delete the hollow-node pods
In the external cluster, scale the pods down to zero: kubectl scale -n kubemark deployment.apps/hollow-node --replicas=0
This only removes the pods temporarily; you can scale back up later to run the test again.
- Full cleanup
Delete the Deployment: kubectl delete -f hollow-node.yaml
Delete the ConfigMap: kubectl delete configmap node-configmap -n kubemark
Delete the Secret: kubectl delete secret kubeconfig -n kubemark
Delete the namespace: kubectl delete ns kubemark
4. Common issues
4.1 External cluster: pod scheduling problem, with a large number of pods stuck in Pending.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 3m1s default-scheduler 0/24 nodes are available: 20 Too many pods, 4 node(s) were unschedulable. preemption: 0/24 nodes are available: 20 No preemption victims found for incoming pod, 4 Preemption is not helpful for schedul
Cause: the pod count exceeded the nodes' configured maximum of 110 pods per node.
[root@k8s-master1 ~]# kubectl describe node hollownode174 | grep pods
pods: 110
Fix:
Configure the external cluster nodes to allow up to 500 pods each. Taking hollownode174 as an example (a batch version for all 20 nodes is sketched at the end of this subsection):
[root@hollownode174 ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sun 2024-08-18 17:28:27 UTC; 21min ago
Docs: https://kubernetes.io/docs/
Main PID: 1213 (kubelet)
Tasks: 25
Memory: 196.1M
CGroup: /system.slice/kubelet.service
└─1213 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-...
...
# Edit the kubelet config file
[root@hollownode174 ~]# vi /var/lib/kubelet/config.yaml
# Append the following line at the end
maxPods: 500
# Restart kubelet for the change to take effect
systemctl restart kubelet
# Confirm the new pod capacity
[root@k8s-master1 ~]# kubectl describe node hollownode174 | grep pods
pods: 500
pods: 500
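Repeating this by hand on 20 nodes is tedious; a batch sketch, assuming passwordless SSH from the master to the hollownode hosts and that maxPods is not already present in the file:
for i in {174..193}; do
  ssh root@hollownode$i "echo 'maxPods: 500' >> /var/lib/kubelet/config.yaml && systemctl restart kubelet"
done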
4.2 External cluster: pods stuck in the init phase (Init:0/1); the events show that the node has run out of assignable pod IP addresses.
Warning FailedCreatePodSandBox 2m2s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "dc19ee36059688135f684121995c1aaf7c69bb3b9e215aef6f3446f7ed78c925": plugin type="flannel" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 210.244.7.1-210.244.7.254
Warning FailedCreatePodSandBox 83s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b0324f5d8d8a4518b5258c9db6a1a32e48da9e7e535fa7c4a27f65b2c75731ef": plugin type="flannel" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 210.244.7.1-210.244.7.254
Warning FailedCreatePodSandBox 50s kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "0def0ce3e2f9c0e54b919e138816d1a24442fd82d2cd621d371b58fc5b3dbc4e": plugin type="flannel" failed (add): failed to delegate add: failed to allocate for range 0: no IP addresses available in range set: 210.244.7.1-210.244.7.254
Fix:
- When approaching 5,000 nodes (4,700+), a large number of pods (200+) piled up on hollownode179, pushing it past the ~250-pod budget of its /24 subnet, so new pods could not obtain an IP address and hit the errors above. Delete the abnormal pods so they are rescheduled onto other nodes.
# The flannel subnet assigned to this node supports at most 253 pods
[root@hollownode179 ~]# cat /var/run/flannel/subnet.env
FLANNEL_NETWORK=210.244.0.0/16
FLANNEL_SUBNET=210.244.7.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
kubectl get pod -n kubemark -o wide | grep hollownode179 | grep -v NAME | grep -v Run | awk '{print $1}' | xargs kubectl delete pod -n kubemark
Besides the approach above, other options include (see the distribution check sketched after this list):
- Add more external nodes.
- Look into changing how the network plugin allocates per-node subnets, so that each node can be given more than one /24 worth of addresses.
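Whichever option you choose, it helps to keep an eye on how the hollow pods are distributed across the external nodes; a minimal sketch:
# Number of hollow-node pods per external node
kubectl get pod -n kubemark -o wide --no-headers | awk '{print $7}' | sort | uniq -c | sort -rn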
4.3 Pod subnet conflicting with the kubemark cluster network
If the hollow-node pod subnet conflicts with the kubemark cluster's networks, adjust the network plugin's subnet plan as needed. I use the flannel plugin here and moved the subnet to an uncommon range to avoid conflicts:
[root@hollownode193 ~]# cat /var/run/flannel/subnet.env
FLANNEL_NETWORK=210.244.0.0/16
FLANNEL_SUBNET=210.244.18.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
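To verify the per-node subnets after such a change, the same subnet.env file can be checked on every node (a sketch, reusing the passwordless-SSH assumption from the maxPods fix above):
for i in {174..193}; do
  echo -n "hollownode$i: "
  ssh root@hollownode$i "grep FLANNEL_SUBNET /var/run/flannel/subnet.env"
done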
4.4 Networking layout issues
In this test environment, the cluster under test (the kubemark cluster) uses both a management network (10.210.0.0/16) and a cluster network (192.168.100.0/24), and the apiserver listens on the 192.168.100.0/24 network. The external cluster was originally built with only the management network (10.210.0.0/16), so its pods could not communicate with the kubemark cluster.
Fix:
Add a second NIC to the external cluster nodes and configure the cluster network so they can reach the kubemark cluster:
# External cluster node with the cluster network added
[root@hollownode192 ~]# ip add
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 0c:da:41:1d:83:3c brd ff:ff:ff:ff:ff:ff
inet 10.210.10.193/16 brd 10.210.255.255 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::eda:41ff:fe1d:833c/64 scope link
valid_lft forever preferred_lft forever
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 0c:da:41:1d:d1:0c brd ff:ff:ff:ff:ff:ff
inet 192.168.101.193/24 brd 192.168.101.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::eda:41ff:fe1d:d10c/64 scope link
valid_lft forever preferred_lft forever
# From inside a pod, connectivity and name resolution to the kubemark cluster work as expected
[root@k8s-master1 ~]# kubectl exec -itn kubemark hollow-node-7f499b849f-zrqfs sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Defaulted container "hollow-kubelet" out of: hollow-kubelet, hollow-proxy, init-inotify-limit (init)
sh-4.2# ping apiserver.cluster.local
PING apiserver.cluster.local (192.168.101.54) 56(84) bytes of data.
64 bytes from apiserver.cluster.local (192.168.101.54): icmp_seq=1 ttl=63 time=0.550 ms