Troubleshooting record: resolving a k8s node failure
kube-flannel-ds shows ImagePullBackOff and a k8s node fails to become Ready after joining
Symptom 1: one node stays NotReady
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 60m v1.17.0
node1 NotReady <none> 30m v1.17.0
node2 Ready <none> 29m v1.17.0
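With more than a handful of nodes, scanning the table by eye gets tedious. The following is a minimal sketch, not part of the original troubleshooting session, that filters the `kubectl get nodes` table down to the nodes whose STATUS is not exactly `Ready`. The function name `not_ready_nodes` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of nodes whose STATUS column is
# not exactly "Ready". Reads the `kubectl get nodes` table (with its
# header line) on stdin.
# Note: a cordoned node ("Ready,SchedulingDisabled") is also reported.
not_ready_nodes() {
  awk 'NR > 1 && $2 != "Ready" { print $1 }'
}

# Usage on the master:
#   kubectl get nodes | not_ready_nodes
```

For the output shown above this prints only `node1`.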
Symptom 2: one flannel pod is stuck in Init:ImagePullBackOff
[root@master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-9d85f5447-r2dx6 1/1 Running 0 64m
coredns-9d85f5447-zskjc 1/1 Running 0 64m
etcd-master 1/1 Running 0 64m
kube-apiserver-master 1/1 Running 0 64m
kube-controller-manager-master 1/1 Running 0 64m
kube-flannel-ds-7bknh 1/1 Running 0 33m
kube-flannel-ds-9xwsr 0/1 Init:ImagePullBackOff 1 35m
kube-flannel-ds-tspl2 1/1 Running 0 44m
kube-proxy-ggd7p 1/1 Running 1 35m
kube-proxy-m8ljk 1/1 Running 0 64m
kube-proxy-xrt7c 1/1 Running 0 33m
kube-scheduler-master 1/1 Running 0 64m
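The same kind of filter can pick out the unhealthy pods from this listing. This is a sketch under the assumption that the table has the five columns shown above (NAME, READY, STATUS, RESTARTS, AGE); the function name `unhealthy_pods` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print "<pod> <status>" for every pod whose STATUS
# is neither Running nor Completed. Reads the `kubectl get pods` table
# (with its header line) on stdin.
unhealthy_pods() {
  awk 'NR > 1 && $3 != "Running" && $3 != "Completed" { print $1, $3 }'
}

# Usage on the master:
#   kubectl get pods -n kube-system | unhealthy_pods
```

Running `kubectl get pods -n kube-system -o wide` additionally shows which node each pod is scheduled on, which helps tie the failing pod to the NotReady node.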
Symptom 3: describing kube-flannel-ds-9xwsr shows that the image pull is timing out (the events end in "context canceled")
[root@master ~]# kubectl describe pod -n kube-system kube-flannel-ds-9xwsr
Name: kube-flannel-ds-9xwsr
Namespace: kube-system
Priority: 2000001000
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 46m default-scheduler Successfully assigned kube-system/kube-flannel-ds-9xwsr to node1
Normal Pulling 46m kubelet, node1 Pulling image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0"
Normal Pulled 46m kubelet, node1 Successfully pulled image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0"
Normal Created 46m kubelet, node1 Created container install-cni-plugin
Normal Started 46m kubelet, node1 Started container install-cni-plugin
Normal Pulling 46m kubelet, node1 Pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"
Normal SandboxChanged 17m kubelet, node1 Pod sandbox changed, it will be killed and re-created.
Normal Started 17m kubelet, node1 Started container install-cni-plugin
Normal Created 17m kubelet, node1 Created container install-cni-plugin
Normal Pulled 17m kubelet, node1 Container image "rancher/mirrored-flannelcni-flannel-cni-plugin:v1.0.0" already present on machine
Normal Pulling 10m (x4 over 17m) kubelet, node1 Pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"
Warning Failed 9m23s (x4 over 15m) kubelet, node1 Error: ErrImagePull
Warning Failed 9m11s (x5 over 15m) kubelet, node1 Error: ImagePullBackOff
Warning Failed 6m24s (x5 over 15m) kubelet, node1 Failed to pull image "rancher/mirrored-flannelcni-flannel:v0.16.1": rpc error: code = Unknown desc = context canceled
Normal BackOff 112s (x23 over 15m) kubelet, node1 Back-off pulling image "rancher/mirrored-flannelcni-flannel:v0.16.1"
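To see at a glance which image is failing, the event list can be reduced to the image names in the "Failed to pull image" messages. The sketch below assumes the event wording shown above; the function name `failed_images` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the unique image names from
# 'Failed to pull image "<image>"' event lines.
# Reads `kubectl describe pod` output on stdin.
failed_images() {
  grep 'Failed to pull image' \
    | sed -E 's/.*Failed to pull image "([^"]+)".*/\1/' \
    | sort -u
}

# Usage on the master:
#   kubectl describe pod -n kube-system kube-flannel-ds-9xwsr | failed_images
```

A step the original session did not take, but which may help narrow things down, is to run `docker pull` for the reported image directly on the affected node to check whether the registry is reachable from there.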
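Resolution:
Actions tried on the failed node:
1. Rebooted the failed node (did not help)
2. Tried to start the stopped containers; they would not start
3. Ran systemctl daemon-reload and restarted docker (did not help)
4. Stopped the running containers and removed the containers that were not running; this resolved the problem
The detailed steps are shown below.
# Check the container status (recorded here for comparison after the restart; there are 6 containers at this point)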
[root@node1 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b8a649455c1c 7d54289267dc "/usr/local/bin/kube…" 2 minutes ago Up 2 minutes k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
46ba6b578ebf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 2 minutes ago Up 2 minutes k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
12326131af76 cd5235cd7dc2 "cp -f /flannel /opt…" 9 minutes ago Exited (0) 2 minutes ago k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
2f7b7ceec68e 7d54289267dc "/usr/local/bin/kube…" 10 minutes ago Exited (2) 2 minutes ago k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_1
268dc494222a registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 minutes ago Up 9 minutes k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
b1ac093e353f registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 minutes ago Exited (0) 2 minutes ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907
# Reload the systemd configuration and restart the docker service
# Reload the systemd unit files
[root@node1 ~]# systemctl daemon-reload
# Restart the docker service
[root@node1 ~]# systemctl restart docker
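After a restart like this it can be worth confirming that the services actually came back up. The sketch below uses `systemctl is-active --quiet`; checking the `kubelet` unit as well is an assumption (it is the unit name a kubeadm install normally uses, but the original session does not show it), and the function name `services_active` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: return non-zero and name the first systemd unit
# that is not active. Unit names are passed as arguments.
services_active() {
  local svc
  for svc in "$@"; do
    if ! systemctl is-active --quiet "$svc"; then
      echo "$svc is not active"
      return 1
    fi
  done
}

# Usage on the node (the kubelet unit name is an assumption):
#   services_active docker kubelet && echo "docker and kubelet are active"
```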
# Check the container status again: there are 2 more Exited containers and 1 Created container, 9 containers in total
[root@node1 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6a742f121b59 cd5235cd7dc2 "cp -f /flannel /opt…" 10 seconds ago Exited (0) 9 seconds ago k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_0
36e69e7a3dcd 7d54289267dc "/usr/local/bin/kube…" 10 seconds ago Up 9 seconds k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
95b0547dbf88 registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 seconds ago Up 10 seconds k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
cd90ee8a56cf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 10 seconds ago Up 9 seconds k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_3
b1237af848cf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 11 seconds ago Created k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_2
b8a649455c1c 7d54289267dc "/usr/local/bin/kube…" 16 minutes ago Exited (2) 11 seconds ago k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
46ba6b578ebf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 16 minutes ago Exited (0) 11 seconds ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
268dc494222a registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 23 minutes ago Exited (0) 11 seconds ago k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
b1ac093e353f registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 23 minutes ago Exited (0) 16 minutes ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d690
# A little later, three of the containers had disappeared and the total was back to 6
[root@node1 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
6a742f121b59 cd5235cd7dc2 "cp -f /flannel /opt…" 50 seconds ago Exited (0) 49 seconds ago k8s_install-cni-plugin_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_0
36e69e7a3dcd 7d54289267dc "/usr/local/bin/kube…" 50 seconds ago Up 49 seconds k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
95b0547dbf88 registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 50 seconds ago Up 49 seconds k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_3
cd90ee8a56cf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 50 seconds ago Up 49 seconds k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_3
b8a649455c1c 7d54289267dc "/usr/local/bin/kube…" 16 minutes ago Exited (2) 51 seconds ago k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_2
46ba6b578ebf registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 16 minutes ago Exited (0) 51 seconds ago k8s_POD_kube-proxy-ggd7p_kube-system_1a7d690
# Stop the running containers
[root@node1 ~]# docker stop 36e69e7a3dcd 95b0547dbf88 cd90ee8a56cf
36e69e7a3dcd
95b0547dbf88
cd90ee8a56cf
# Remove all containers; 4 of them could not be removed because they were reported as running
[root@node1 ~]# docker container rm $(docker ps -qa)
11aaee411eb8
e2efad2fb393
a7ad16a4f86e
36e69e7a3dcd
95b0547dbf88
cd90ee8a56cf
Error response from daemon: You cannot remove a running container 79d354753c5d143c1b2bd95d1aa52ca48fa861e530da967c86b6537e52895647. Stop the container before attempting removal or force remove
Error response from daemon: You cannot remove a running container 74f74b6b43e5dc02110aec24d38489412c4833e18e4ee860d8019bfcdae4aad8. Stop the container before attempting removal or force remove
Error response from daemon: You cannot remove a running container 30f4881c1809d107a2bf717c45a780a9b04ae8cc9f40278723a74686ef3f72f2. Stop the container before attempting removal or force remove
Error response from daemon: You cannot remove a running container 9a82cbc9374c73288bc0e3bc8205c3c1d298b4409483c38a8d2edcf7682100ec. Stop the container before attempting removal or force remove
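The errors above appear because `docker ps -qa` also lists running containers. As an alternative to what was run here, the sketch below selects only containers in the Exited or Created state, so nothing that is running is passed to `docker rm`. It assumes input in the form produced by `docker ps -a --format '{{.ID}} {{.Status}}'`; the function name `stopped_container_ids` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the IDs of containers whose status starts
# with "Exited" or "Created".
# stdin: lines of "<id> <status...>", e.g. from
#   docker ps -a --format '{{.ID}} {{.Status}}'
stopped_container_ids() {
  awk '$2 == "Exited" || $2 == "Created" { print $1 }'
}

# Usage on the node (xargs -r is a GNU extension that skips the command
# when there is no input):
#   docker ps -a --format '{{.ID}} {{.Status}}' \
#     | stopped_container_ids | xargs -r docker rm
```

Docker can also do the filtering itself, for example `docker ps -aq --filter status=exited`.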
# Check again: there are indeed 4 running containers, and all of them are newly created (presumably recreated by kubelet)
[root@node1 ~]# docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
79d354753c5d 7d54289267dc "/usr/local/bin/kube…" 26 seconds ago Up 26 seconds k8s_kube-proxy_kube-proxy-ggd7p_kube-system_1a7d6907-415b-40cf-ae6a-9297ffad8e3e_4
74f74b6b43e5 404fc3ab6749 "/opt/bin/flanneld -…" 39 seconds ago Up 38 seconds k8s_kube-flannel_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_1
30f4881c1809 registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 41 seconds ago Up 40 seconds k8s_POD_kube-flannel-ds-9xwsr_kube-system_45e1f94a-28e2-4f8b-89bc-a0cec89f3352_4
9a82cbc9374c registry.aliyuncs.com/google_containers/pause:3.1 "/pause" 42 seconds ago Up 41 seconds k8s_POD_kube-proxy-ggd7p_kube-system_1a7d6907-415b-4
# Back on the master node, the failed node has recovered and is Ready
[root@master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 96m v1.17.0
node1 Ready <none> 67m v1.17.0
node2 Ready <none> 65m v1.17.0
# The failed flannel pod kube-flannel-ds-9xwsr has recovered as well
[root@master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-9d85f5447-r2dx6 1/1 Running 0 96m
coredns-9d85f5447-zskjc 1/1 Running 0 96m
etcd-master 1/1 Running 0 96m
kube-apiserver-master 1/1 Running 0 96m
kube-controller-manager-master 1/1 Running 0 96m
kube-flannel-ds-7bknh 1/1 Running 0 65m
kube-flannel-ds-9xwsr 1/1 Running 1 67m
kube-flannel-ds-tspl2 1/1 Running 0 76m
kube-proxy-ggd7p 1/1 Running 4 67m
kube-proxy-m8ljk 1/1 Running 0 96m
kube-proxy-xrt7c 1/1 Running 0 65m
kube-scheduler-master 1/1 Running 0 96m