[Error Log] Problems you may encounter when deploying a k8s cluster
1. kubeadm init fails on the Master node
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
Troubleshooting:
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since 二 2023-03-07 10:23:17 CST; 6s ago
Docs: https://kubernetes.io/docs/
Process: 8321 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/FAILURE)
Main PID: 8321 (code=exited, status=1/FAILURE)
3月 07 10:23:17 k8s-master systemd[1]: Unit kubelet.service entered failed ...e.
3月 07 10:23:17 k8s-master systemd[1]: kubelet.service failed.
Hint: Some lines were ellipsized, use -l to show in full.
The kubelet status is abnormal; check its journal:
[root@k8s-master ~]# journalctl -xeu kubelet
3月 07 10:23:48 k8s-master kubelet[8512]: E0307 10:23:48.712592 8512 server.go
3月 07 10:23:48 k8s-master systemd[1]: kubelet.service: main process exited, code
3月 07 10:23:48 k8s-master systemd[1]: Unit kubelet.service entered failed state.
3月 07 10:23:48 k8s-master systemd[1]: kubelet.service failed.
Locating the cause:
[root@k8s-master ~]# tail /var/log/messages
Mar 7 10:25:31 k8s-master kubelet: E0307 10:25:31.123616 9153 server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different from docker cgroup driver: \"cgroupfs\""
Mar 7 10:25:31 k8s-master systemd: kubelet.service: main process exited, code=exited, status=1/FAILURE
Mar 7 10:25:31 k8s-master systemd: Unit kubelet.service entered failed state.
Mar 7 10:25:31 k8s-master systemd: kubelet.service failed.
The log shows that the kubelet's cgroup driver is systemd while docker's cgroup driver is cgroupfs; the mismatch prevents the kubelet from starting.
Solution:
1. I first tried changing the kubelet's cgroup driver (file: /etc/systemd/system/kubelet.service.d/10-kubeadm.conf), but the file kept getting overwritten on every startup, so I gave up on that approach and changed docker's cgroup driver instead.
2. Open the file
/usr/lib/systemd/system/docker.service
and add the option (shown boxed in red in the original screenshot)
--exec-opt native.cgroupdriver=systemd
to the ExecStart line.
Reload the systemd configuration and restart the docker service:
systemctl daemon-reload && systemctl restart docker
Success.
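An alternative that is often used instead of editing the unit file is docker's daemon configuration: setting the cgroup driver in /etc/docker/daemon.json survives package upgrades that rewrite docker.service. A minimal sketch (not from the original post; if the file already exists, merge the key into it rather than overwriting):

```json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
```

After saving the file, reload and restart docker just as above: systemctl daemon-reload && systemctl restart docker.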
2. Errors when initializing or joining the cluster
2.1 A node fails to join the cluster
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
a. Check the environment prerequisites:
# Disable the firewall
systemctl stop firewalld
systemctl disable firewalld
# Disable SELinux
sed -i 's/enforcing/disabled/' /etc/selinux/config  # permanent
setenforce 0  # temporary
# Disable swap
swapoff -a  # temporary
sed -ri 's/.*swap.*/#&/' /etc/fstab  # permanent
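Before touching the real /etc/fstab, the sed expression can be rehearsed on a scratch copy; the sample lines below are illustrative, not from the original post. The `&` in the replacement inserts the whole matched line, so every line containing "swap" gets a leading `#`:

```shell
# Try the fstab edit on a scratch file first; '&' re-inserts the
# entire matched line, commenting out any line containing "swap".
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
/dev/mapper/centos-root /    xfs  defaults 0 0
/dev/mapper/centos-swap swap swap defaults 0 0
EOF
sed -ri 's/.*swap.*/#&/' "$tmp"
grep swap "$tmp"   # the swap line now starts with '#'
rm -f "$tmp"
```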
2.2 Further errors
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
These errors mean port 10250 is already in use and stale configuration files remain from a previous attempt. Reset the node's configuration first:
kubeadm reset
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
A configuration file already exists, left over from the earlier failed attempt; rename it out of the way (or delete it):
mv /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak
mv /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.crt.bak
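The same rename-to-.bak pattern can be rehearsed on scratch files before running it against the real node; the temporary directory below stands in for /etc/kubernetes and is illustrative only:

```shell
# Rehearse the backup step in a scratch directory that mimics the
# /etc/kubernetes layout; nothing under /etc is touched.
workdir=$(mktemp -d)
mkdir -p "$workdir/pki"
touch "$workdir/kubelet.conf" "$workdir/pki/ca.crt"
mv "$workdir/kubelet.conf" "$workdir/kubelet.conf.bak"
mv "$workdir/pki/ca.crt" "$workdir/pki/ca.crt.bak"
ls "$workdir"          # kubelet.conf.bak  pki
rm -rf "$workdir"
```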
3. Failure to download a yaml manifest
[root@k8s-master ~]# wget https://docs.projectcalico.org/v3.14/manifests/calico.yaml
......
ERROR: cannot verify docs.projectcalico.org's certificate, issued by "/C=US/O=Let's Encrypt/CN=R3":
  The issued certificate has expired.
Solution: skip certificate verification:
wget https://docs.projectcalico.org/v3.14/manifests/calico.yaml --no-check-certificate
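Skipping verification means wget will accept whatever the server returns, so it is worth sanity-checking that the downloaded file really is a Kubernetes manifest and not an HTML error page. A quick heuristic (the function name is my own, not from the original post):

```shell
# Heuristic check: a Kubernetes manifest contains top-level
# 'apiVersion:' and 'kind:' keys; an HTML error page does not.
check_manifest() {
    if grep -q '^apiVersion:' "$1" && grep -q '^kind:' "$1"; then
        echo "looks like a Kubernetes manifest"
    else
        echo "WARNING: $1 does not look like a manifest" >&2
        return 1
    fi
}
```

Usage: check_manifest calico.yaml before running kubectl apply -f on it.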