I. Common ways to diagnose errors

The message we see most often during deployment is:

[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.

These messages do not say what actually went wrong. The following methods are commonly used to narrow the problem down:

  • Service status
systemctl status kubelet
  • Journal logs
journalctl -xeu kubelet
  • Add --v=5 when running join or init
  • Check the system log /var/log/messages, for example
tail /var/log/messages
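
The checks above can be gathered in one pass. The following is a minimal sketch rather than a finished tool: the function name collect_kubelet_diag and the output file names are invented for illustration, and it assumes a systemd-based host whose system log is at /var/log/messages (on some distributions it lives elsewhere).

```shell
# collect_kubelet_diag OUT_DIR
# Hypothetical helper: saves the output of the checks listed above into OUT_DIR.
# Each step is allowed to fail so that one missing tool does not stop the rest.
collect_kubelet_diag() {
    out_dir="$1"
    mkdir -p "$out_dir" || return 1

    # Service status (--no-pager keeps systemctl from waiting on a pager).
    systemctl status kubelet --no-pager > "$out_dir/kubelet-status.txt" 2>&1 || true

    # Journal entries for the kubelet unit, with explanatory text (-x).
    journalctl -xeu kubelet --no-pager > "$out_dir/kubelet-journal.txt" 2>&1 || true

    # Tail of the system log, if this host has one at this path.
    if [ -r /var/log/messages ]; then
        tail -n 200 /var/log/messages > "$out_dir/messages-tail.txt" 2>&1 || true
    fi

    echo "diagnostics written to $out_dir"
}
```

It could be called as collect_kubelet_diag /tmp/kubelet-diag; the resulting files can then be read side by side or attached to a bug report.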

II. Common errors

1. Expired token

Problem:
The log shows a message indicating that the token has expired, for example:
The cluster-info ConfigMap does not yet contain a JWS signature for token ID "42mf2r", will try again

Solution:

  1. On the master node, list the existing tokens:
kubeadm token list
  2. If nothing is listed, the token has expired and a new one must be generated:
kubeadm token create

Compute the discovery-token-ca-cert-hash:

openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
  3. Once the token is generated, substitute it for the token part of the kubeadm join command:
    kubeadm join 192.168.0.10:6443 --token vpm4o3.p4bn3co35bplw77m \
    --discovery-token-ca-cert-hash sha256:23d0fe16f3e825ef81d9682d3dc3a706fdad1712d4213ad4243a7d6d7abd5c36
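
Pasting a token and hash by hand is easy to get wrong (a dash mangled by a word processor, a truncated hash). As a rough sketch, the hypothetical helper below checks the shape of both values before printing the join command. The function name build_join_command is not part of kubeadm; it is only an illustration.

```shell
# build_join_command ENDPOINT TOKEN CA_HASH
# Hypothetical helper: validates the pieces and prints a kubeadm join command.
build_join_command() {
    endpoint="$1"; token="$2"; ca_hash="$3"

    # Bootstrap tokens look like <6 chars>.<16 chars>, lowercase letters and digits.
    if ! printf '%s\n' "$token" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'; then
        echo "invalid token format: $token" >&2
        return 1
    fi

    # The CA hash is a SHA-256 digest: 64 lowercase hex characters.
    if ! printf '%s\n' "$ca_hash" | grep -Eq '^[0-9a-f]{64}$'; then
        echo "invalid CA cert hash: $ca_hash" >&2
        return 1
    fi

    printf 'kubeadm join %s --token %s --discovery-token-ca-cert-hash sha256:%s\n' \
        "$endpoint" "$token" "$ca_hash"
}
```

Alternatively, running kubeadm token create --print-join-command on the master creates a new token and prints a complete join command, which avoids assembling it manually.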

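2. Clock synchronization

Problem:
Error: Failed to request cluster-info, will try again: certificate has expired or is not yet valid
Solution:
This usually means the node's clock differs from the master's, so the certificate appears to fall outside its validity period. Synchronize the clock: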

ntpdate ntp.aliyun.com
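
Before and after synchronizing, it can help to measure how far apart the two clocks actually are. The sketch below is a hypothetical helper (clock_skew_ok is an invented name) that compares two epoch timestamps; the tolerance is whatever the caller chooses, not a value taken from Kubernetes.

```shell
# clock_skew_ok LOCAL_EPOCH REFERENCE_EPOCH MAX_SKEW_SECONDS
# Hypothetical helper: succeeds when the two clocks differ by at most MAX_SKEW_SECONDS.
clock_skew_ok() {
    skew=$(( $1 - $2 ))
    # Take the absolute value so it does not matter which clock is ahead.
    if [ "$skew" -lt 0 ]; then
        skew=$(( 0 - skew ))
    fi
    echo "clock skew: ${skew}s"
    [ "$skew" -le "$3" ]
}
```

One possible use, assuming the master is reachable over SSH under the placeholder name master: clock_skew_ok "$(date +%s)" "$(ssh master date +%s)" 5.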

3. Docker cgroup driver is not systemd

Problem:
server.go:302] "Failed to run kubelet" err="failed to run Kubelet: misconfiguration: kubelet cgroup driver: \"systemd\" is different …

Solution:

  1. Edit the Docker daemon configuration file /etc/docker/daemon.json and add the following:
{
"exec-opts": ["native.cgroupdriver=systemd"]
}

Restart the Docker service:

sudo systemctl daemon-reload
sudo systemctl restart docker

After the change, check Docker's cgroup driver:

docker info | grep "Cgroup Driver"

Cgroup Driver: systemd # now updated to systemd

Restart kubelet:

systemctl restart kubelet
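
To confirm that the two sides now agree, the driver names can be extracted and compared. This is a minimal sketch with invented function names; it assumes the kubelet configuration is the YAML file that kubeadm normally writes to /var/lib/kubelet/config.yaml, which may not hold for every installation.

```shell
# docker_cgroup_driver: reads `docker info` output on stdin, prints the driver name.
docker_cgroup_driver() {
    sed -n 's/^[[:space:]]*Cgroup Driver:[[:space:]]*//p'
}

# kubelet_cgroup_driver: reads a kubelet config YAML on stdin, prints cgroupDriver.
kubelet_cgroup_driver() {
    sed -n 's/^cgroupDriver:[[:space:]]*//p'
}

# cgroup_drivers_match DOCKER_DRIVER KUBELET_DRIVER
# Succeeds only when both values are non-empty and identical.
cgroup_drivers_match() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "OK: both use $1"
    else
        echo "MISMATCH: docker=$1 kubelet=$2"
        return 1
    fi
}

# Example call (path is an assumption, see above):
#   cgroup_drivers_match "$(docker info 2>/dev/null | docker_cgroup_driver)" \
#       "$(kubelet_cgroup_driver < /var/lib/kubelet/config.yaml)"
```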

4. Failed to create cgroup (unverified)

Problem:
"Failed to create cgroup" err="Cannot set property TasksAccountin
Solution:

yum update systemd

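5. kubeadm reset run on a worker node by mistake

Problem:
kubeadm reset was run on a worker node by mistake.
Solution:

  1. Delete all files under /etc/kubernetes/
  2. Run kubeadm reset
  3. Join the cluster again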
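
The three steps can be sketched as a small script. This is only an illustration under assumptions: run and rejoin_worker are invented names, the DRY_RUN switch is a convention of this sketch, and the join arguments must come from a valid kubeadm join command for your cluster.

```shell
# run CMD...: prints the command when DRY_RUN=1, otherwise executes it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# rejoin_worker JOIN_ARGS...
# Hypothetical helper following the three steps above; JOIN_ARGS go to kubeadm join.
rejoin_worker() {
    # 1. Remove everything under /etc/kubernetes/ (the directory itself is kept).
    run find /etc/kubernetes -mindepth 1 -delete || return 1
    # 2. Reset the node; -f skips the interactive confirmation.
    run kubeadm reset -f || return 1
    # 3. Join the cluster again with the arguments supplied by the caller.
    run kubeadm join "$@"
}
```

Setting DRY_RUN=1 before calling rejoin_worker prints the commands without executing them, which is a reasonable first step given that the first two commands are destructive.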