etcd data file corruption caused by suspending a local Kubernetes virtual machine
Kubernetes troubleshooting
Key point: the command journalctl -xefu kubelet shows the kubelet's runtime logs.
journalctl -xefu kubelet
The kubelet was misbehaving, so check its status as well: systemctl status kubelet -l
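The kubelet journal is noisy, so it can help to narrow it down to lines that look like failures. A minimal sketch, assuming a systemd host with a unit named kubelet; the helper name filter_kubelet_errors and the keyword list are illustrative choices, not part of the original procedure:

```shell
# Keep only journal lines that look like failures (case-insensitive).
# The keyword list is an assumption; extend it as needed.
filter_kubelet_errors() {
  grep -iE 'error|fail|refused|panic' || true
}

# Usage on a real node:
#   journalctl -u kubelet --since "15 min ago" --no-pager | filter_kubelet_errors | tail -n 20
```

The trailing "|| true" keeps the pipeline from reporting failure when nothing matches.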
https://www.cnblogs.com/leoshi/p/16581687.html
https://blog.csdn.net/qq_29274865/article/details/116016449
docker logs --tail=1000 k8s_etcd_etcd-k8s-node1_kube-system_ba23057e939a3b1a7f65672a8f39bf66_1162
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2022-08-20 07:16:22.522407 I | etcdmain: etcd Version: 3.4.3
2022-08-20 07:16:22.522441 I | etcdmain: Git SHA: 3cf2f69b5
2022-08-20 07:16:22.522464 I | etcdmain: Go Version: go1.12.12
2022-08-20 07:16:22.522466 I | etcdmain: Go OS/Arch: linux/amd64
2022-08-20 07:16:22.522469 I | etcdmain: setting maximum number of CPUs to 6, total number of available CPUs is 6
2022-08-20 07:16:22.522639 N | etcdmain: the server is already initialized as member before, starting as etcd member...
[WARNING] Deprecated '--logger=capnslog' flag is set; use '--logger=zap' flag instead
2022-08-20 07:16:22.522666 I | embed: peerTLS: cert = /etc/kubernetes/pki/etcd/peer.crt, key = /etc/kubernetes/pki/etcd/peer.key, trusted-ca = /etc/kubernetes/pki/etcd/ca.crt, client-cert-auth = true, crl-file =
2022-08-20 07:16:22.523285 I | embed: name = k8s-node1
2022-08-20 07:16:22.523293 I | embed: data dir = /var/lib/etcd
2022-08-20 07:16:22.523296 I | embed: member dir = /var/lib/etcd/member
2022-08-20 07:16:22.523297 I | embed: heartbeat = 100ms
2022-08-20 07:16:22.523299 I | embed: election = 1000ms
2022-08-20 07:16:22.523301 I | embed: snapshot count = 10000
2022-08-20 07:16:22.523326 I | embed: advertise client URLs = https://10.0.2.13:2379
2022-08-20 07:16:22.523328 I | embed: initial advertise peer URLs = https://10.0.2.13:2380
2022-08-20 07:16:22.523331 I | embed: initial cluster =
2022-08-20 07:16:22.526589 I | etcdserver: recovered store from snapshot at index 4300435
2022-08-20 07:16:22.528482 C | etcdserver: recovering backend from snapshot error: failed to find database snapshot file (snap: snapshot file doesn't exist)
panic: recovering backend from snapshot error: failed to find database snapshot file (snap: snapshot file doesn't exist)
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0xc2cc4e]
goroutine 1 [running]:
go.etcd.io/etcd/etcdserver.NewServer.func1(0xc0002b8f50, 0xc0002b6f48)
/tmp/etcd-release-3.4.3/etcd/release/etcd/etcdserver/server.go:335 +0x3e
panic(0xed6960, 0xc000118070)
/usr/local/go/src/runtime/panic.go:522 +0x1b5
github.com/coreos/pkg/capnslog.(*PackageLogger).Panicf(0xc0001c3180, 0x10aeaf5, 0x2a, 0xc0002b7018, 0x1, 0x1)
/home/ec2-user/go/pkg/mod/github.com/coreos/pkg@v0.0.0-20160727233714-3ac0863d7acf/capnslog/pkg_logger.go:75 +0x135
go.etcd.io/etcd/etcdserver.NewServer(0x7fffbf439e78, 0x9, 0x0, 0x0, 0x0, 0x0, 0xc000200c00, 0x1, 0x1, 0xc000200d80, ...)
/tmp/etcd-release-3.4.3/etcd/release/etcd/etcdserver/server.go:456 +0x42f7
go.etcd.io/etcd/embed.StartEtcd(0xc00026c000, 0xc00026c580, 0x0, 0x0)
/tmp/etcd-release-3.4.3/etcd/release/etcd/embed/etcd.go:211 +0x9d0
go.etcd.io/etcd/etcdmain.startEtcd(0xc00026c000, 0x108423e, 0x6, 0x1, 0xc0001df1d0)
/tmp/etcd-release-3.4.3/etcd/release/etcd/etcdmain/etcd.go:302 +0x40
go.etcd.io/etcd/etcdmain.startEtcdOrProxyV2()
/tmp/etcd-release-3.4.3/etcd/release/etcd/etcdmain/etcd.go:144 +0x2f71
go.etcd.io/etcd/etcdmain.Main()
/tmp/etcd-release-3.4.3/etcd/release/etcd/etcdmain/main.go:46 +0x38
main.main()
/tmp/etcd-release-3.4.3/etcd/release/etcd/main.go:28 +0x20
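The container name used above embeds a pod UID and a trailing counter, so it changes every time etcd restarts. A small sketch that finds the newest etcd container and reduces its log to the fatal lines, assuming Docker is the container runtime and that the trailing number in the name is the restart attempt; the helper names latest_attempt and fatal_lines are hypothetical:

```shell
# Read container names on stdin and print the one with the highest
# trailing counter (the last "_"-separated field).
latest_attempt() {
  awk -F_ '{ print $NF "\t" $0 }' | sort -n | tail -n 1 | cut -f2-
}

# Keep only capnslog critical lines (" C | ") and Go panic lines.
fatal_lines() {
  grep -E '^panic:| C \| ' || true
}

# Usage on the affected node:
#   name=$(docker ps -a --filter name=k8s_etcd --format '{{.Names}}' | latest_attempt)
#   docker logs --tail=1000 "$name" 2>&1 | fatal_lines
```

In the log above, this filter would leave the "failed to find database snapshot file" line and the two panic lines, which is the part that identifies the damaged snapshot.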
Fix: reset the cluster with kubeadm reset.
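The reset procedure deletes /var/lib/etcd, so it may be worth keeping a copy of the damaged data directory first in case it is needed for later inspection. A minimal sketch; the helper name backup_dir and the destination /root are assumptions and not part of the original procedure:

```shell
# Copy a directory to "<dest_parent>/<name>.bak-<timestamp>" and print
# the path of the copy. Fails if the source directory does not exist.
backup_dir() {
  if [ ! -d "$1" ]; then
    echo "no such directory: $1" >&2
    return 1
  fi
  backup_dest="$2/$(basename "$1").bak-$(date +%Y%m%d-%H%M%S)"
  cp -a "$1" "$backup_dest" && echo "$backup_dest"
}

# Usage (run as root, before kubeadm reset):
#   backup_dir /var/lib/etcd /root
```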
Steps:
1. Reset the cluster:
kubeadm reset
2. On the master node, run:
systemctl daemon-reload
systemctl restart kubelet
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
rm -rf $HOME/.kube
Delete the old files:
rm -rf /var/lib/etcd
rm -rf /etc/cni/net.d
rm -rf /var/lib/kubelet
rm -rf /etc/kubernetes
3. Initialize the cluster:
sudo kubeadm init \
--apiserver-advertise-address=10.0.2.13 \
--image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
--kubernetes-version v1.17.3 \
--service-cidr=10.96.0.0/16 \
--pod-network-cidr=10.244.0.0/16
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
4. Install the Pod network add-on (CNI):
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Alternatively, use a local copy of kube-flannel.yml:
Link: https://pan.baidu.com/s/1d-edsF0siAl6KbWuTI35Lw  Extraction code: u9k7
5. Wait until the master node is Ready, then join the worker nodes:
[root@k8s-node1 k8s]# kubectl get nodes
NAME        STATUS   ROLES    AGE    VERSION
k8s-node1   Ready    master   128m   v1.17.3
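Rather than re-running kubectl get nodes by hand, the wait can be scripted. A sketch that checks the STATUS column of the node listing, assuming kubectl is already configured as in step 3; the helper name all_nodes_ready is hypothetical:

```shell
# Read "kubectl get nodes --no-headers" output on stdin.
# Succeeds only if there is at least one node and every STATUS is Ready.
all_nodes_ready() {
  awk '$2 != "Ready" { bad = 1 } END { exit (bad || NR == 0) }'
}

# Usage:
#   until kubectl get nodes --no-headers | all_nodes_ready; do sleep 5; done
```

An empty listing counts as not ready, so the loop also waits while the API server is still coming up and kubectl prints nothing to stdout.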
6. On the other nodes, run:
swapoff -a
kubeadm reset
systemctl daemon-reload
systemctl restart kubelet
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
kubeadm join 10.0.2.13:6443 --token 1qhy32.5hhngdghnq019ovy --discovery-token-ca-cert-hash sha256:ef6305ace5b0a169149a767be4298bc5a9a4b0f71d0750f43d43e97042da9953
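The swapoff -a in this step matters because the kubelet, by default, refuses to start while swap is enabled. A quick way to confirm swap is really off before joining, sketched under the assumption of a Linux host exposing /proc/swaps; the helper name swap_is_off is hypothetical:

```shell
# Succeeds if the swaps table has no entries below its header line.
# $1: path to the table, defaults to /proc/swaps.
swap_is_off() {
  [ "$(tail -n +2 "${1:-/proc/swaps}" | grep -c .)" -eq 0 ]
}

# Usage:
#   swap_is_off && echo "swap is off" || echo "swap is still enabled"
```

Note that swapoff -a only lasts until the next reboot; any swap entry in /etc/fstab would re-enable it.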
If the token has expired, create a new one that never expires (by default a token expires after 24h):
kubeadm token create --print-join-command --ttl=0
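When copying the join command between machines, a truncated token is an easy mistake to make. kubeadm bootstrap tokens have the form of 6 lowercase alphanumeric characters, a dot, then 16 more, so the shape can be checked before running kubeadm join. A minimal sketch; the helper name valid_token is hypothetical and it only checks the format, not whether the token still exists on the master:

```shell
# Succeeds if $1 has the bootstrap-token shape [a-z0-9]{6}.[a-z0-9]{16}.
valid_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# Usage:
#   valid_token 1qhy32.5hhngdghnq019ovy && echo "token format looks fine"
```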