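I. Background

While deploying k8s, the master node came up successfully and worker node 1 joined with kubeadm join without trouble, but kubeadm join on worker node 2 hung at [preflight] Running pre-flight checks and never moved on.

II. Troubleshooting

I tried everything I could find online, including time synchronization and regenerating the token, and none of it helped.
kubeadm token list showed that the token had not expired.
Regenerating the token with kubeadm token create --ttl 0 --print-join-command did not help either.
Redeploying the whole cluster did not help either.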

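# 0. Delete the node
kubectl get nodes
kubectl cordon w1 # mark the node unschedulable
kubectl drain w1 --ignore-daemonsets
kubectl delete node w1
# 1. Reset (must be done on the worker nodes too)
kubeadm reset
rm -rf /etc/kubernetes/*
rm -rf ~/.kube
# 2. Run init again
# 3. Re-run the commands printed in the init output
# 4. Redeploy the calico network plugin
# 5. Rejoin the worker nodes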
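
Before rejoining in step 5, it can be worth ruling out a mistyped token or hash in the copied join command. The following is only a minimal sketch: valid_token and valid_ca_hash are hypothetical helper names (not part of kubeadm), and they check the format of the values only, not whether the token still exists on the master.

```shell
#!/usr/bin/env bash
# Sketch: sanity-check the arguments of a kubeadm join command before running it.
# valid_token / valid_ca_hash are hypothetical helpers, not kubeadm commands.

# Bootstrap tokens have the form "<6 chars>.<16 chars>", using only [a-z0-9].
valid_token() {
  printf '%s\n' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

# The CA cert hash is "sha256:" followed by 64 hex digits.
valid_ca_hash() {
  printf '%s\n' "$1" | grep -Eq '^sha256:[0-9a-fA-F]{64}$'
}

# Example, using the values from the join command in this post:
#   valid_token wvsok4.5kjxe1ts8kidll1b && echo "token format ok"
#   valid_ca_hash sha256:e94113cc2b2fb1b9994c7e419c5f3b776493c7151377812672fe55163b3f97a5 && echo "hash format ok"
```

A format check like this only catches copy-and-paste damage; an expired or deleted token still has to be confirmed with kubeadm token list on the master.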

The key question: worker node 1 joins the cluster normally, so why can't worker node 2?

1. Add the -v=2 flag to the join command to see the logs

[root@w1 ~]# kubeadm join -v=2 192.168.56.100:6443 --token wvsok4.5kjxe1ts8kidll1b --discovery-token-ca-cert-hash sha256:e94113cc2b2fb1b9994c7e419c5f3b776493c7151377812672fe55163b3f97a5
I0703 09:09:38.169190    2029 join.go:367] [preflight] found NodeName empty; using OS hostname as NodeName
I0703 09:09:38.169794    2029 initconfiguration.go:105] detected and using CRI socket: /var/run/dockershim.sock
[preflight] Running pre-flight checks
I0703 09:09:38.169865    2029 preflight.go:90] [preflight] Running general checks
I0703 09:09:38.170055    2029 checks.go:254] validating the existence and emptiness of directory /etc/kubernetes/manifests
I0703 09:09:38.170069    2029 checks.go:292] validating the existence of file /etc/kubernetes/kubelet.conf
I0703 09:09:38.170078    2029 checks.go:292] validating the existence of file /etc/kubernetes/bootstrap-kubelet.conf
I0703 09:09:38.170085    2029 checks.go:105] validating the container runtime
I0703 09:09:38.221649    2029 checks.go:131] validating if the service is enabled and active
I0703 09:09:38.262731    2029 checks.go:341] validating the contents of file /proc/sys/net/bridge/bridge-nf-call-iptables
I0703 09:09:38.262898    2029 checks.go:341] validating the contents of file /proc/sys/net/ipv4/ip_forward
I0703 09:09:38.262920    2029 checks.go:653] validating whether swap is enabled or not
I0703 09:09:38.262941    2029 checks.go:382] validating the presence of executable ip
I0703 09:09:38.263176    2029 checks.go:382] validating the presence of executable iptables
I0703 09:09:38.263554    2029 checks.go:382] validating the presence of executable mount
I0703 09:09:38.263659    2029 checks.go:382] validating the presence of executable nsenter
I0703 09:09:38.263669    2029 checks.go:382] validating the presence of executable ebtables
I0703 09:09:38.263680    2029 checks.go:382] validating the presence of executable ethtool
I0703 09:09:38.263688    2029 checks.go:382] validating the presence of executable socat
I0703 09:09:38.263696    2029 checks.go:382] validating the presence of executable tc
I0703 09:09:38.263703    2029 checks.go:382] validating the presence of executable touch
I0703 09:09:38.263718    2029 checks.go:524] running all checks
I0703 09:09:38.275230    2029 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0703 09:09:38.275514    2029 checks.go:622] validating kubelet version
I0703 09:09:38.311281    2029 checks.go:131] validating if the service is enabled and active
I0703 09:09:38.316858    2029 checks.go:209] validating availability of port 10250
I0703 09:09:38.317624    2029 checks.go:292] validating the existence of file /etc/kubernetes/pki/ca.crt
I0703 09:09:38.317634    2029 checks.go:439] validating if the connectivity type is via proxy or direct
I0703 09:09:38.317653    2029 join.go:427] [preflight] Discovering cluster-info
I0703 09:09:38.317704    2029 token.go:200] [discovery] Trying to connect to API Server "192.168.56.100:6443"
I0703 09:09:38.318179    2029 token.go:75] [discovery] Created cluster-info discovery client, requesting info from "https://192.168.56.100:6443"
I0703 09:09:38.319099    2029 token.go:83] [discovery] Failed to request cluster info, will try again: [Get https://192.168.56.100:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 192.168.56.100:6443: connect: protocol not available]

The log reports protocol not available. Running curl https://192.168.56.100:6443/api/v1/namespaces/kube-public/configmaps/cluster-info on the same node also fails with protocol not available.
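
Since the error in the log is raised by the TCP connect itself, a plain connection test from the worker can help separate a network-level problem from a certificate problem. This is a rough sketch under some assumptions: bash and the coreutils timeout command are available on the node, and endpoint_host, endpoint_port and tcp_probe are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: probe the API server endpoint at the TCP level, without any TLS.

# Split "host:port" into its two parts (IPv4 address or hostname only).
endpoint_host() { printf '%s\n' "${1%:*}"; }
endpoint_port() { printf '%s\n' "${1##*:}"; }

# Attempt a plain TCP connect using bash's /dev/tcp; returns 0 on success.
# No TLS handshake or certificate check is involved at this stage.
tcp_probe() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Usage on the worker node, with the endpoint from the join command:
#   ep=192.168.56.100:6443
#   if tcp_probe "$(endpoint_host "$ep")" "$(endpoint_port "$ep")"; then
#     echo "TCP connect ok"
#   else
#     echo "TCP connect failed"
#   fi
```

If the probe fails on one worker but succeeds on another, the difference is more likely in that node's network setup than in the token or the certificates.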

Running the same curl on the master node produces the following message instead:

curl: (60) Peer's Certificate issuer is not recognized.
More details here: http://curl.haxx.se/docs/sslcerts.html

curl performs SSL certificate verification by default, using a "bundle"
 of Certificate Authority (CA) public keys (CA certs). If the default
 bundle file isn't adequate, you can specify an alternate file
 using the --cacert option.
If this HTTPS server uses a certificate signed by a CA represented in
 the bundle, the certificate verification probably failed due to a
 problem with the certificate (it might be expired, or the name might
 not match the domain name in the URL).
If you'd like to turn off curl's verification of the certificate, use
 the -k (or --insecure) option.

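However, opening this URL in a browser showed that something was wrong with the certificate of the HTTPS link. (On its own this warning is expected: the API server certificate is signed by the cluster's own CA, which is not in the system trust bundle, whereas the protocol not available error on the worker is raised at connect time, before any TLS handshake.)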
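
To check the certificate side directly, one option is to compare the CA hash in the join command with the one computed from the cluster CA on the master. The pipeline below follows the openssl recipe given in the kubeadm documentation; the wrapper name ca_cert_hash is hypothetical, and an RSA CA key at the default path /etc/kubernetes/pki/ca.crt is assumed.

```shell
#!/usr/bin/env bash
# Sketch: print the value kubeadm expects after
# "--discovery-token-ca-cert-hash sha256:" for a given CA certificate.
# Assumes an RSA CA key, as in a default kubeadm cluster.
ca_cert_hash() {
  openssl x509 -pubkey -noout -in "$1" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# On the master node (path is the kubeadm default):
#   ca_cert_hash /etc/kubernetes/pki/ca.crt
# Query the API server while trusting the cluster CA instead of the system bundle:
#   curl --cacert /etc/kubernetes/pki/ca.crt \
#     https://192.168.56.100:6443/api/v1/namespaces/kube-public/configmaps/cluster-info
```

If the printed hash matches the one in the join command and the curl with --cacert succeeds on the master, the certificate itself is probably not what blocks the worker.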

2. Dealing with the certificate problem

After searching through a lot of material, I roughly followed these two posts:
https://www.cnblogs.com/hkgov/p/14959992.html
https://blog.csdn.net/u012375924/article/details/108832392

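(1) Approach 1
Append the certificate string (shown as an image in the original post and not reproduced here) to the file /etc/pki/tls/certs/ca-bundle.crt.
(2) Approach 2
Download any valid certificate, upload the file to the directory /etc/pki/ca-trust/source/anchors/, change its extension to .crt, and then run update-ca-trust extract.

Running the join command again still failed.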

3. Reboot

After rebooting the master node once more, the join command unexpectedly worked.

# View the kubelet logs
journalctl -u kubelet -f
# Restart kubelet (if the node never becomes Ready)
systemctl restart kubelet && systemctl enable kubelet
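
To see which nodes are still not Ready after the restart, the node list can be filtered on the master. A minimal sketch, assuming the usual column layout of kubectl get nodes (NAME in the first column, STATUS in the second); not_ready_nodes is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: read `kubectl get nodes --no-headers` output on stdin and print the
# names of nodes whose STATUS column (field 2) does not start with "Ready".
not_ready_nodes() {
  awk '$2 !~ /^Ready/ { print $1 }'
}

# Usage on the master node:
#   kubectl get nodes --no-headers | not_ready_nodes
```

A cordoned node reports Ready,SchedulingDisabled and is therefore not listed, since its kubelet is still healthy.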

4. Alternative 1

https://blog.csdn.net/axin_123456/article/details/128961219

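Possible cause: an earlier mistake, namely having run

systemctl stop NetworkManager      # stop temporarily

systemctl disable NetworkManager   # disable the network manager permanently

I then did the following to bring it back:

systemctl start NetworkManager

systemctl start network.service    # start the network service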

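5. Alternative 2

# Install the ntpdate tool
yum -y install ntp ntpdate

timedatectl set-timezone Asia/Shanghai # set the system time zone to Shanghai

# Synchronize the system time with network time
ntpdate cn.pool.ntp.org

# Write the system time to the hardware clock
hwclock --systohc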
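
To see whether clock drift is actually in play before and after syncing, the worker's clock can be compared with the master's. A small sketch only: clock_skew is a hypothetical helper, and the usage line assumes root SSH access from the worker to the master at 192.168.56.100.

```shell
#!/usr/bin/env bash
# Sketch: absolute difference in seconds between two epoch timestamps.
clock_skew() {
  d=$(( $1 - $2 ))
  if [ "$d" -lt 0 ]; then
    d=$(( 0 - d ))
  fi
  printf '%s\n' "$d"
}

# Usage on the worker node (SSH access to the master is an assumption):
#   clock_skew "$(date +%s)" "$(ssh root@192.168.56.100 date +%s)"
```

The result includes the SSH round-trip delay, so a difference of a second or two is not meaningful; only a large skew points to a time synchronization problem.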

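III. Summary

Those are all the approaches I tried. I don't know which of them actually took effect; in the end it always started working only after rebooting the master node. Note: reboot the master node only. Rebooting both the master and the worker nodes still did not work.

I have no idea what the underlying problem was...

If it weren't for learning purposes, I would never have bothered installing this thing by hand.

References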

https://www.cnblogs.com/hkgov/p/14959992.html
https://blog.csdn.net/u012375924/article/details/108832392
