Adding a new node (k8s-node) to a cluster that has been running for a while
Preface: adding a node to a freshly deployed k8s cluster only takes a kubeadm join. But when a cluster has been running for a while and a node has to be added, the token and the sha256 hash of the CA certificate were usually never recorded, so they have to be looked up again.
1 Check the existing cluster's node list
root@gz-gpu101:~# kubectl get node
NAME        STATUS   ROLES    AGE   VERSION
gz-cpu031   Ready    node     24d   v1.14.1-2
gz-cpu032   Ready    node     24d   v1.14.1-2
gz-cpu033   Ready    node     24d   v1.14.1-2
gz-gpu101   Ready    master   24d   v1.14.1-2
root@gz-gpu101:~#
2 Check the token
By default a token is valid for 24 hours; once it has expired it can no longer be used, and running kubeadm token create on the master node creates a fresh one.
As it happens, this cluster's token is valid forever, so the create step can be skipped here.
root@gz-gpu101:~# kubeadm token list
TOKEN                     TTL         EXPIRES   USAGES                    DESCRIPTION   EXTRA GROUPS
hwfep2.iw7q7ltqdwxbati5   <forever>   <never>   authentication,signing    <none>        system:bootstrappers:kubeadm:default-node-token
root@gz-gpu101:~#
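Had the token expired, a fresh one could be minted on the master. A minimal sketch; note that --ttl 0 (a hedged guess, not shown in this cluster's history) is how a <forever> token like the one above gets made:
# Create a token with the default 24h lifetime
kubeadm token create
# Create a token that never expires (use with care)
kubeadm token create --ttl 0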
3 Get the sha256 hash of the CA certificate
root@gz-gpu101:~# openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | awk '{print $2}'
72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
root@gz-gpu101:~#
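For readability, here is the same pipeline broken out stage by stage (functionally identical to the one-liner above):
# 1. openssl x509 -pubkey      extracts the CA public key in PEM form
# 2. openssl rsa -pubin ... der re-encodes that key as DER bytes
# 3. openssl dgst -sha256 -hex  hashes the DER bytes with sha256
# 4. awk '{print $2}'           keeps only the hex digest
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | awk '{print $2}'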
The steps above are a bit tedious. On the k8s-master you can instead generate a new token and print the complete join command in one step:
# Command: kubeadm token create --print-join-command
root@gz-gpu101:~# kubeadm token create --print-join-command
kubeadm join 172.18.12.23:6443 --token pc9edn.11qlnpdpljlcs900 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
root@gz-gpu101:~#
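Since this prints the complete join line, it is also easy to script. A hedged sketch, assuming passwordless SSH from the master to the new node:
# Capture the join command and run it on the new node in one go
JOIN_CMD=$(kubeadm token create --print-join-command)
ssh root@qa-gpu082 "$JOIN_CMD"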
4 Join the new node to the cluster
Every node that is going to join the k8s master's cluster must first be initialized (firewall off, swap off, and so on) and have docker, kubeadm, and kubelet installed, with the kubelet service running.
For the details, see "Deploying k8s with kubeadm".
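As a rough reference, that preparation looks something like the sketch below. Ubuntu is assumed here, and the pinned package versions are assumptions chosen to match the cluster; follow the linked post for the real steps:
# Disable swap now and across reboots
swapoff -a
sed -i '/ swap / s/^/#/' /etc/fstab
# Disable the firewall (ufw on Ubuntu; on CentOS use systemctl disable firewalld)
ufw disable
# Install the runtime and the k8s components (versions assumed)
apt-get update
apt-get install -y docker.io kubelet=1.14.1-00 kubeadm=1.14.1-00
systemctl enable --now docker kubelet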
On the new node (qa-gpu082), run the kubeadm join command (either token works: the original never-expiring one used below, or the freshly generated one from the previous step):
kubeadm join 172.18.12.23:6443 --token hwfep2.iw7q7ltqdwxbati5 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
5 Check again
root@gz-gpu101:~# kubectl get node
NAME        STATUS   ROLES    AGE     VERSION
gz-cpu031   Ready    node     24d     v1.14.1-2
gz-cpu032   Ready    node     24d     v1.14.1-2
gz-cpu033   Ready    node     24d     v1.14.1-2
gz-gpu101   Ready    master   24d     v1.14.1-2
qa-gpu082   Ready    <none>   2m18s   v1.14.1
root@gz-gpu101:~#
6 Miscellaneous
6.1 Set the ROLES value
The newly joined node shows <none> in the ROLES column, which looks untidy, so let's set it. All of the following commands are run on the k8s-master.
# List the existing role labels for reference
root@gz-gpu101:~# kubectl get node --show-labels|grep role
# Add the role label to the new node (choose the value based on what the previous command shows)
root@gz-gpu101:~# kubectl label node qa-gpu082 kubernetes.io/role=node
node/qa-gpu082 labeled
# Check the node list
root@gz-gpu101:~# kubectl get node
NAME        STATUS   ROLES    AGE   VERSION
gz-cpu031   Ready    node     24d   v1.14.1-2
gz-cpu032   Ready    node     24d   v1.14.1-2
gz-cpu033   Ready    node     24d   v1.14.1-2
gz-gpu101   Ready    master   24d   v1.14.1-2
qa-gpu082   Ready    node     14m   v1.14.1
root@gz-gpu101:~#
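For completeness, the same label can later be changed or removed with standard kubectl label syntax (a minimal sketch):
# Change the role value in place (--overwrite is required to replace an existing label)
kubectl label node qa-gpu082 kubernetes.io/role=worker --overwrite
# Remove the label entirely (the trailing '-' deletes it), returning ROLES to <none>
kubectl label node qa-gpu082 kubernetes.io/role-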
6.2 Troubleshooting
Error message
root@qa-gpu082:~# kubeadm join 172.18.12.23:6443 --token hwfep2.iw7q7ltqdwxbati5 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 18.09
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR FileAvailable--etc-kubernetes-bootstrap-kubelet.conf]: /etc/kubernetes/bootstrap-kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
root@qa-gpu082:~#
This machine had apparently been used in a cluster before, so it needs a kubeadm reset first.
Solution
root@qa-gpu082:~# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0120 10:00:56.461284 684201 reset.go:234] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /etc/cni/net.d /var/lib/dockershim /var/run/kubernetes]
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually.
For example:
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
root@qa-gpu082:~#
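As the reset output notes, iptables rules and IPVS tables are not cleaned up automatically. Collecting its own hints into one step (run only if this node actually had rules to clear):
# Flush the rules kubeadm/kube-proxy may have left behind
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
# Only if the cluster was set up to use the IPVS proxier
ipvsadm --clear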
# Run join again
root@qa-gpu082:~# kubeadm join 172.18.12.23:6443 --token hwfep2.iw7q7ltqdwxbati5 --discovery-token-ca-cert-hash sha256:72acd5b1545b4488ab6385bc9511854557d3460cc013b6595b1e307d8c88f060
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.7. Latest validated version: 18.09
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[WARNING RequiredIPVSKernelModulesAvailable]:
The IPVS proxier may not be used because the following required kernel modules are not loaded: [ip_vs_rr ip_vs_wrr ip_vs_sh ip_vs]
or no builtin kernel IPVS support was found: map[ip_vs:{} ip_vs_rr:{} ip_vs_sh:{} ip_vs_wrr:{} nf_conntrack:{}].
However, these modules may be loaded automatically by kube-proxy if they are available on your system.
To verify IPVS support:
Run "lsmod | grep 'ip_vs|nf_conntrack'" and verify each of the above modules are listed.
If they are not listed, you can use the following methods to load them:
1. For each missing module run 'modprobe $modulename' (e.g., 'modprobe ip_vs', 'modprobe ip_vs_rr', ...)
2. If 'modprobe $modulename' returns an error, you will need to install the missing module support for your kernel.
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.14" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
root@qa-gpu082:~#
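The join succeeded, but the two warnings are worth addressing. A hedged sketch for each: loading the kernel modules the IPVS warning listed, and switching Docker's cgroup driver to systemd as the first warning recommends (the daemon.json approach is a common convention, not something this post prescribes):
# Load the kernel modules named in the IPVS warning, then verify
for m in ip_vs ip_vs_rr ip_vs_wrr ip_vs_sh nf_conntrack; do modprobe "$m"; done
lsmod | grep -E 'ip_vs|nf_conntrack'
# Switch Docker to the systemd cgroup driver (assumed daemon.json location)
cat > /etc/docker/daemon.json <<'EOF'
{ "exec-opts": ["native.cgroupdriver=systemd"] }
EOF
systemctl restart docker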