[K8S] Kubernetes v1.26 Cluster Setup - Complete Guide

Kubernetes 1.26 cluster setup from scratch, with solutions to the various problems you may hit along the way.
1. Cluster Overview

Use Vagrant to quickly build a three-VM cluster: one master and two workers.

| VM name | IP | User | Role | Spec |
|---|---|---|---|---|
| k8s-node1 | 192.168.56.101 | root | master | 2 cores / 2 GB |
| k8s-node2 | 192.168.56.102 | root | worker | 2 cores / 2 GB |
| k8s-node3 | 192.168.56.103 | root | worker | 2 cores / 2 GB |
Resource list:
- VirtualBox installer
- Vagrant installer
- CentOS box file
- Vagrantfile: VM definition file
- centos-init.sh: VM environment initialization script
- docker-init.sh: Docker initialization script
- k8s-init.sh: k8s initialization script
- master-images.sh: master node Docker image pull script
- kube-flannel.yml: network component manifest
- kubeadm-config.yml: k8s initialization config file (for reference)
2. Create Three CentOS 7 Servers with Vagrant

First download the VM box file centos7.0-x86_64.box.

- Initialize centos7

```shell
# Add the local box (the path must not contain spaces)
vagrant box add centos7 D:/MallProject/app/VirtualBox-VMs/centos7.0-x86_64.box
# List local boxes
vagrant box list
# Change into the working directory first, then open a command window
cd D:\WorkFiles\k8s\vagrant
# Initialize from the local centos7 box (this generates a Vagrantfile in the current directory)
vagrant init centos7
```
- Replace the contents of the generated Vagrantfile with the following (pay attention to the IP scheme)

```ruby
# -*- mode: ruby -*-
# vi: set ft=ruby :
Vagrant.configure("2") do |config|
  config.vm.box_check_update = false
  config.vm.provider 'virtualbox' do |vb|
    vb.customize [
      "guestproperty", "set", :id,
      "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 1000
    ]
  end
  config.vm.synced_folder ".", "/vagrant", type: "rsync"
  $num_instances = 3
  # $etcd_cluster = "k8s-node1=http://192.168.56.101:2380"
  (1..$num_instances).each do |i|
    config.vm.define "k8s-node#{i}" do |node|
      node.vm.box = "centos7"
      node.vm.hostname = "k8s-node#{i}"
      ip = "192.168.56.#{i+100}"
      node.vm.network "private_network", ip: ip
      node.vm.provider "virtualbox" do |vb|
        vb.memory = "2048"
        vb.cpus = 2
        vb.name = "k8s-node#{i}"
      end
      # node.vm.provision "shell", path: "install.sh", args: [i, ip, $etcd_cluster]
    end
  end
end
```
- Bring the machines up (once this finishes, the three VMs appear in VirtualBox)

```shell
# Run this in the directory containing the Vagrantfile
vagrant up
```
- Log in to each of the three VMs and enable SSH password login, so you can connect via Xshell later

```shell
# Log in from a cmd window
vagrant ssh k8s-node1
# Change the root password (enter the new password twice when prompted)
sudo passwd root
# Switch to root (if unchanged, the default password is vagrant)
su root
# Check the current user
whoami
vi /etc/ssh/sshd_config
# Change "PasswordAuthentication no" to:
PasswordAuthentication yes
# Restart the service
service sshd restart
```
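The Vagrantfile above derives each node's private IP from its loop index via `ip = "192.168.56.#{i+100}"`. As a quick sanity check, the same mapping can be reproduced in shell:

```shell
# Reproduce the Vagrantfile's IP assignment rule: node i gets 192.168.56.(i+100)
for i in 1 2 3; do
  echo "k8s-node${i} -> 192.168.56.$((i + 100))"
done
```

This is why the node table lists 192.168.56.101 through 192.168.56.103.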
3. VM Environment Configuration

Note: if the script fails, you can follow it and run the commands by hand.

- Download centos-init.sh and upload it to all three servers
- Run the script to initialize the environment

```shell
# Convert the script to Unix line endings
vi /opt/centos-init.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/centos-init.sh
# Run it
sh centos-init.sh
```
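Scripts uploaded from Windows often carry CRLF line endings, which is what the `:set fileformat=unix` step fixes. A one-line `sed` does the same conversion non-interactively; the demo below is self-contained (the file path is just an illustration, not one of the real scripts):

```shell
# Create a demo script with Windows (CRLF) line endings
printf 'echo hello\r\necho world\r\n' > /tmp/demo.sh
# Strip the trailing carriage returns - equivalent to :set fileformat=unix in vi
sed -i 's/\r$//' /tmp/demo.sh
# Verify: no carriage returns remain
if grep -q "$(printf '\r')" /tmp/demo.sh; then echo "still CRLF"; else echo "LF only"; fi
```

The same `sed -i 's/\r$//' <file>` line works for every script in this guide.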
4. Install Docker

You can follow the script and run the commands manually, to avoid failures caused by network interruptions.

- Download docker-init.sh and upload it to all three servers
- Run the script to initialize Docker

```shell
# Convert the script to Unix line endings
vi /opt/docker-init.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/docker-init.sh
# Run it
sh docker-init.sh
```
5. Install k8s

You can follow the script and run the commands manually, to avoid failures caused by network interruptions.

Note: k8s v1.24 and later no longer use Docker as the default container runtime (the dockershim was removed).

- Download k8s-init.sh and upload it to all three servers
- Run the script to initialize k8s

```shell
# Convert the script to Unix line endings
vi /opt/k8s-init.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/k8s-init.sh
# Run it
sh k8s-init.sh
```
6. Initialize the k8s Master Node

Steps:

- Download master-images.sh and upload it to the master server, i.e. k8s-node1
- Run the script to pull the required images

```shell
# Convert the script to Unix line endings
vi /opt/master-images.sh
:set fileformat=unix
:wq
# Make it executable
chmod +x /opt/master-images.sh
# Optionally list the required component versions first
kubeadm config images list --kubernetes-version v1.26.0
# Run it
sh master-images.sh
```
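The contents of master-images.sh are not reproduced here; a plausible sketch is a loop that pulls each control-plane image from the Aliyun mirror. The image names and tags below are assumptions based on what `kubeadm config images list` reports for v1.26.0, and this version only prints the commands it would run:

```shell
# Hypothetical sketch of a master-images.sh-style script: emit a `docker pull`
# for every image kubeadm v1.26.0 needs, sourced from the Aliyun mirror.
MIRROR=registry.cn-hangzhou.aliyuncs.com/google_containers
for img in kube-apiserver:v1.26.0 kube-controller-manager:v1.26.0 \
           kube-scheduler:v1.26.0 kube-proxy:v1.26.0 \
           pause:3.9 etcd:3.5.6-0 coredns:v1.9.3; do
  echo "docker pull ${MIRROR}/${img}"   # replace echo with the real pull to execute
done
```

Compare the loop's image list against the `kubeadm config images list` output on your own master before running anything.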
- Initialize k8s

```shell
# Option 1: initialize with command-line flags
# Note: get the apiserver-advertise-address by running `ip addr` and reading the eth0 address
# Since 1.26, the image-repository flag is no longer passed down to the CRI runtime for pulling
# the pause image; you must edit /etc/containerd/config.toml by hand - see problem 6.1 below
kubeadm init \
  --apiserver-advertise-address=10.0.2.15 \
  --control-plane-endpoint=k8s-node1 \
  --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
  --kubernetes-version v1.26.0 \
  --service-cidr=10.96.0.0/16 \
  --pod-network-cidr=10.244.0.0/16

# Option 2: initialize from a config file - pick one of the two options (note: edit the file first)
kubeadm config print init-defaults > kubeadm-config.yml
# Initialize the cluster
kubeadm init --config kubeadm-config.yml

# On success, follow the printed instructions:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Remember to copy and save the printed join command:
# kubeadm join 10.0.2.15:6443 --token np71a6.197zbcz5l6yur1ji --discovery-token-ca-cert-hash sha256:82e0eeadf242ce9fbcb0d3681116d3f845bf8a77deeb0a02e38e57c17112cb35
```
Note: for versions >= 1.26, it is best to apply the fixes from the troubleshooting section below first, and only then initialize.

```shell
# Reset - if initialization fails, run this before initializing again
kubeadm reset
# Check kubelet status
systemctl status kubelet
# Inspect error logs
journalctl -xeu kubelet
```
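If the printed join command gets lost, the `--discovery-token-ca-cert-hash` value can be recomputed from the cluster CA certificate (on a real master: /etc/kubernetes/pki/ca.crt). The pipeline below is the standard openssl recipe; to keep the demo self-contained it first generates a throwaway certificate, so the /tmp paths are stand-ins:

```shell
# Stand-in CA cert (on a real master, use /etc/kubernetes/pki/ca.crt instead)
openssl req -x509 -newkey rsa:2048 -nodes -subj "/CN=demo-ca" -days 1 \
  -keyout /tmp/ca.key -out /tmp/ca.crt 2>/dev/null
# Recompute the sha256 discovery hash of the CA public key
openssl x509 -pubkey -in /tmp/ca.crt \
  | openssl pkey -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
```

The printed 64-character hex string is what goes after `sha256:` in `kubeadm join`.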
- Install the flannel network component

Download kube-flannel.yml and upload it to the master server.

```shell
kubectl apply -f kube-flannel.yml
# Uninstall (useful before reinstalling after a failure)
kubectl delete -f kube-flannel.yml
```

Note: if installation fails, comment out the PodSecurityPolicy section per the error message, then apply again.
Troubleshooting: initialization failures

6.1 Problem: containerd

Symptom:

```
[init] Using Kubernetes version: v1.26.0
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2023-03-13T16:54:24+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
```
Solution:

```shell
# Back up the existing config
mv /etc/containerd/config.toml /etc/containerd/config.toml.back
# Regenerate the default config
containerd config default > /etc/containerd/config.toml
vi /etc/containerd/config.toml

# 1. Find [plugins."io.containerd.grpc.v1.cri".registry.mirrors] and append these 4 lines (possibly optional)
[plugins."io.containerd.grpc.v1.cri".registry.mirrors]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
    endpoint = ["https://dockerhub.mirrors.nwafu.edu.cn"]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
    endpoint = ["https://registry.aliyuncs.com/k8sxio"]

# 2. Find [plugins."io.containerd.grpc.v1.cri"] and set the pause image address (required)
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"

# Restart the service
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
systemctl status containerd
```
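Before restarting containerd, it is worth confirming the edit actually landed in the file. A self-contained check, using a temp copy as a stand-in for /etc/containerd/config.toml:

```shell
# Stand-in for the edited /etc/containerd/config.toml
cat > /tmp/config.toml <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"
EOF
# Confirm the pause image override is present before restarting containerd
grep -q 'sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"' /tmp/config.toml \
  && echo "pause image override OK"
```

On a real node, run the `grep` against /etc/containerd/config.toml itself.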
6.2 Problem: cgroup

Symptom:

```
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
(the kubelet-check pair above repeats several times)
```
Cause: Kubernetes defaults its cgroup driver to systemd, while the Docker service uses the cgroupfs driver. Change either one so that both match.
Solution 1 (change k8s):

```shell
# Set the kubelet cgroup driver to cgroupfs
sudo mkdir -p /etc/systemd/system/kubelet.service.d
sudo tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf <<EOF
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=cgroupfs"
EOF
systemctl daemon-reload
systemctl restart kubelet
```
Solution 2 (change Docker), the method used in this article:

```shell
# Check the current Docker cgroup driver
docker info | grep 'Cgroup'
# Set the Docker cgroup driver to systemd by adding this line to the config:
vi /etc/docker/daemon.json
"exec-opts": ["native.cgroupdriver=systemd"],
systemctl daemon-reload
systemctl restart docker
```
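After applying solution 2, a quick way to confirm the configured driver (on a live host you would also compare against `docker info | grep Cgroup`) is to read it straight out of daemon.json. The demo creates a stand-in file so it runs anywhere:

```shell
# Stand-in for /etc/docker/daemon.json after applying solution 2
mkdir -p /tmp/docker-demo
cat > /tmp/docker-demo/daemon.json <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
# Extract the configured driver and confirm it matches what kubelet expects
driver=$(grep -o 'native.cgroupdriver=[a-z]*' /tmp/docker-demo/daemon.json | cut -d= -f2)
echo "docker cgroup driver: ${driver}"
```

If this prints anything other than systemd, kubelet and Docker will disagree again after the restart.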
7. Join Worker Nodes to the Master

Steps:

- Run the join command on each worker node

```shell
# This command is printed on the master after a successful `kubeadm init`
kubeadm join 10.0.2.15:6443 --token np71a6.197zbcz5l6yur1ji \
  --discovery-token-ca-cert-hash sha256:82e0eeadf242ce9fbcb0d3681116d3f845bf8a77deeb0a02e38e57c17112cb35
```
- After a few minutes, check on the master

```shell
kubectl get nodes
```
Troubleshooting: join failures

7.1 Token expired

```
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.2. Latest validated version: 18.09
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s
```

Solution:

```shell
# Run on the master to generate a fresh token and join command
kubeadm token create --ttl 24h --print-join-command
# List the generated tokens
kubeadm token list
```
7.2 containerd service error (note: if you already applied the fix from 6.1, this will not occur)

```
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
	[ERROR CRI]: container runtime is not running: output: time="2023-03-15T16:04:50+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
```

Solution:

```shell
# Back up the existing config
mv /etc/containerd/config.toml /etc/containerd/config.toml.back
# Regenerate the default config
containerd config default > /etc/containerd/config.toml
# Restart the service
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
systemctl status containerd
```
7.3 connection refused

```
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: Get "https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s": dial tcp 10.0.2.15:6443: connect: connection refused
```

Solution:

```shell
# 1. Reset
kubeadm reset
# 2. Re-initialize with the host-only network IP (change eth0=10.0.2.15 to eth1=192.168.56.101)
#    If you initialized from a config file, change the IP in kubeadm-config.yml instead
kubeadm init --apiserver-advertise-address=192.168.56.101 ...
# 3. In kube-flannel.yml, specify --iface=eth1
```
7.4 Certificate expired (not encountered in this setup; listed for reference)
7.5 Join succeeded, but kubectl fails on the worker node

```
[root@k8s-node2 ~]# kubectl get nodes
E0317 17:10:08.837852 8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
(the line above repeats several times)
The connection to the server localhost:8080 was refused - did you specify the right host or port?
```

Solution:

```shell
# 1. Copy /etc/kubernetes/admin.conf from the master to the worker node
# 2. Set the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile
# 3. Reload the environment
source /etc/profile
```
7.6 Worker nodes NotReady

```
[root@k8s-node1 opt]# kubectl get nodes
NAME        STATUS     ROLES           AGE   VERSION
k8s-node1   Ready      control-plane   79m   v1.26.0
k8s-node2   NotReady   <none>          63m   v1.26.0
k8s-node3   NotReady   <none>          62m   v1.26.0
[root@k8s-node2 opt]# journalctl -f -u kubelet.service
rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\
Mar 17 17:44:17 k8s-node2 kubelet[8546]: E0317 17:44:17.090922 8546 kubelet.go:2475] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"
```

Solution:

```shell
# 1. As in problem 6.1, edit /etc/containerd/config.toml on the worker node to set the pause image address
# 2. Restart containerd
systemctl restart containerd
# 3. Restart kubelet
systemctl restart kubelet
```

```
[root@k8s-node3 ~]# kubectl get nodes
NAME        STATUS   ROLES           AGE   VERSION
k8s-node1   Ready    control-plane   92m   v1.26.0
k8s-node2   Ready    <none>          75m   v1.26.0
k8s-node3   Ready    <none>          75m   v1.26.0
```
With that, the k8s cluster is up and running. Over to you!