Experiment Environment

1. Hardware environment

Three virtual machines, each with 2 vCPUs, 2 GB RAM and a 20 GB disk (NAT network mode, with outbound Internet access).

Role      Hostname   IP
master    master1    172.29.9.51
node      node1      172.29.9.52
node      node2      172.29.9.53

2. Software environment

Software        Version
OS              CentOS 7.6_x64 1810 minimal (other CentOS 7.x releases also work)
containerd      v1.5.5
kubernetes      v1.22.2

Lab files

Link: https://pan.baidu.com/s/124kie2_6u3JMN6yolZlZZw
Extraction code: le5w

2021.11.01 - lab files - building a k8s cluster with containerd


1. Cluster environment configuration

⚠️ Configure this on all nodes.

We will build a Kubernetes cluster that uses containerd as the container runtime.

Using kubeadm, we will set up from scratch a Kubernetes cluster with containerd as its container runtime, installing the v1.22.2 release, which was the latest at the time.

1.1 Set the hostnames

hostnamectl --static set-hostname master1
bash

hostnamectl --static set-hostname node1
bash

hostnamectl --static set-hostname node2
bash

Note:
Node hostnames must be valid DNS names; never keep the default localhost hostname, or you will run into all kinds of errors. In Kubernetes, machine names and every API object stored in etcd must follow standard DNS naming (RFC 1123). The hostname can be changed with hostnamectl set-hostname node1.
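
A quick check (not part of the original steps) that each node reports the expected name:

# run on each node; the output should be master1 / node1 / node2, never localhost
hostnamectl status
hostname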

1.2 Disable the firewall and SELinux

systemctl stop firewalld && systemctl disable  firewalld
systemctl stop NetworkManager && systemctl disable  NetworkManager

setenforce 0
sed -i s/SELINUX=enforcing/SELINUX=disabled/ /etc/selinux/config

1.3 Disable the swap partition

swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab

Question: why does a k8s installation require swap to be disabled?
Swap must be turned off; otherwise kubelet will not start, and the cluster cannot come up.
The likely reasoning is that if kubelet's workloads were swapped to disk, performance would suffer badly.
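
A quick way to confirm swap is really off (an optional check):

# both commands should show no active swap devices / 0 swap in use
swapon -s
free -m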

1.4 Configure hostname resolution

cat >> /etc/hosts << EOF
172.29.9.51 master1
172.29.9.52 node1
172.29.9.53 node2
EOF

Question: do the nodes need name resolution when installing a k8s cluster?
Yes: later on, when kubectl needs to reach a container running on a node, it connects using the node name shown by kubectl get node, so the host must be able to resolve that name; if it cannot, the connection may fail.
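
A quick sanity check of the /etc/hosts entries above (optional):

# each name should resolve to the IP configured above
getent hosts master1 node1 node2
ping -c 1 node1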

1.5 Pass bridged IPv4 traffic to the iptables chains

modprobe br_netfilter

cat > /etc/sysctl.d/k8s.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

sysctl --system

Note: passing bridged IPv4 traffic to the iptables chains.
Enabling IPv4 forwarding in the kernel requires the br_netfilter module, so load it:
modprobe br_netfilter
About bridge-nf:

bridge-nf lets netfilter filter IPv4/ARP/IPv6 packets that traverse a Linux bridge. For example, with net.bridge.bridge-nf-call-iptables=1 set, packets forwarded by a layer-2 bridge also pass through the iptables FORWARD rules. The commonly used options are listed below (a quick verification follows the list):

  • net.bridge.bridge-nf-call-arptables: whether bridged ARP packets are filtered by arptables' FORWARD chain
  • net.bridge.bridge-nf-call-ip6tables: whether bridged IPv6 packets are filtered by the ip6tables chains
  • net.bridge.bridge-nf-call-iptables: whether bridged IPv4 packets are filtered by the iptables chains
  • net.bridge.bridge-nf-filter-vlan-tagged: whether VLAN-tagged packets are filtered by iptables/arptables
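
A quick verification that the settings above are active (all three values should print 1):

sysctl net.bridge.bridge-nf-call-iptables net.bridge.bridge-nf-call-ip6tables net.ipv4.ip_forward
lsmod | grep br_netfilter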

1.6 Install ipvs

cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

yum install ipset -y
yum install ipvsadm -y

Notes:
01. The script above creates /etc/sysconfig/modules/ipvs.modules so that the required modules are loaded automatically after a node reboot. Use lsmod | grep -e ip_vs -e nf_conntrack_ipv4 to confirm the kernel modules are loaded correctly.
02. Make sure the ipset package is installed on every node: yum install ipset -y.
03. To inspect the ipvs proxy rules conveniently, also install the management tool ipvsadm: yum install ipvsadm -y.
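
Once kube-proxy is running in ipvs mode (configured later in section 3.3), ipvsadm can confirm that virtual server rules are actually being programmed; a minimal check, to be run after the cluster is up:

# list the ipvs virtual servers created by kube-proxy
ipvsadm -Ln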

1.7 Synchronize server time

yum install chrony -y
systemctl enable chronyd --now
chronyc sources
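
To confirm the clock is actually synchronized (an optional check):

# "Leap status : Normal" and a small system time offset indicate a healthy sync
chronyc tracking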

1.8 Configure passwordless SSH (convenient for copying files from the master node to the worker nodes later)

#Run the following command on master1 and press Enter twice
ssh-keygen

#Run on master1
#note: the following two commands prompt interactively several times
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.29.9.52
ssh-copy-id -i ~/.ssh/id_rsa.pub root@172.29.9.53
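
Passwordless login can then be verified from master1 (the hostnames assume the /etc/hosts entries from section 1.4):

# should print the remote hostname without prompting for a password
ssh root@node1 hostname
ssh root@node2 hostname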

2. Install Containerd

⚠️ Configure this on all nodes.

2.1 Install containerd

cd /root/
yum install libseccomp -y

#For this lab, just use the package provided with the lab files; you can also download it yourself:
#wget https://download.fastgit.org/containerd/containerd/releases/download/v1.5.5/cri-containerd-cni-1.5.5-linux-amd64.tar.gz
tar -C / -xzf cri-containerd-cni-1.5.5-linux-amd64.tar.gz

echo "export PATH=$PATH:/usr/local/bin:/usr/local/sbin" >> ~/.bashrc
source ~/.bashrc

mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml

systemctl enable containerd --now
ctr version

Note: for the detailed steps of installing containerd on CentOS 7, see the article 实战:centos7上containerd的安装-20211023; only the shell commands are given here.

2.2 Set containerd's cgroup driver to systemd

On Linux distributions that use systemd as the init system, using systemd as the container cgroup driver keeps nodes more stable under resource pressure, so it is recommended to configure containerd's cgroup driver as systemd.

Edit the configuration file /etc/containerd/config.toml generated above and set SystemdCgroup to true under the plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options block:

#locate the block by searching for SystemdCgroup
#vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc]
  ...
  [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.runc.options]
    SystemdCgroup = true
    ....

#Note: or apply the change with a single sed command:
sed -i "s/SystemdCgroup = false/SystemdCgroup = true/g" /etc/containerd/config.toml

2.3 Configure registry mirror (accelerator) addresses

Next, configure a mirror (accelerator) for the image registries. This goes under registry.mirrors inside the registry block of the cri configuration (watch the indentation):

[root@master1 ~]#vim /etc/containerd/config.toml
[plugins."io.containerd.grpc.v1.cri".registry]
  [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."docker.io"]
      endpoint = ["https://kvuwuws2.mirror.aliyuncs.com"]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors."k8s.gcr.io"]
      endpoint = ["https://registry.aliyuncs.com/k8sxio"]

2.4 Configure the pause (sandbox) image address

Configuration:

[root@master1 ~]#vim /etc/containerd/config.toml
……
sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.5"
……

2.5 Start the containerd service

The containerd archive we extracted above already contains an etc/systemd/system/containerd.service unit file, so containerd can be managed by systemd as a daemon. Start it with:

systemctl daemon-reload
systemctl enable containerd --now

2.6 Verify

Once it is running you can use containerd's local CLI tools ctr and crictl, for example to check the version:

ctr version
crictl version
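
If crictl complains that no runtime endpoint is configured, it can be pointed at containerd explicitly; a minimal /etc/crictl.yaml, assuming the default socket path:

cat > /etc/crictl.yaml << EOF
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
EOF
crictl version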

At this point, containerd is installed.

3. Deploy Kubernetes with kubeadm

3.1 Add the Alibaba Cloud YUM repository

⚠️ Configure this on all nodes.

We install from the Alibaba Cloud mirror:

cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

3.2 Install kubeadm, kubelet and kubectl

⚠️ Configure this on all nodes.

yum makecache fast
yum install -y kubelet-1.22.2 kubeadm-1.22.2 kubectl-1.22.2 --disableexcludes=kubernetes
kubeadm version

systemctl enable --now kubelet

Note: --disableexcludes=kubernetes disables every repository other than the kubernetes one for this install.

The output matches expectations.
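
To double-check the installed package versions explicitly (optional):

rpm -q kubelet kubeadm kubectl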

3.3 Initialize the cluster

⚠️ Run on the master1 node

When you run kubelet --help you will see that most of the command-line flags are marked DEPRECATED. This is because the official recommendation is to use --config to point at a configuration file and set those parameters there; see the official document Set Kubelet parameters via a config file for details. This also allows Kubernetes to support dynamic Kubelet configuration (Dynamic Kubelet Configuration); see Reconfigure a Node's Kubelet in a Live Cluster.

  • We can then print the default configuration used for cluster initialization on the master node with the following command:

⚠️ The configuration walkthrough below is only to explain the details; for the actual install you can simply use the kubeadm.yaml provided with the lab files, which already contains all of these changes.

[root@master1 ~]#kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm.yaml

Then adjust the configuration to our needs: change imageRepository to control where the Kubernetes images are pulled from during initialization, set the kube-proxy mode to ipvs, and, because we are going to install the flannel network plugin, set networking.podSubnet to 10.244.0.0/16.

[root@master1 ~]#vim kubeadm.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 172.29.9.51  # change 1: the master node's internal IP
  bindPort: 6443
nodeRegistration:
  criSocket: /run/containerd/containerd.sock  # change 2: use containerd's Unix socket address
  imagePullPolicy: IfNotPresent
  name: master1  # change 3: the master node's name
  taints:  # change 4: taint the master so that regular applications are not scheduled on it
  - effect: "NoSchedule"
    key: "node-role.kubernetes.io/master"

---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs  # change 5: switch kube-proxy to ipvs mode (default is iptables)

---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/k8sxio # change 6: image repository address
kind: ClusterConfiguration
kubernetesVersion: 1.22.2 # change 7: pin the k8s version (the patch version is omitted by default)
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16  # change 8: the pod subnet
scheduler: {}

---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
cgroupDriver: systemd  # change 9: set the cgroup driver
logging: {}
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s

Configuration tip

Documentation for the manifest above is somewhat scattered; to fully understand the fields of these resource objects, consult the godoc at https://godoc.org/k8s.io/kubernetes/cmd/kubeadm/app/apis/kubeadm/v1beta3.
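
Before pulling any images you can also ask kubeadm to parse the file and show what it would do without changing the node; a hedged sketch using the dry-run flag available in kubeadm v1.22:

kubeadm init --config kubeadm.yaml --dry-run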

  • Before initializing the cluster, you can run kubeadm config images pull --config kubeadm.yaml to pre-pull the container images Kubernetes needs on each node.

Let's list them first:

[root@master1 ~]#kubeadm config images list
[root@master1 ~]#kubeadm config images list --config kubeadm.yaml

#A warning may appear here ('key "cgroupDriver" already set in map'), most likely because cgroupDriver was added on top of the value already present in the default dump; it can be ignored and does not affect later steps.

With the configuration file ready, pull the required images with:

[root@master1 ~]#kubeadm config images pull --config kubeadm.yaml
W1031 06:59:20.922890   25580 strict.go:55] error unmarshaling configuration schema.GroupVersionKind{Group:"kubelet.config.k8s.io", Version:"v1beta1", Kind:"KubeletConfiguration"}: error converting YAML to JSON: yaml: unmarshal errors:
  line 27: key "cgroupDriver" already set in map
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2
[config/images] Pulled registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2
[config/images] Pulled registry.aliyuncs.com/k8sxio/pause:3.5
[config/images] Pulled registry.aliyuncs.com/k8sxio/etcd:3.5.0-0
failed to pull image "registry.aliyuncs.com/k8sxio/coredns:v1.8.4": output: time="2021-10-31T07:04:50+08:00" level=fatal msg="pulling image: rpc error: code = NotFound desc = failed to pull and unpack image \"registry.aliyuncs.com/k8sxio/coredns:v1.8.4\": failed to resolve reference \"registry.aliyuncs.com/k8sxio/coredns:v1.8.4\": registry.aliyuncs.com/k8sxio/coredns:v1.8.4: not found"
, error: exit status 1
To see the stack trace of this error execute with --v=5 or higher
[root@master1 ~]#

  • The coredns image pull failed above because the Alibaba Cloud repository does not have this image; we can pull it manually from the official repository and then simply re-tag it:
#Summary of commands:
ctr -n k8s.io i pull docker.io/coredns/coredns:1.8.4
ctr -n k8s.io i tag docker.io/coredns/coredns:1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4
#Test run:
[root@master1 ~]#ctr -n k8s.io i pull docker.io/coredns/coredns:1.8.4
docker.io/coredns/coredns:1.8.4:                                                  resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:6e5a02c21641597998b4be7cb5eb1e7b02c0d8d23cce4dd09f4682d463798890:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:10683d82b024a58cc248c468c2632f9d1b260500f7cd9bb8e73f751048d7d6d4: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:bc38a22c706b427217bcbd1a7ac7c8873e75efdd0e59d6b9f069b4b243db4b4b:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:8d147537fb7d1ac8895da4d55a5e53621949981e2e6460976dae812f83d84a44:   done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:c6568d217a0023041ef9f729e8836b19f863bcdb612bb3a329ebc165539f5a80:    done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 15.9s                                                                    total:  12.1 M (780.6 KiB/s)                       
unpacking linux/amd64 sha256:6e5a02c21641597998b4be7cb5eb1e7b02c0d8d23cce4dd09f4682d463798890...
done: 684.151259ms

[root@master1 ~]#ctr -n k8s.io i ls -q
docker.io/coredns/coredns:1.8.4
registry.aliyuncs.com/k8sxio/etcd:3.5.0-0
registry.aliyuncs.com/k8sxio/etcd@sha256:9ce33ba33d8e738a5b85ed50b5080ac746deceed4a7496c550927a7a19ca3b6d
registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2
registry.aliyuncs.com/k8sxio/kube-apiserver@sha256:eb4fae890583e8d4449c1e18b097aec5574c25c8f0323369a2df871ffa146f41
registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2
registry.aliyuncs.com/k8sxio/kube-controller-manager@sha256:91ccb477199cdb4c63fb0c8fcc39517a186505daf4ed52229904e6f9d09fd6f9
registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2
registry.aliyuncs.com/k8sxio/kube-proxy@sha256:561d6cb95c32333db13ea847396167e903d97cf6e08dd937906c3dd0108580b7
registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2
registry.aliyuncs.com/k8sxio/kube-scheduler@sha256:c76cb73debd5e37fe7ad42cea9a67e0bfdd51dd56be7b90bdc50dd1bc03c018b
registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause@sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07
sha256:0048118155842e4c91f0498dd298b8e93dc3aecc7052d9882b76f48e311a76ba
sha256:5425bcbd23c54270d9de028c09634f8e9a014e9351387160c133ccf3a53ab3dc
sha256:873127efbc8a791d06e85271d9a2ec4c5d58afdf612d490e24fb3ec68e891c8d
sha256:8d147537fb7d1ac8895da4d55a5e53621949981e2e6460976dae812f83d84a44
sha256:b51ddc1014b04295e85be898dac2cd4c053433bfe7e702d7e9d6008f3779609b
sha256:e64579b7d8862eff8418d27bf67011e348a5d926fa80494a6475b3dc959777f5
sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459
[root@master1 ~]#


[root@master1 ~]#ctr -n k8s.io i tag docker.io/coredns/coredns:1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4
registry.aliyuncs.com/k8sxio/coredns:v1.8.4
[root@master1 ~]#

  • Then we can use the configuration file above to initialize the cluster on the master node:
#Summary of the pause-image workaround commands (explained below)
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5

The test run is below. (This looks like a bug, or at least a quirk: we already configured the pause image in containerd earlier, yet the problem still appears, so we simply pull and tag the image manually.)

Pay close attention here: the first attempt will fail...

#Note: add --v 5 to print more verbose log output
kubeadm init --config kubeadm.yaml --v 5
[root@master1 ~]#kubeadm init --config kubeadm.yaml
W1031 07:14:21.837059   26278 strict.go:55] error unmarshaling configuration schema.GroupVersionKind{Group:"kubelet.config.k8s.io", Version:"v1beta1", Kind:"KubeletConfiguration"}: error converting YAML to JSON: yaml: unmarshal errors:
  line 27: key "cgroupDriver" already set in map
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master1] and IPs [10.96.0.1 172.29.9.51]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

        Unfortunately, an error has occurred:
                timed out waiting for the condition

        This error is likely caused by:
                - The kubelet is not running
                - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

        If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
                - 'systemctl status kubelet'
                - 'journalctl -xeu kubelet'

        Additionally, a control plane component may have crashed or exited when started by the container runtime.
        To troubleshoot, list all containers using your preferred container runtimes CLI.

        Here is one example how you may list all Kubernetes containers running in cri-o/containerd using crictl:
                - 'crictl --runtime-endpoint /run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
                Once you have found the failing container, you can inspect its logs with:
                - 'crictl --runtime-endpoint /run/containerd/containerd.sock logs CONTAINERID'

error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
[root@master1 ~]#

Let's dig further into the error logs:

[root@master1 ~]#systemctl status kubelet

[root@master1 ~]#journalctl -xeu kubelet

[root@master1 ~]#vim /var/log/messages

From the checks above, /var/log/messages shows the error error="failed to get sandbox image \"k8s.gcr.io/pause:3.5\"".

Strange: we already pulled the Alibaba Cloud pause image locally, so why does it still try to pull the pause image from the default k8s registry?

Following the error message, let's try pulling the pause image from the official k8s registry and see what happens:

[root@master1 ~]#ctr -n k8s.io i pull k8s.gcr.io/pause:3.5

After several attempts, pulling the pause image from the official k8s registry keeps failing, even with a proxy.

But we can simply take the image already downloaded from the Alibaba Cloud repository and re-tag it; let's test that:

[root@master1 ~]#ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
[root@master1 ~]#ctr -n k8s.io i ls -q

Now let's wipe the master1 node with kubeadm reset and initialize the cluster again to see the result:

[root@master1 ~]#kubeadm reset

[root@master1 ~]#kubeadm init --config kubeadm.yaml
W1031 07:56:49.681727   27288 strict.go:55] error unmarshaling configuration schema.GroupVersionKind{Group:"kubelet.config.k8s.io", Version:"v1beta1", Kind:"KubeletConfiguration"}: error converting YAML to JSON: yaml: unmarshal errors:
  line 27: key "cgroupDriver" already set in map
[init] Using Kubernetes version: v1.22.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local master1] and IPs [10.96.0.1 172.29.9.51]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost master1] and IPs [172.29.9.51 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[apiclient] All control plane components are healthy after 210.014030 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.22" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node master1 as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: abcdef.0123456789abcdef
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.29.9.51:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:7fb11aea8a467bd1453efe10600c167b87a5f04d55d7e60298583a6a0c736ec4
[root@master1 ~]#

Note:

I1030 07:26:13.898398   18436 checks.go:205] validating availability of port 10250  # kubelet port
I1030 07:26:13.898547   18436 checks.go:205] validating availability of port 2379   # etcd port
I1030 07:26:13.898590   18436 checks.go:205] validating availability of port 2380   # etcd port

The master1 node has been initialized successfully.

  • Copy the kubeconfig file as suggested by the install output:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
  • Then use kubectl to confirm the master node has been initialized successfully:
[root@master1 ~]#kubectl get node
NAME      STATUS   ROLES                  AGE    VERSION
master1   Ready    control-plane,master   114s   v1.22.2
[root@master1 ~]#
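
Since kube-proxy was switched to ipvs mode in kubeadm.yaml, it is worth a quick, optional check that the mode is in effect (output details will differ per cluster):

# the kube-proxy ConfigMap should show mode: ipvs
kubectl -n kube-system get configmap kube-proxy -o yaml | grep "mode"
# and the node's ipvs table should contain virtual servers for the cluster services
ipvsadm -Ln | head -n 20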

3.4 Add the worker nodes

⚠️ Run on node1 and node2

  • Remember to complete the cluster prerequisites above on the workers first, copy the master's $HOME/.kube/config file to the corresponding path on the node, install kubeadm, kubelet and kubectl (kubectl is optional), and then run the join command printed at the end of the init output:
kubeadm join 172.29.9.51:6443 --token abcdef.0123456789abcdef \
        --discovery-token-ca-cert-hash sha256:7fb11aea8a467bd1453efe10600c167b87a5f04d55d7e60298583a6a0c736ec4        
        

  • join command: if you lose the join command above, you can regenerate it with kubeadm token create --print-join-command.

  • After the join succeeds, run get nodes:

[root@master1 ~]#kubectl get node
NAME      STATUS   ROLES                  AGE    VERSION
master1   Ready    control-plane,master   31m    v1.22.2
node1     Ready    <none>                 102s   v1.22.2
node2     Ready    <none>                 95s    v1.22.2
[root@master1 ~]#
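
The <none> ROLES value on the workers is only cosmetic; if you prefer labelled workers you can add the role label yourself (optional, not part of the original steps):

kubectl label node node1 node-role.kubernetes.io/worker=
kubectl label node node2 node-role.kubernetes.io/worker=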

3.5 Install the flannel network plugin

⚠️ Run on master1 (but node1 and node2 will need the pause image pulled manually)

At this point the cluster is not actually usable yet because no network plugin has been installed. Next, install one: you can pick a plugin from https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/, and here we install flannel:

➜  ~ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Note:

# If a node has multiple NICs, specify the internal NIC in the manifest.
# Search for the DaemonSet named kube-flannel-ds, under the kube-flannel container:
➜  ~ vi kube-flannel.yml
......
containers:
- name: kube-flannel
  image: quay.io/coreos/flannel:v0.15.0
  command:
  - /opt/bin/flanneld
  args:
  - --ip-masq
  - --kube-subnet-mgr
  - --iface=eth0  # for multi-NIC hosts, the name of the internal NIC
......
  • Note: downloading this file requires access past the firewall; you can use the pre-downloaded kube-flannel.yml from the lab files instead. Nothing needs to be changed, just apply it.
[root@master1 ~]#ll -h
total 122M
-rw-r--r-- 1 root root 122M Jul 30 01:16 cri-containerd-cni-1.5.5-linux-amd64.tar.gz
-rw-r--r-- 1 root root 2.0K Oct 31 06:54 kubeadm.yaml
-rw-r--r-- 1 root root 7.4K Oct 31 23:01 kube-dashboard.yaml
-rw-r--r-- 1 root root 5.1K Oct 31 08:38 kube-flannel.yml
[root@master1 ~]#

[root@master1 ~]#vim kube-flannel.yml
---
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: psp.flannel.unprivileged
  annotations:
    seccomp.security.alpha.kubernetes.io/allowedProfileNames: docker/default
    seccomp.security.alpha.kubernetes.io/defaultProfileName: docker/default
    apparmor.security.beta.kubernetes.io/allowedProfileNames: runtime/default
    apparmor.security.beta.kubernetes.io/defaultProfileName: runtime/default
spec:
  privileged: false
  volumes:
  - configMap
  - secret
  - emptyDir
  - hostPath
  allowedHostPaths:
  - pathPrefix: "/etc/cni/net.d"
  - pathPrefix: "/etc/kube-flannel"
  - pathPrefix: "/run/flannel"
  readOnlyRootFilesystem: false
  # Users and groups
  runAsUser:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  # Privilege Escalation
  allowPrivilegeEscalation: false
  defaultAllowPrivilegeEscalation: false
  # Capabilities
  allowedCapabilities: ['NET_ADMIN', 'NET_RAW']
  defaultAddCapabilities: []
  requiredDropCapabilities: []
  # Host namespaces
  hostPID: false
  hostIPC: false
  hostNetwork: true
  hostPorts:
  - min: 0
    max: 65535
  # SELinux
  seLinux:
    # SELinux is unused in CaaSP
    rule: 'RunAsAny'
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
rules:
- apiGroups: ['extensions']
  resources: ['podsecuritypolicies']
  verbs: ['use']
  resourceNames: ['psp.flannel.unprivileged']
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-system
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: flannel
  namespace: kube-system
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-system
  labels:
    tier: node
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-system
  labels:
    tier: node
    app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: rancher/mirrored-flannelcni-flannel-cni-plugin:v1.2
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: quay.io/coreos/flannel:v0.15.0
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: quay.io/coreos/flannel:v0.15.0
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
          limits:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
  • This file needs no modification; just apply it directly:
[root@master1 ~]#kubectl apply -f kube-flannel.yml # install the flannel network plugin
Warning: policy/v1beta1 PodSecurityPolicy is deprecated in v1.21+, unavailable in v1.25+
podsecuritypolicy.policy/psp.flannel.unprivileged created
clusterrole.rbac.authorization.k8s.io/flannel created
clusterrolebinding.rbac.authorization.k8s.io/flannel created
serviceaccount/flannel created
configmap/kube-flannel-cfg created
daemonset.apps/kube-flannel-ds created
[root@master1 ~]#
  • After a short wait, check the Pod status:
[root@master1 ~]#kubectl get pod -nkube-system -owide
NAME                              READY   STATUS              RESTARTS      AGE   IP            NODE      NOMINATED NODE   READINESS GATES
coredns-7568f67dbd-j5q55          1/1     Running             0             12h   10.88.0.3     master1   <none>           <none>
coredns-7568f67dbd-mpb4g          1/1     Running             0             12h   10.88.0.2     master1   <none>           <none>
etcd-master1                      1/1     Running             1             12h   172.29.9.51   master1   <none>           <none>
kube-apiserver-master1            1/1     Running             2 (12h ago)   12h   172.29.9.51   master1   <none>           <none>
kube-controller-manager-master1   1/1     Running             1             12h   172.29.9.51   master1   <none>           <none>
kube-flannel-ds-gprz7             1/1     Running             0             11h   172.29.9.51   master1   <none>           <none>
kube-flannel-ds-h9pw6             0/1     Init:0/2            0             11h   172.29.9.52   node1     <none>           <none>
kube-flannel-ds-pct7r             0/1     Init:0/2            0             11h   172.29.9.53   node2     <none>           <none>
kube-proxy-4sw76                  1/1     Running             0             12h   172.29.9.51   master1   <none>           <none>
kube-proxy-mkghd                  0/1     ContainerCreating   0             11h   172.29.9.52   node1     <none>           <none>
kube-proxy-s2748                  0/1     ContainerCreating   0             11h   172.29.9.53   node2     <none>           <none>
kube-scheduler-master1            1/1     Running             1             12h   172.29.9.51   master1   <none>           <none>
[root@master1 ~]#

  • At this point we hit a problem:

As you can see, the kube-flannel and kube-proxy pods on node1 and node2 did not start successfully.

These components run as DaemonSets, i.e. one pod is started on every node.

Let's use kubectl describe pod xxx to find out why these pods failed:

[root@master1 ~]#kubectl describe pod kube-flannel-ds-h9pw6 -nkube-system
[root@master1 ~]#kubectl describe pod kube-proxy-s2748  -nkube-system

The describe output shows that the kube-flannel and kube-proxy pods on node1 and node2 failed because the k8s.gcr.io/pause:3.5 image is missing, the same symptom we hit on the master node.

Let's test this on node1:

First, which images exist locally?

[root@node1 ~]#ctr -n k8s.io i ls -q
[root@node1 ~]#ctr  i ls -q
[root@node1 ~]#

There are no images locally at all.

Following the error message, let's try to pull k8s.gcr.io/pause:3.5 manually:

[root@node1 ~]#ctr -n k8s.io i pull k8s.gcr.io/pause:3.5
k8s.gcr.io/pause:3.5: resolving      |--------------------------------------|
elapsed: 20.9s        total:   0.0 B (0.0 B/s)
INFO[0021] trying next host                              error="failed to do request: Head \"https://k8s.gcr.io/v2/pause/manifests/3.5\": dial tcp 64.233.188.82:443: connect: connection refused" host=k8s.gcr.io
ctr: failed to resolve reference "k8s.gcr.io/pause:3.5": failed to do request: Head "https://k8s.gcr.io/v2/pause/manifests/3.5": dial tcp 64.233.188.82:443: connect: connection refused
[root@node1 ~]#

The pull fails; judging from the error, this image cannot be fetched directly.

  • So, just as on master1, pull the image from the Alibaba Cloud repository and re-tag it:

#Summary of commands
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
#Test run
[root@node1 ~]#ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause:3.5:                                           resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:369201a612f7b2b585a8e6ca99f77a36bcdbd032463d815388a96800b63ef2c8: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:019d8da33d911d9baabe58ad63dea2107ed15115cca0fc27fc0f627e82a695c1:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 1.0 s                                                                    total:  4.8 Ki (4.8 KiB/s)
unpacking linux/amd64 sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07...
done: 48.82759ms
[root@node1 ~]#ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
k8s.gcr.io/pause:3.5
[root@node1 ~]#ctr -n k8s.io i ls -q
k8s.gcr.io/pause:3.5
registry.aliyuncs.com/k8sxio/pause:3.5
sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459
[root@node1 ~]#

With the tag in place, go back to master1 and watch the pods again:

The kube-flannel and kube-proxy pods on node1 are now running; only node2 is left, so that was indeed the problem.

  • Apply the same steps on node2:
#Summary of commands
ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5

#Actual run
[root@node2 ~]# ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause:3.5:                                           resolved       |++++++++++++++++++++++++++++++++++++++|
index-sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07:    done           |++++++++++++++++++++++++++++++++++++++|
manifest-sha256:369201a612f7b2b585a8e6ca99f77a36bcdbd032463d815388a96800b63ef2c8: done           |++++++++++++++++++++++++++++++++++++++|
layer-sha256:019d8da33d911d9baabe58ad63dea2107ed15115cca0fc27fc0f627e82a695c1:    done           |++++++++++++++++++++++++++++++++++++++|
config-sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459:   done           |++++++++++++++++++++++++++++++++++++++|
elapsed: 0.9 s                                                                    total:  4.8 Ki (5.3 KiB/s)
unpacking linux/amd64 sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07...
done: 53.021995ms
[root@node2 ~]#ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
k8s.gcr.io/pause:3.5
[root@node2 ~]#ctr -n k8s.io i ls -q
k8s.gcr.io/pause:3.5
registry.aliyuncs.com/k8sxio/pause:3.5
sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459
[root@node2 ~]#

Finally, confirm the pod status once more on master1:

[root@master1 ~]#kubectl get po -A -owide
NAMESPACE     NAME                              READY   STATUS    RESTARTS      AGE   IP            NODE      NOMINATED NODE   READINESS GATES
kube-system   coredns-7568f67dbd-j5q55          1/1     Running   0             12h   10.88.0.3     master1   <none>           <none>
kube-system   coredns-7568f67dbd-mpb4g          1/1     Running   0             12h   10.88.0.2     master1   <none>           <none>
kube-system   etcd-master1                      1/1     Running   1             12h   172.29.9.51   master1   <none>           <none>
kube-system   kube-apiserver-master1            1/1     Running   2 (12h ago)   12h   172.29.9.51   master1   <none>           <none>
kube-system   kube-controller-manager-master1   1/1     Running   1             12h   172.29.9.51   master1   <none>           <none>
kube-system   kube-flannel-ds-gprz7             1/1     Running   0             11h   172.29.9.51   master1   <none>           <none>
kube-system   kube-flannel-ds-h9pw6             1/1     Running   0             11h   172.29.9.52   node1     <none>           <none>
kube-system   kube-flannel-ds-pct7r             1/1     Running   0             11h   172.29.9.53   node2     <none>           <none>
kube-system   kube-proxy-4sw76                  1/1     Running   0             12h   172.29.9.51   master1   <none>           <none>
kube-system   kube-proxy-mkghd                  1/1     Running   0             12h   172.29.9.52   node1     <none>           <none>
kube-system   kube-proxy-s2748                  1/1     Running   0             12h   172.29.9.53   node2     <none>           <none>
kube-system   kube-scheduler-master1            1/1     Running   1             12h   172.29.9.51   master1   <none>           <none>
[root@master1 ~]#

Everything is normal now.

  • Note: about the Flannel network plugin

After the network plugin is deployed, running ifconfig should normally show two new virtual devices, cni0 and flannel.1. If you do not see cni0, don't worry: check whether the /var/lib/cni directory exists. If it does not, that is not a deployment problem; it simply means no application has run on that node yet. As soon as a Pod runs there, the directory is created and the cni0 device appears. The same applies to the other nodes.
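
A small way to exercise this (a sketch; the pod name and image are only examples): run a throwaway Pod and watch the flannel interfaces appear on the node it lands on.

# run on master1: the pod is scheduled on a worker because of the master taint
kubectl run net-test --image=busybox:1.28 --restart=Never -- sleep 3600
kubectl get pod net-test -o wide
# then on that worker node:
ip addr show flannel.1
ip addr show cni0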

4. Dashboard

A v1.22.2 cluster needs a recent 2.0+ release of the Dashboard:

⚠️ Run on the master1 node

4.1 Download the kube-dashboard YAML file

# the recommended approach
[root@master1 ~]#wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml
--2021-10-31 22:56:47--  https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.110.133, 185.199.111.133, 185.199.108.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.110.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7552 (7.4K) [text/plain]
Saving to: ‘recommended.yaml’

100%[===============================================================================================================================>] 7,552       3.76KB/s   in 2.0s

2021-10-31 22:56:55 (3.76 KB/s) - ‘recommended.yaml’ saved [7552/7552]

[root@master1 ~]#ll
total 124120
-rw-r--r-- 1 root root 127074581 Jul 30 01:16 cri-containerd-cni-1.5.5-linux-amd64.tar.gz
-rw-r--r-- 1 root root      1976 Oct 31 06:54 kubeadm.yaml
-rw-r--r-- 1 root root      5175 Oct 31 08:38 kube-flannel.yml
-rw-r--r-- 1 root root      7552 Oct 31 22:56 recommended.yaml


[root@master1 ~]#mv recommended.yaml kube-dashboard.yaml

4.2 Modify kube-dashboard.yaml

[root@master1 ~]#vim kube-dashboard.yaml
# change the Service to NodePort type
......
kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  ports:
    - port: 443
      targetPort: 8443
  selector:
    k8s-app: kubernetes-dashboard
  type: NodePort  # add type: NodePort to expose the service as a NodePort
......
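
Alternatively, if the stock manifest has already been applied, the Service type can be switched with a one-line patch instead of editing the file (an equivalent approach, not part of the original steps):

kubectl -n kubernetes-dashboard patch svc kubernetes-dashboard -p '{"spec":{"type":"NodePort"}}'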

4.3 Deploy kube-dashboard.yaml

In the YAML file you can see that the new Dashboard bundles a metrics-scraper component, which collects basic resource metrics through the Kubernetes Metrics API and displays them in the web UI. To actually see those metrics you must provide the Metrics API, for example by installing Metrics Server.

Create it directly:

[root@master1 ~]#kubectl apply -f kube-dashboard.yaml
namespace/kubernetes-dashboard created
serviceaccount/kubernetes-dashboard created
service/kubernetes-dashboard created
secret/kubernetes-dashboard-certs created
secret/kubernetes-dashboard-csrf created
secret/kubernetes-dashboard-key-holder created
configmap/kubernetes-dashboard-settings created
role.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrole.rbac.authorization.k8s.io/kubernetes-dashboard created
rolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
clusterrolebinding.rbac.authorization.k8s.io/kubernetes-dashboard created
deployment.apps/kubernetes-dashboard created
service/dashboard-metrics-scraper created
Warning: spec.template.metadata.annotations[seccomp.security.alpha.kubernetes.io/pod]: deprecated since v1.19; use the "seccompProfile" field instead
deployment.apps/dashboard-metrics-scraper created
[root@master1 ~]#

The new Dashboard is installed in the kubernetes-dashboard namespace by default:

[root@master1 ~]#kubectl get pod -A
NAMESPACE              NAME                                         READY   STATUS    RESTARTS      AGE
kube-system            coredns-7568f67dbd-559d2                     1/1     Running   0             14m
kube-system            coredns-7568f67dbd-b95fg                     1/1     Running   0             14m
kube-system            etcd-master1                                 1/1     Running   1             15h
kube-system            kube-apiserver-master1                       1/1     Running   2 (15h ago)   15h
kube-system            kube-controller-manager-master1              1/1     Running   1             15h
kube-system            kube-flannel-ds-gprz7                        1/1     Running   0             14h
kube-system            kube-flannel-ds-h9pw6                        1/1     Running   0             14h
kube-system            kube-flannel-ds-pct7r                        1/1     Running   0             14h
kube-system            kube-proxy-4sw76                             1/1     Running   0             15h
kube-system            kube-proxy-mkghd                             1/1     Running   0             14h
kube-system            kube-proxy-s2748                             1/1     Running   0             14h
kube-system            kube-scheduler-master1                       1/1     Running   1             15h
kubernetes-dashboard   dashboard-metrics-scraper-856586f554-zwv6r   1/1     Running   0             34s
kubernetes-dashboard   kubernetes-dashboard-67484c44f6-hh5zj        1/1     Running   0             34s
[root@master1 ~]#

4.4 Configure CNI

⚠️ Note: perform these steps on node1 and node2.

Looking closely, the Pods above are getting IPs from 10.88.x.x, including the automatically installed CoreDNS pods, even though we configured podSubnet as 10.244.0.0/16. Why?

Let's inspect the CNI configuration files first:

[root@node1 ~]#ll /etc/cni/net.d/
total 8
-rw-r--r-- 1 1001  116 604 Jul 30 01:13 10-containerd-net.conflist
-rw-r--r-- 1 root root 292 Oct 31 20:30 10-flannel.conflist
[root@node1 ~]#

There are two configurations: 10-containerd-net.conflist and the one generated by the Flannel plugin we installed. We obviously want the Flannel configuration to be used, so let's look at containerd's bundled CNI configuration first:

[root@node1 net.d]#cat 10-containerd-net.conflist
{
  "cniVersion": "0.4.0",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "ranges": [
          [{
            "subnet": "10.88.0.0/16"
          }],
          [{
            "subnet": "2001:4860:4860::/64"
          }]
        ],
        "routes": [
          { "dst": "0.0.0.0/0" },
          { "dst": "::/0" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
[root@node1 net.d]#

The IP range there is exactly 10.88.0.0/16, and this CNI configuration uses the bridge plugin with a bridge named cni0.

Containers on a plain bridge network cannot communicate across hosts; cross-host communication needs another CNI plugin, such as the Flannel we installed above, or Calico, and so on.

Because there are two CNI configurations here, we must remove 10-containerd-net.conflist: when this directory contains multiple CNI configuration files, kubelet uses the first one in lexicographic filename order, which is why the containerd-net plugin was being used by default.

On node1:

[root@node1 ~]#mv /etc/cni/net.d/10-containerd-net.conflist{,.bak}
[root@node1 ~]#ifconfig  cni0 down && ip link delete cni0
[root@node1 ~]#systemctl daemon-reload
[root@node1 ~]#systemctl restart containerd kubelet

#Command summary
mv /etc/cni/net.d/10-containerd-net.conflist{,.bak}
ifconfig  cni0 down && ip link delete cni0
systemctl daemon-reload
systemctl restart containerd kubelet

Apply the same removal steps on node2:

[root@node2 ~]#mv /etc/cni/net.d/10-containerd-net.conflist{,.bak}
[root@node2 ~]#ifconfig  cni0 down && ip link delete cni0
[root@node2 ~]#systemctl daemon-reload
[root@node2 ~]#systemctl restart containerd kubelet

Then remember to recreate the coredns and dashboard Pods; after recreation their IP addresses are correct:

[root@master1 ~]#kubectl delete pod coredns-7568f67dbd-ftnlb coredns-7568f67dbd-scdb5 -nkube-system
pod "coredns-7568f67dbd-ftnlb" deleted
pod "coredns-7568f67dbd-scdb5" deleted
[root@master1 ~]#kubectl delete pod dashboard-metrics-scraper-856586f554-r5qvs kubernetes-dashboard-67484c44f6-9dg85 -nkubernetes-dashboard
pod "dashboard-metrics-scraper-856586f554-r5qvs" deleted
pod "kubernetes-dashboard-67484c44f6-9dg85" deleted

4.5 Log in to the Dashboard

Check the Dashboard's NodePort:

[root@master1 ~]#kubectl get svc -A
NAMESPACE              NAME                        TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)                  AGE
default                kubernetes                  ClusterIP   10.96.0.1        <none>        443/TCP                  15h
kube-system            kube-dns                    ClusterIP   10.96.0.10       <none>        53/UDP,53/TCP,9153/TCP   15h
kubernetes-dashboard   dashboard-metrics-scraper   ClusterIP   10.99.185.146    <none>        8000/TCP                 18m
kubernetes-dashboard   kubernetes-dashboard        NodePort    10.105.110.130   <none>        443:30498/TCP            18m
[root@master1 ~]#

You can then access the Dashboard through port 30498 shown above. Remember to use https; if Chrome refuses to open it, try Firefox, or simply click through to trust the certificate on the warning page.

Once the certificate is trusted, the Dashboard login page appears:

  • Then create a user with cluster-wide admin permissions to log in to the Dashboard: (admin.yaml)
#admin.yaml -- you can use the admin.yaml provided with the lab files directly
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: admin
roleRef:
  kind: ClusterRole
  name: cluster-admin
  apiGroup: rbac.authorization.k8s.io
subjects:
- kind: ServiceAccount
  name: admin
  namespace: kubernetes-dashboard
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: admin
  namespace: kubernetes-dashboard
  • Create it:
[root@master1 ~]#kubectl apply -f admin.yaml
clusterrolebinding.rbac.authorization.k8s.io/admin created
serviceaccount/admin created
[root@master1 ~]#
[root@master1 ~]#kubectl get secret admin-token-sbh8v -o jsonpath={.data.token} -n kubernetes-dashboard |base64 -d
eyJhbGciOiJSUzI1NiIsImtpZCI6Im9hOERROHhuWTRSeVFweG1kdnpXSTRNNXg3bDZ0ZkRDVWNzN0l5Z0haT1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJhZG1pbi10b2tlbi1zYmg4diIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50Lm5hbWUiOiJhZG1pbiIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjYwMDVmZjc5LWJlYzItNDE5MC1iMmNmLWMwOGVhNDRmZTVmMCIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDprdWJlcm5ldGVzLWRhc2hib2FyZDphZG1pbiJ9.FyxLcGjtxI5Asl07Z21FzAFFcgwVFI7zc-2ITU6uQV3dzQaUNEl642MyIkhkEvdqd6He0eKgR79xm1ly9PL0exD2YUVFfeCnt-M-LiJwM59YGEQfQHYcku6ozikyQ7YeooV2bQ6EsAsyseVcCrHfa4ZXjmBb9L5rRX9Kds49yVsVFWbgbhe3LzNFMy1wm4fTN4EXPvegmA6mUaMLlLsrQJ2oNokx9TYuifIXjQDATZMTFc-YQMZahAbT-rUz8KccOt1O59NebitAHW0YKVFTkxbvBQwghe_yf25_j07LRbygSnHV5OrMEqZXl82AhdnXsvqdjjes6AxejGuDtwiiyw[root@master1 ~]#
# a very long (base64-decoded) token string is printed

Use the decoded string above as the token to log in to the Dashboard. The new version even adds a dark mode.
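
Because the secret name suffix (admin-token-xxxxx) is random, it can be handier to look the secret up via the ServiceAccount; a hedged one-liner for v1.22, where ServiceAccount token secrets are still auto-created:

kubectl -n kubernetes-dashboard get secret \
  $(kubectl -n kubernetes-dashboard get sa admin -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d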

5. Cleanup

If you run into other problems during installation, the cluster can be reset with:

➜  ~ kubeadm reset
➜  ~ ifconfig cni0 down && ip link delete cni0
➜  ~ ifconfig flannel.1 down && ip link delete flannel.1
➜  ~ rm -rf /var/lib/cni/

With that we have finished building a v1.22.2 Kubernetes cluster with kubeadm, including coredns, ipvs, flannel and containerd.

Finally, remember to snapshot the three VMs so they can be restored later.


FAQ

1. coredns image and pause image issues

01. coredns image pull failure

Symptom: pulling registry.aliyuncs.com/k8sxio/coredns:v1.8.4 fails because the Alibaba Cloud k8sxio repository does not contain that image.

So pull the image from the official repository first and then re-tag it:

ctr -n k8s.io i pull docker.io/coredns/coredns:1.8.4
ctr -n k8s.io i tag docker.io/coredns/coredns:1.8.4 registry.aliyuncs.com/k8sxio/coredns:v1.8.4

02. The pause image pulled from the Alibaba k8sxio repository is not used

The pause image downloaded from the Alibaba Cloud repository on all three nodes was not recognized by k8s, which still tried to pull the pause image from the official registry; it only worked after re-tagging the image as the official k8s.gcr.io/pause:3.5.

This may be a bug; we set it aside for now, or it could be retested with the newer v1.22.3.

[root@master1 ~]#systemctl cat kubelet
[root@master1 ~]#cat /var/lib/kubelet/kubeadm-flags.env

You can see that the repository kubelet uses for this image has already been changed to the Alibaba Cloud k8sxio repository:

[root@master1 ~]#vim kubeadm.yaml

Workaround:

ctr -n k8s.io i pull registry.aliyuncs.com/k8sxio/pause:3.5
ctr -n k8s.io i tag registry.aliyuncs.com/k8sxio/pause:3.5 k8s.gcr.io/pause:3.5
[root@master1 ~]#ctr -n k8s.io i ls -q
docker.io/coredns/coredns:1.8.4
docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin:v1.2
docker.io/rancher/mirrored-flannelcni-flannel-cni-plugin@sha256:b69fb2dddf176edeb7617b176543f3f33d71482d5d425217f360eca5390911dc
k8s.gcr.io/pause:3.5
quay.io/coreos/flannel:v0.15.0
quay.io/coreos/flannel@sha256:bf24fa829f753d20b4e36c64cf9603120c6ffec9652834953551b3ea455c4630
registry.aliyuncs.com/k8sxio/coredns:v1.8.4
registry.aliyuncs.com/k8sxio/etcd:3.5.0-0
registry.aliyuncs.com/k8sxio/etcd@sha256:9ce33ba33d8e738a5b85ed50b5080ac746deceed4a7496c550927a7a19ca3b6d
registry.aliyuncs.com/k8sxio/kube-apiserver:v1.22.2
registry.aliyuncs.com/k8sxio/kube-apiserver@sha256:eb4fae890583e8d4449c1e18b097aec5574c25c8f0323369a2df871ffa146f41
registry.aliyuncs.com/k8sxio/kube-controller-manager:v1.22.2
registry.aliyuncs.com/k8sxio/kube-controller-manager@sha256:91ccb477199cdb4c63fb0c8fcc39517a186505daf4ed52229904e6f9d09fd6f9
registry.aliyuncs.com/k8sxio/kube-proxy:v1.22.2
registry.aliyuncs.com/k8sxio/kube-proxy@sha256:561d6cb95c32333db13ea847396167e903d97cf6e08dd937906c3dd0108580b7
registry.aliyuncs.com/k8sxio/kube-scheduler:v1.22.2
registry.aliyuncs.com/k8sxio/kube-scheduler@sha256:c76cb73debd5e37fe7ad42cea9a67e0bfdd51dd56be7b90bdc50dd1bc03c018b
registry.aliyuncs.com/k8sxio/pause:3.5
registry.aliyuncs.com/k8sxio/pause@sha256:1ff6c18fbef2045af6b9c16bf034cc421a29027b800e4f9b68ae9b1cb3e9ae07
sha256:0048118155842e4c91f0498dd298b8e93dc3aecc7052d9882b76f48e311a76ba
sha256:09b38f011a29c697679aa10918b7514e22136b50ceb6cf59d13151453fe8b7a0
sha256:5425bcbd23c54270d9de028c09634f8e9a014e9351387160c133ccf3a53ab3dc
sha256:873127efbc8a791d06e85271d9a2ec4c5d58afdf612d490e24fb3ec68e891c8d
sha256:8d147537fb7d1ac8895da4d55a5e53621949981e2e6460976dae812f83d84a44
sha256:98660e6e4c3ae49bf49cd640309f79626c302e1d8292e1971dcc2e6a6b7b8c4d
sha256:b51ddc1014b04295e85be898dac2cd4c053433bfe7e702d7e9d6008f3779609b
sha256:e64579b7d8862eff8418d27bf67011e348a5d926fa80494a6475b3dc959777f5
sha256:ed210e3e4a5bae1237f1bb44d72a05a2f1e5c6bfe7a7e73da179e2534269c459
[root@master1 ~]#


Note: updated 2021-11-04 19:47:09

This follow-up: the infra (pause) container image problem

1. The meaning of kubelet's --pod-infra-container-image option

[root@master1 ~]#kubelet --help|grep infra
      --pod-infra-container-image string                         Specified image will not be pruned by the image garbage collector. When container-runtime is set to 'docker', all containers in each pod will use the network/ipc namespaces from this image. Other CRI implementations have their own configuration to set this image. (default "k8s.gcr.io/pause:3.5")
#That is: the specified image is not pruned by the image garbage collector; with the docker runtime every container in a pod shares the network/IPC namespaces of a container started from this image, while other CRI implementations configure it through their own settings (default k8s.gcr.io/pause:3.5).

Every pod, before it starts, launches an infra container from this k8s.gcr.io/pause:3.5 image.

The default is k8s.gcr.io/pause:3.5, but in practice it is meant to be overridden through this kubelet --pod-infra-container-image parameter.

2. Let's check the current value of this parameter

[root@master1 ~]#systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-10-31 23:12:58 CST; 3 days ago
     Docs: https://kubernetes.io/docs/
 Main PID: 95001 (kubelet)
    Tasks: 15
   Memory: 89.7M
   CGroup: /system.slice/kubelet.service
           └─95001 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/...

Nov 04 07:20:37 master1 kubelet[95001]: E1104 07:20:37.295297   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:42 master1 kubelet[95001]: E1104 07:20:42.297382   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:47 master1 kubelet[95001]: E1104 07:20:47.299431   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:52 master1 kubelet[95001]: E1104 07:20:52.303127   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:20:57 master1 kubelet[95001]: E1104 07:20:57.305007   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:02 master1 kubelet[95001]: E1104 07:21:02.306356   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:07 master1 kubelet[95001]: E1104 07:21:07.308434   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:12 master1 kubelet[95001]: E1104 07:21:12.311120   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:17 master1 kubelet[95001]: E1104 07:21:17.313708   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Nov 04 07:21:22 master1 kubelet[95001]: E1104 07:21:22.315374   95001 kubelet.go:2332] "Container runtime network not ready" networkReady="NetworkReady=fals...itialized"
Hint: Some lines were ellipsized, use -l to show in full.

[root@master1 ~]#cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS

#check the environment file
[root@master1 ~]#cat /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--container-runtime=remote --container-runtime-endpoint=/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/k8sxio/pause:3.5"
[root@master1 ~]#

You can see --pod-infra-container-image=registry.aliyuncs.com/k8sxio/pause:3.5 is set to the Alibaba Cloud image address. In theory, since the flag is set on the node, that is the image that should be pulled, yet in practice the default k8s.gcr.io/pause:3.5 is still used, which is not what we expect.

Possible reasons:

1. The version installed here is k8s v1.22.2, which may contain a bug;

2. Try upgrading;

3. Roll back to an older version or move to a newer one;

4. Follow up in the source code;

This does not affect this lab, but it would matter a lot in a production environment.

Whether newer versions still show this pause-image bug has not been tested yet; parked for now. (2021-11-02 22:12:00)

  • Note: the final fix from 阳明.

Let's look at this parameter once more:

[root@master1 ~]#kubelet --help|grep infra
      --pod-infra-container-image string                         Specified image will not be pruned by the image garbage collector. When container-runtime is set to 'docker', all containers in each pod will use the network/ipc namespaces from this image. Other CRI implementations have their own configuration to set this image. (default "k8s.gcr.io/pause:3.5")
[root@master1 ~]#

Restart the kubelet service and watch the logs again:

[root@master1 ~]#tail -f /var/log/messages
[root@master1 ~]#systemctl restart kubelet

A warning appears: for remote container runtimes, kubelet's --pod-infra-container-image flag does not take effect; the sandbox image has to be configured separately on the runtime side.

Configuration:

[root@master1 ~]#vim /etc/containerd/config.toml
……
sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.5"
……

Restart the containerd service:

[root@master1 ~]#systemctl daemon-reload
[root@master1 ~]#systemctl restart containerd
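
A quick check that the sandbox image setting is in place after the restart (optional):

grep sandbox_image /etc/containerd/config.toml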

With that, the pause image problem is solved!

2. Some YAML files cannot be downloaded

Due to network restrictions, kube-dashboard.yaml and kube-flannel.yml may be impossible to download directly; just use the YAML files provided with the lab files.

[root@master1 ~]#ll
total 124124
-rw-r--r-- 1 root root      7569 Oct 31 23:01 kube-dashboard.yaml
-rw-r--r-- 1 root root      5175 Oct 31 08:38 kube-flannel.yml

About me

The aims of my blog:

  • clean layout and concise language;
  • documentation that works as a handbook: detailed steps, no hidden pitfalls, source files provided;
  • every hands-on article has been personally tested; if you run into any problems while following along, feel free to contact me and we will work them out together.

🍀 WeChat QR code
WeChat: x2675263825 (舍得), QQ: 2675263825


🍀 WeChat official account
《云原生架构师实战》


🍀 csdn
https://blog.csdn.net/weixin_39246554?spm=1010.2135.3001.5421


🍀 Zhihu
https://www.zhihu.com/people/foryouone


Finally

Well, that wraps up the lab on building a k8s cluster with kubeadm and containerd. Thanks for reading; wishing you a happy and meaningful day, and see you next time!
