Quickly Deploying Kubernetes with kubeadm
Kubernetes is a container resource manager written by Google in Go as a rewrite of Borg (Google's internal, self-developed resource manager, which by the time K8s was released had already been running inside Google for more than ten years). Because there are 8 letters between the K and the s, it is also called K8s.
The Kubernetes commands commonly used in this deployment example are listed below.
Function | Command | Notes |
---|---|---|
Create resources | kubectl create kubectl create deployment web --image=nginx --replicas=3 kubectl create -f /path/to/deployment.yaml | ● Performs an explicit create action; because resource names are unique within a namespace, running the same create again fails if the resource already exists. ● The -f flag can be followed by the absolute path of a file to create resources from a configuration file. ● Besides Deployment, it can also create Pod, Namespace, Node, Service, ReplicaSet and other resources. |
Delete resources | kubectl delete RESOURCE_TYPE RESOURCE_NAME kubectl delete deploy test-nginx kubectl delete pods pod1 | ● Multiple resources can be deleted at once, separated by commas. |
Expose a service | kubectl expose deployment web --port=80 --target-port=80 --type=NodePort | ● Maps a Pod's service port onto the Node so it can be reached from outside the cluster. |
View Pod resources | kubectl get pods/po (short form) kubectl get pods -o wide | ● Use -o wide for more detail, such as the Pod IP and which Node it is running on. ● Other output formats can be chosen with -o, e.g. -o json or -o yaml. |
View services | kubectl get services/svc (short form) kubectl get svc --all-namespaces kubectl get svc,pod --all-namespaces | ● Lists the current services. ● Add -n NAMESPACE_NAME to query within a specific namespace. ● Add --all-namespaces or -A to query across all namespaces. ● Services and Pods can be listed in the same command. |
View nodes | kubectl get nodes/no (short form) kubectl get nodes -o wide kubectl get nodes -o wide -n kube-system | ● Use -o wide for more detail, such as the Node IP, OS image and kernel version. ● Other output formats can be chosen with -o, e.g. -o json or -o yaml. ● The Kubernetes system components can be inspected by adding -n kube-system or --namespace NAMESPACE_NAME to namespaced resources such as Pods. |
View logs | kubectl logs pod kubectl logs kube-flannel-ds-npz55 -n kube-system (replace with the actual Pod name) | ● Shows a Pod's log output; if the Pod has multiple containers, select one with -c. ● When viewing a specific Pod, add -n NAMESPACE_NAME to specify its namespace. |
Describe resources | kubectl describe pods kubectl describe pods POD_NAME -n kube-system | ● An essential troubleshooting command when a Pod or Service is in an abnormal state. ● Can describe a whole resource type or a single named resource. |
Join a node to the cluster | kubeadm join | ● Must be followed by a token; the complete command is printed at the end of initialization and can be copied as-is. |
Remove (drain) a node | kubectl drain | ● If the command returns without errors, the node can then be modified, shut down, or even deleted. |
A more complete command reference is available in the Kubernetes kubectl documentation (Chinese command reference).
Installation and Deployment
According to the official documentation, installation is currently supported on various Linux distributions such as Ubuntu and CentOS, as well as on Windows and macOS.
Supported systems include Ubuntu 16.04 / Debian 9 / CentOS 7 / RHEL 7 / Fedora 25 / HypriotOS v1.0.1 and later.
Environment Preparation
K8s provides high availability for containers, and deploying it as a cluster has its own hardware requirements; this walkthrough uses the minimum hardware configuration, detailed in the table below.
Role | Hostname and address | OS and kernel version | Hardware |
---|---|---|---|
Master | k8s-master 192.168.124.135 | CentOS Linux release 7.4.1708 (Core) Linux 3.10.0-1160.24.1.el7.x86_64 | CPU: 2 cores, RAM: 4 GB, Disk: 40 GB |
Node | k8s-node01 192.168.124.153 k8s-node02 192.168.124.154 k8s-node03 192.168.124.155 | CentOS Linux release 7.9.2009 (Core) Linux 3.10.0-1160.el7.x86_64 | CPU: 2 cores, RAM: 2 GB, Disk: 40 GB |
The servers must be configured with static IP addresses, and a usable DNS server must be set (otherwise package downloads will fail later and the deployment will error out). In this deployment each NIC is configured with 114.114.114.114 as its DNS server.
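For reference, a minimal static-IP interface configuration on CentOS 7 might look like the following. This is only a sketch: the interface name ens33, the prefix and the gateway are assumptions and must be adapted to your environment.
[root@k8s-master ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.124.135
PREFIX=24
GATEWAY=192.168.124.2
DNS1=114.114.114.114
After editing, restart the network service (systemctl restart network) for the change to take effect.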
Tips: since version 1.8, kubelet requires swap to be disabled.
Server Environment Configuration
(Operations to be run on both the Master and the Node machines)
Disable the firewall and SELinux
[root@docker-master ~]# systemctl stop firewalld && systemctl disable firewalld && setenforce 0
[root@docker-master ~]# vi /etc/selinux/config
[root@docker-master ~]# grep SELINUX /etc/selinux/config
SELINUX=disabled
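The same change can also be made non-interactively; a sketch, assuming the stock SELINUX=enforcing line in /etc/selinux/config:
[root@docker-master ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config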
Disable swap
Swap was not permanently disabled in this test deployment, so whenever a server reboots the kubelet service fails to start and swap has to be turned off again.
[root@docker-master ~]# swapoff -a
Tip: to disable swap permanently, comment out the swap mount entry in /etc/fstab.
[root@docker-master ~]# grep 'swap' /etc/fstab
#/dev/mapper/centos-swap swap swap defaults 0 0
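If you prefer a one-liner over editing the file by hand, something like the following comments out any active swap entry (a sketch only; review /etc/fstab afterwards to confirm the result):
[root@docker-master ~]# sed -ri '/^[^#].*\sswap\s/s/^/#/' /etc/fstab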
Add host mappings
[root@docker-master ~]# cat >> /etc/hosts <<EOF
192.168.124.135 k8s-master
192.168.124.153 k8s-node01
192.168.124.154 k8s-node02
192.168.124.155 k8s-node03
EOF
Adjust kernel parameters
[root@k8s-master ~]# sysctl -w net.ipv4.ip_forward=1
[root@k8s-master ~]# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@k8s-master ~]# sysctl --system
* Applying /usr/lib/sysctl.d/00-system.conf ...
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
kernel.yama.ptrace_scope = 0
* Applying /usr/lib/sysctl.d/50-default.conf ...
kernel.sysrq = 16
kernel.core_uses_pid = 1
kernel.kptr_restrict = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
* Applying /etc/sysctl.conf ...
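If sysctl --system complains that the net.bridge.* keys do not exist, the br_netfilter kernel module is probably not loaded yet; loading it (and making that persistent) before re-running sysctl usually resolves it. A sketch:
[root@k8s-master ~]# modprobe br_netfilter
[root@k8s-master ~]# echo br_netfilter > /etc/modules-load.d/k8s.conf
[root@k8s-master ~]# sysctl --system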
Install the YUM repository management tools and add the Docker repository
[root@k8s-master ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@k8s-master ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
Install Docker
[root@k8s-master ~]# yum install -y docker-ce
Start Docker and enable it at boot
[root@k8s-master ~]# systemctl start docker && systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
Add a domestic (Aliyun mirror) K8s package repository
[root@k8s-master ~]# cat >>/etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install the K8s components
According to reports online, because the official site does not provide repository syncing, the GPG check of the repo index may fail; if that happens, install with the --nogpgcheck flag instead.
Install the components
[root@k8s-master ~]# yum install -y kubelet kubeadm kubectl
or
[root@k8s-master ~]# yum install -y --nogpgcheck kubelet kubeadm kubectl
Start kubelet and enable it at boot
[root@k8s-master ~]# systemctl start kubelet && systemctl enable kubelet
Tips: when installing from YUM without specifying a version, the newest version in the repository is installed; the installed versions must match the kubeadm version used for initialization later.
Available versions can be listed with yum list {software_name}; taking kubectl as an example:
[root@k8s-master ~]# yum list kubectl --showduplicates | sort -r
A specific version can then be requested at install time:
[root@k8s-master ~]# yum install -y kubelet-1.21.10-0 kubeadm-1.21.10-0 kubectl-1.21.10-0 --disableexcludes=kubernetes  (ignore any yum "exclude" settings for the kubernetes repo)
Master Initialization
(Run only on the Master node)
Parameters of the kubeadm init command:
● --kubernetes-version=v1.21.2  # The k8s version to install; if omitted, the latest version is used. It must match the kubectl, kubeadm and kubelet versions installed earlier.
● --apiserver-advertise-address 192.168.124.135  # The address the API server advertises, i.e. the Master's IP address.
● --pod-network-cidr=10.200.0.0/16  # The IP address range used for Pods created later.
● --image-repository  # The registry to pull images from; by default images are pulled from registries hosted abroad.
● --ignore-preflight-errors=all  # Ignore errors encountered during the preflight checks, such as image pull failures. Optional, but recommended so the process keeps going; any errors are still printed.
If image pulling fails during initialization, refer to the article on pulling the images required by K8s cluster initialization from alternative registries.
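If image pulling is the only concern, the control-plane images can also be fetched ahead of time; a sketch using the same mirror and version as the init command below (kubeadm config images pull accepts the same repository and version flags as kubeadm init):
[root@k8s-master k8s]# kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.2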
Initialization
With a specified image registry
[root@k8s-master k8s]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.2 --apiserver-advertise-address=192.168.124.135 --pod-network-cidr=10.200.0.0/16 --ignore-preflight-errors=all
With the default image registry
[root@k8s-master k8s]# kubeadm init --kubernetes-version=v1.21.2 --apiserver-advertise-address=192.168.124.135 --pod-network-cidr=10.200.0.0/16 --ignore-preflight-errors=all
[init] Using Kubernetes version: v1.21.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.124.135]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.124.135 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.124.135 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 10.005036 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 72g17q.e8w5h1vkrs1vd4go
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.124.135:6443 --token 72g17q.e8w5h1vkrs1vd4go \
--discovery-token-ca-cert-hash sha256:18004cf996b2774851ab9a7e071e427598c8645ccb01d8ff057dab6ca9ae5701
The output of a successful run ends as shown above. The next step, per the prompt, is to copy the kubeconfig file used by kubectl to its default location; pick the root or regular-user variant depending on the account you are using.
Copy the kubeconfig file to its default location
[root@k8s-master k8s]# mkdir -p $HOME/.kube
[root@k8s-master k8s]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@k8s-master k8s]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
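At this point kubectl can already reach the API server; a quick sanity check (output is illustrative — with no network plugin installed yet, the master still shows NotReady):
[root@k8s-master k8s]# kubectl get nodes
NAME         STATUS     ROLES                  AGE   VERSION
k8s-master   NotReady   control-plane,master   2m    v1.21.2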
As the "You should now deploy a pod network to the cluster" prompt says, the next step is to install a container network plugin using one of the listed options, and then join the nodes to the Master to form the cluster.
There are many network plugins; the official documentation describes them in detail: https://kubernetes.io/zh/docs/concepts/cluster-administration/addons/
Install the network plugin
Flannel is used in this deployment.
Download the network plugin manifest
[root@k8s-master k8s]# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Apply the network plugin manifest
[root@k8s-master k8s]# kubectl apply -f kube-flannel.yml
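To confirm the plugin is coming up, watch for the flannel DaemonSet Pods in kube-system (Pod names and timings below are illustrative):
[root@k8s-master k8s]# kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-xdjl5   1/1   Running   0   90s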
Joining Nodes to the Cluster
Join the other nodes to the cluster using the token information printed at the end of initialization.
[root@k8s-node01 ~]# kubeadm join 192.168.124.135:6443 --token 72g17q.e8w5h1vkrs1vd4go \
--discovery-token-ca-cert-hash sha256:18004cf996b2774851ab9a7e071e427598c8645ccb01d8ff057dab6ca9ae5701
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Check node status and services
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 31h v1.21.2
k8s-node01 Ready <none> 30h v1.21.2
k8s-node02 Ready <none> 30h v1.21.2
k8s-node03 Ready <none> 30h v1.21.2
Check the status of the kube-system components
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-7kxcw 1/1 Running 1 31h
coredns-558bd4d5db-z9vtc 1/1 Running 1 31h
etcd-k8s-master 1/1 Running 1 31h
kube-apiserver-k8s-master 1/1 Running 1 31h
kube-controller-manager-k8s-master 1/1 Running 1 31h
kube-flannel-ds-r7hld 1/1 Running 12 30h
kube-flannel-ds-wj6gd 1/1 Running 12 30h
kube-flannel-ds-xc65s 1/1 Running 13 30h
kube-flannel-ds-xdjl5 1/1 Running 1 30h
kube-proxy-56mpr 1/1 Running 1 30h
kube-proxy-czm9n 1/1 Running 1 31h
kube-proxy-gm69f 1/1 Running 1 30h
kube-proxy-p5dr9 1/1 Running 1 30h
kube-scheduler-k8s-master 1/1 Running 1 31h
Verify that the cluster works
First check the current Pods
[root@k8s-master k8s]# kubectl get pods
No resources found in default namespace.
Create a Deployment named web from the Nginx image
[root@k8s-master ~]# kubectl create deployment web --image=nginx
Expose the web Deployment's port 80 on the hosts so it can be reached from outside
[root@k8s-master ~]# kubectl expose deployment web --port=80 --target-port=80 --type=NodePort
[root@k8s-master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
web-96d5df5c8-czkrq 1/1 Running 1 30h
Check the current services
[root@k8s-master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 31h
web NodePort 10.98.61.137 <none> 80:31394/TCP 30h
The service can now be reached from a browser at any node's IP and the assigned NodePort.
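For example, using the NodePort shown above (a sketch; substitute your own node IP and port):
[root@k8s-master ~]# curl -I http://192.168.124.153:31394
HTTP/1.1 200 OK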
Tokens generated by kubeadm expire after 24 hours by default, so when another node needs to join the cluster later, a new token must be created. The commands are as follows:
Either run the single command
kubeadm token create --print-join-command
or perform the steps one by one
1. Generate a new token
kubeadm token create
2. List the generated tokens
kubeadm token list
3. Compute the sha256 hash of the CA certificate
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
Assemble the pieces into the full command in the following format:
kubeadm join <master-ip>:<port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
Common Errors
Deployment stage
If DNS is not configured or is mis-configured, fetching the Docker and K8s repositories fails with "Could not resolve host".
An upgrade of container-selinux may be required; make sure the CentOS-Base.repo source is still present, and add it back if it is missing:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
/proc/sys/net/ipv4/ip_forward was not set to 1.
During the Master's preflight checks, the kubelet service had not been enabled to start at boot.
Because no version was specified when installing the K8s components, the newest version in the YUM repository was installed; the tool version specified during initialization was lower than the installed kubelet version and therefore not supported.
If no image registry is specified, the K8s initialization tries to pull from registries hosted abroad, which requires a working proxy; otherwise the result is roughly half an hour of waiting followed by an error.
If the firewall was only stopped but not permanently disabled during deployment, it comes back after a machine reboot and this kind of error can appear.
Initialization stage
Some warnings appear while initializing the Master node; the common ones and their causes are listed below.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Cause:
The official documentation recommends configuring the container runtime and the kubelet to use systemd as the cgroup driver, which makes the system more stable. For Docker this means setting the native.cgroupdriver=systemd option.
Method 1
Edit the Docker configuration file /etc/docker/daemon.json (modify the key if it exists, add it otherwise)
[root@k8s-master ~]# vim /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart docker
Method 2
Edit /usr/lib/systemd/system/docker.service and add the same systemd cgroup driver setting to the Docker start command (ExecStart) there.
After editing, reload and restart Docker as in the steps above for the change to take effect.
Finally, verify the result with docker info | grep Cgroup.
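After restarting Docker, the driver should read systemd, for example:
[root@k8s-master ~]# docker info | grep -i cgroup
 Cgroup Driver: systemd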
[WARNING Swap]: running with swap on is not supported. Please disable swap
Cause:
swapoff -a was not run to disable the swap partition during deployment.
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
Cause:
The kubelet service has not been enabled to start at boot.
Node join stage
After all the work on the Master is finished, if a worker node still shows an abnormal status and never manages to join, the specific cause needs to be tracked down.
First check the status; here, for example, one node is NotReady:
[root@k8s-master k8s]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 16m v1.21.2
k8s-node01 Not Ready <none> 11m v1.21.2
Next, find out which Pod is unhealthy, using the commands listed at the top of this article:
[root@k8s-master k8s]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-hfxrf 1/1 Running 0 17m
coredns-558bd4d5db-l6kc6 1/1 Running 0 17m
etcd-k8s-master 1/1 Running 0 17m
kube-apiserver-k8s-master 1/1 Running 0 17m
kube-controller-manager-k8s-master 1/1 Running 0 17m
kube-flannel-ds-7jsks 0/1 Error 2 12m
kube-flannel-ds-wkh5c 1/1 Running 0 12m
kube-proxy-4lmzv 0/1 ImagePullBackOff 0 12m
kube-proxy-86sdc 1/1 Running 0 17m
kube-scheduler-k8s-master 1/1 Running 0 17m
Describe the failing Pod directly to see the reason:
[root@k8s-master k8s]# kubectl describe pod kube-flannel-ds-7jsks -n kube-system
The output is long, so only part of it is shown; the most recent events at the bottom are what matter. They show the node still trying to pull the images it needs from k8s.gcr.io, so this is again a network problem. A straightforward fix is to docker save the image on the Master, copy it to the node and docker load it there, or to pull it from a third-party registry; a sketch of both routes follows the events.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned kube-system/kube-flannel-ds-7jsks to k8s-node01
Warning FailedCreatePodSandBox 12m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 9m24s (x9 over 12m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 4m54s (x3 over 5m36s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 2m57s (x10 over 7m32s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 108.177.97.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 2m36s (x3 over 8m18s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 108s (x2 over 2m11s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.204.82:443: connect: connection timed out
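A sketch of the two workarounds mentioned above, using the pause image from the events as an example (image names and node names are taken from this walkthrough and may differ on your cluster; if the Master pulled through the Aliyun mirror, its local tag may be registry.aliyuncs.com/google_containers/pause:3.4.1 rather than the k8s.gcr.io name):
Save the image on the Master and load it on the node:
[root@k8s-master k8s]# docker save k8s.gcr.io/pause:3.4.1 -o pause-3.4.1.tar
[root@k8s-master k8s]# scp pause-3.4.1.tar root@k8s-node01:/root/
[root@k8s-node01 ~]# docker load -i pause-3.4.1.tar
Or pull it from a third-party registry on the node and retag it to the name the kubelet expects:
[root@k8s-node01 ~]# docker pull registry.aliyuncs.com/google_containers/pause:3.4.1
[root@k8s-node01 ~]# docker tag registry.aliyuncs.com/google_containers/pause:3.4.1 k8s.gcr.io/pause:3.4.1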
Checking the node status again, it is now fine, but the Pods still have problems.
[root@k8s-master k8s]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 38m v1.21.2
k8s-node01 Ready <none> 33m v1.21.2
The Pod status has changed from Error to Init; the steps from here are the same as before: keep looking at the failure reasons and supply whatever images are still missing.
[root@k8s-master k8s]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-hfxrf 1/1 Running 0 15m
coredns-558bd4d5db-l6kc6 1/1 Running 0 15m
etcd-k8s-master 1/1 Running 0 16m
kube-apiserver-k8s-master 1/1 Running 0 16m
kube-controller-manager-k8s-master 1/1 Running 0 16m
kube-flannel-ds-7jsks 0/1 Init:0/1 0 11m
kube-flannel-ds-wkh5c 1/1 Running 0 11m
kube-proxy-4lmzv 0/1 ContainerCreating 0 11m
kube-proxy-86sdc 1/1 Running 0 15m
kube-scheduler-k8s-master 1/1 Running 0 16m
If a Pod shows ErrImagePull or ImagePullBackOff, handle it the same way. A different case is when the events show the image is already present but the Pod keeps restarting with status CrashLoopBackOff; then we need to find out why the Pod cannot run.
[root@k8s-master k8s]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-hfxrf 1/1 Running 0 49m
coredns-558bd4d5db-l6kc6 1/1 Running 0 49m
etcd-k8s-master 1/1 Running 0 49m
kube-apiserver-k8s-master 1/1 Running 0 49m
kube-controller-manager-k8s-master 1/1 Running 0 49m
kube-flannel-ds-7jsks 0/1 CrashLoopBackOff 11 44m
kube-flannel-ds-wkh5c 1/1 Running 0 45m
kube-proxy-4lmzv 1/1 Running 0 44m
kube-proxy-86sdc 1/1 Running 0 49m
kube-scheduler-k8s-master 1/1 Running 0 49m
Describe the Pod first; the events show the required image is already on the machine and the container keeps being restarted.
[root@k8s-master k8s]# kubectl describe pod kube-flannel-ds-7jsks -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 44m default-scheduler Successfully assigned kube-system/kube-flannel-ds-7jsks to k8s-node01
Warning FailedCreatePodSandBox 44m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 41m (x9 over 44m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 37m (x3 over 37m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 35m (x10 over 39m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 108.177.97.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 34m (x3 over 40m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 33m (x2 over 34m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.204.82:443: connect: connection timed out
Normal Pulling 33m kubelet Pulling image "quay.io/coreos/flannel:v0.14.0"
Normal Pulled 33m kubelet Successfully pulled image "quay.io/coreos/flannel:v0.14.0" in 20.016925073s
Normal Created 33m kubelet Created container install-cni
Normal Started 33m kubelet Started container install-cni
Normal Created 33m kubelet Created container kube-flannel
Normal Started 33m kubelet Started container kube-flannel
Normal Pulled 33m (x2 over 33m) kubelet Container image "quay.io/coreos/flannel:v0.14.0" already present on machine
Warning BackOff 18s (x150 over 33m) kubelet Back-off restarting failed container
Then look at the Pod's error log. In this case it is a network connectivity problem: running iptables -F on Node01 and restarting the kubelet service brought the Pod to Running (the exact commands are shown after the log below). Some rules did exist in iptables beforehand; restarting the service regenerates the firewall rules that K8s needs for its own communication.
[root@k8s-master k8s]# kubectl logs kube-flannel-ds-7jsks -n kube-system
I0803 09:33:24.404804 1 main.go:520] Determining IP address of default interface
I0803 09:33:24.406369 1 main.go:533] Using interface with name eth0 and address 192.168.192.232
I0803 09:33:24.406446 1 main.go:550] Defaulting external address to interface address (192.168.192.232)
W0803 09:33:24.406597 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0803 09:33:27.516391 1 main.go:251] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-7jsks': Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-7jsks": dial tcp 10.96.0.1:443: connect: connection timed out
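The fix described above, as run on the node (a sketch; flushing iptables removes all current rules, so consider the backup commands below first):
[root@k8s-node01 ~]# iptables -F
[root@k8s-node01 ~]# systemctl restart kubelet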
In addition, before flushing the rules you may want to back up the current iptables rules; the backup and restore commands are:
Back up:
[root@k8s-node01 k8s]# iptables-save > ACL.txt
Restore:
[root@k8s-node01 k8s]# iptables-restore < ACL.txt