Quickly Deploying Kubernetes with kubeadm
Kubernetes is a container resource manager written by Google in Go as a rewrite of Borg (Google's internal, self-developed resource manager, which by the time K8s was released had already been running inside Google for more than ten years). Because there are 8 letters between the K and the s, it is also called K8s.
The Kubernetes commands commonly used in this deployment example are listed below.
Function | Command | Notes |
---|---|---|
Create resources | kubectl create kubectl create deployment web --image=nginx --replicas=3 kubectl create -f /path/to/deployment.yaml | ● Performs an explicit create action; because resource names are unique within a namespace, running the same create again fails if the resource already exists. ● The -f flag can be followed by the absolute path of a file to create resources from a configuration file. ● Besides Deployment, it can also create Pod, Namespace, Node, Service, ReplicaSet and other resources. |
Delete resources | kubectl delete RESOURCE_TYPE RESOURCE_NAME kubectl delete deploy test-nginx kubectl delete pods pod1 | ● Multiple resources can be deleted at once, separated by commas. |
Expose a service | kubectl expose deployment web --port=80 --target-port=80 --type=NodePort | ● Maps a Pod's service port onto the Node so it can be reached from outside the cluster. |
View Pod resources | kubectl get pods/po (short form) kubectl get pods -o wide | ● Use -o wide for more detail, such as the Pod IP and which Node it is running on. ● Other output formats can be chosen with -o, e.g. -o json or -o yaml. |
View services | kubectl get services/svc (short form) kubectl get svc --all-namespaces kubectl get svc,pod --all-namespaces | ● Lists the current services. ● Add -n NAMESPACE_NAME to query within a specific namespace. ● Add --all-namespaces or -A to query across all namespaces. ● Services and Pods can be listed in the same command. |
View nodes | kubectl get nodes/no (short form) kubectl get nodes -o wide kubectl get nodes -o wide -n kube-system | ● Use -o wide for more detail, such as the Node IP, OS image and kernel version. ● Other output formats can be chosen with -o, e.g. -o json or -o yaml. ● The Kubernetes system components can be inspected by adding -n kube-system or --namespace NAMESPACE_NAME to namespaced resources such as Pods. |
View logs | kubectl logs pod kubectl logs kube-flannel-ds-npz55 -n kube-system (replace with the actual Pod name) | ● Shows a Pod's log output; if the Pod has multiple containers, select one with -c. ● When viewing a specific Pod, add -n NAMESPACE_NAME to specify its namespace. |
Describe resources | kubectl describe pods kubectl describe pods POD_NAME -n kube-system | ● An essential troubleshooting command when a Pod or Service is in an abnormal state. ● Can describe a whole resource type or a single named resource. |
Join a node to the cluster | kubeadm join | ● Must be followed by a token; the complete command is printed at the end of initialization and can be copied as-is. |
Remove (drain) a node | kubectl drain | ● If the command returns without errors, the node can then be modified, shut down, or even deleted. |
A more complete command reference is available in the Kubernetes kubectl documentation (Chinese command reference).
Installation and Deployment
According to the official documentation, installation is currently supported on various Linux distributions such as Ubuntu and CentOS, as well as on Windows and macOS.
Supported systems include Ubuntu 16.04 / Debian 9 / CentOS 7 / RHEL 7 / Fedora 25 / HypriotOS v1.0.1 and later.
Environment Preparation
K8s provides high availability for containers, and deploying it as a cluster has its own hardware requirements; this walkthrough uses the minimum hardware configuration, detailed in the table below.
Role | Hostname and address | OS and kernel version | Hardware |
---|---|---|---|
Master | k8s-master 192.168.124.135 | CentOS Linux release 7.4.1708 (Core) Linux 3.10.0-1160.24.1.el7.x86_64 | CPU: 2 cores, RAM: 4 GB, Disk: 40 GB |
Node | k8s-node01 192.168.124.153 k8s-node02 192.168.124.154 k8s-node03 192.168.124.155 | CentOS Linux release 7.9.2009 (Core) Linux 3.10.0-1160.el7.x86_64 | CPU: 2 cores, RAM: 2 GB, Disk: 40 GB |
The servers must be configured with static IP addresses, and a usable DNS server must be set (otherwise package downloads will fail later and the deployment will error out). In this deployment each NIC is configured with 114.114.114.114 as its DNS server.
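For reference, a minimal static-IP interface configuration on CentOS 7 might look like the following. This is only a sketch: the interface name ens33, the prefix and the gateway are assumptions and must be adapted to your environment.
[root@k8s-master ~]# cat /etc/sysconfig/network-scripts/ifcfg-ens33
TYPE=Ethernet
BOOTPROTO=static
NAME=ens33
DEVICE=ens33
ONBOOT=yes
IPADDR=192.168.124.135
PREFIX=24
GATEWAY=192.168.124.2
DNS1=114.114.114.114
After editing, restart the network service (systemctl restart network) for the change to take effect.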
Tips: since version 1.8, kubelet requires swap to be disabled.
Server Environment Configuration
(Operations to be run on both the Master and the Node machines)
Disable the firewall and SELinux
[root@docker-master ~]# systemctl stop firewalld && systemctl disable firewalld && setenforce 0
[root@docker-master ~]# vi /etc/selinux/config
[root@docker-master ~]# grep SELINUX /etc/selinux/config
SELINUX=disabled
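The same change can also be made non-interactively; a sketch, assuming the stock SELINUX=enforcing line in /etc/selinux/config:
[root@docker-master ~]# sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config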
Disable swap
Swap was not permanently disabled in this test deployment, so whenever a server reboots the kubelet service fails to start and swap has to be turned off again.
[root@docker-master ~]# swapoff -a
Tip: to disable swap permanently, comment out the swap mount entry in /etc/fstab.
[root@docker-master ~]# grep 'swap' /etc/fstab
#/dev/mapper/centos-swap swap swap defaults 0 0
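If you prefer a one-liner over editing the file by hand, something like the following comments out any active swap entry (a sketch only; review /etc/fstab afterwards to confirm the result):
[root@docker-master ~]# sed -ri '/^[^#].*\sswap\s/s/^/#/' /etc/fstab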
Add host mappings
[root@docker-master ~]# cat >> /etc/hosts <<EOF
192.168.124.135 k8s-master
192.168.124.153 k8s-node01
192.168.124.154 k8s-node02
192.168.124.155 k8s-node03
EOF
Adjust kernel parameters
[root@k8s-master ~]# sysctl -w net.ipv4.ip_forward=1
[root@k8s-master ~]# cat <<EOF > /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
[root@k8s-master ~]# sysctl --system
* Applying /usr/lib/sysctl.d/00-system.conf ...
* Applying /usr/lib/sysctl.d/10-default-yama-scope.conf ...
kernel.yama.ptrace_scope = 0
* Applying /usr/lib/sysctl.d/50-default.conf ...
kernel.sysrq = 16
kernel.core_uses_pid = 1
kernel.kptr_restrict = 1
net.ipv4.conf.default.rp_filter = 1
net.ipv4.conf.all.rp_filter = 1
net.ipv4.conf.default.accept_source_route = 0
net.ipv4.conf.all.accept_source_route = 0
net.ipv4.conf.default.promote_secondaries = 1
net.ipv4.conf.all.promote_secondaries = 1
fs.protected_hardlinks = 1
fs.protected_symlinks = 1
* Applying /etc/sysctl.d/99-sysctl.conf ...
* Applying /etc/sysctl.d/k8s.conf ...
* Applying /etc/sysctl.conf ...
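If sysctl --system complains that the net.bridge.* keys do not exist, the br_netfilter kernel module is probably not loaded yet; loading it (and making that persistent) before re-running sysctl usually resolves it. A sketch:
[root@k8s-master ~]# modprobe br_netfilter
[root@k8s-master ~]# echo br_netfilter > /etc/modules-load.d/k8s.conf
[root@k8s-master ~]# sysctl --system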
Install the YUM repository management tools and add the Docker repository
[root@k8s-master ~]# yum install -y yum-utils device-mapper-persistent-data lvm2
[root@k8s-master ~]# yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
Install Docker
[root@k8s-master ~]# yum install -y docker-ce
Start Docker and enable it at boot
[root@k8s-master ~]# systemctl start docker && systemctl enable docker
Created symlink from /etc/systemd/system/multi-user.target.wants/docker.service to /usr/lib/systemd/system/docker.service.
Add a domestic (Aliyun mirror) K8s package repository
[root@k8s-master ~]# cat >>/etc/yum.repos.d/kubernetes.repo <<EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install the K8s components
According to reports online, because the official site does not provide repository syncing, the GPG check of the repo index may fail; if that happens, install with the --nogpgcheck flag instead.
Install the components
[root@k8s-master ~]# yum install -y kubelet kubeadm kubectl
or
[root@k8s-master ~]# yum install -y --nogpgcheck kubelet kubeadm kubectl
Start kubelet and enable it at boot
[root@k8s-master ~]# systemctl start kubelet && systemctl enable kubelet
Tips: when installing from YUM without specifying a version, the newest version in the repository is installed; the installed versions must match the kubeadm version used for initialization later.
Available versions can be listed with yum list {software_name}; taking kubectl as an example:
[root@k8s-master ~]# yum list kubectl --showduplicates | sort -r
A specific version can then be requested at install time:
[root@k8s-master ~]# yum install -y kubelet-1.21.10-0 kubeadm-1.21.10-0 kubectl-1.21.10-0 --disableexcludes=kubernetes  (ignore any yum "exclude" settings for the kubernetes repo)
Master Initialization
(Run only on the Master node)
Parameters of the kubeadm init command:
● --kubernetes-version=v1.21.2  # The k8s version to install; if omitted, the latest version is used. It must match the kubectl, kubeadm and kubelet versions installed earlier.
● --apiserver-advertise-address 192.168.124.135  # The address the API server advertises, i.e. the Master's IP address.
● --pod-network-cidr=10.200.0.0/16  # The IP address range used for Pods created later.
● --image-repository  # The registry to pull images from; by default images are pulled from registries hosted abroad.
● --ignore-preflight-errors=all  # Ignore errors encountered during the preflight checks, such as image pull failures. Optional, but recommended so the process keeps going; any errors are still printed.
If image pulling fails during initialization, refer to the article on pulling the images required by K8s cluster initialization from alternative registries.
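If image pulling is the only concern, the control-plane images can also be fetched ahead of time; a sketch using the same mirror and version as the init command below (kubeadm config images pull accepts the same repository and version flags as kubeadm init):
[root@k8s-master k8s]# kubeadm config images pull --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.2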
Initialization
With a specified image registry
[root@k8s-master k8s]# kubeadm init --image-repository registry.aliyuncs.com/google_containers --kubernetes-version=v1.21.2 --apiserver-advertise-address=192.168.124.135 --pod-network-cidr=10.200.0.0/16 --ignore-preflight-errors=all
With the default image registry
[root@k8s-master k8s]# kubeadm init --kubernetes-version=v1.21.2 --apiserver-advertise-address=192.168.124.135 --pod-network-cidr=10.200.0.0/16 --ignore-preflight-errors=all
[init] Using Kubernetes version: v1.21.2
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 192.168.124.135]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.124.135 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-master localhost] and IPs [192.168.124.135 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 10.005036 seconds
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[upload-certs] Skipping phase. Please see --upload-certs
[mark-control-plane] Marking the node k8s-master as control-plane by adding the labels: [node-role.kubernetes.io/master(deprecated) node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node k8s-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: 72g17q.e8w5h1vkrs1vd4go
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.124.135:6443 --token 72g17q.e8w5h1vkrs1vd4go \
--discovery-token-ca-cert-hash sha256:18004cf996b2774851ab9a7e071e427598c8645ccb01d8ff057dab6ca9ae5701
The output of a successful run ends as shown above. The next step, per the prompt, is to copy the kubeconfig file used by kubectl to its default location; pick the root or regular-user variant depending on the account you are using.
Copy the kubeconfig file to its default location
[root@k8s-master k8s]# mkdir -p $HOME/.kube
[root@k8s-master k8s]# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
[root@k8s-master k8s]# sudo chown $(id -u):$(id -g) $HOME/.kube/config
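At this point kubectl can already reach the API server; a quick sanity check (output is illustrative — with no network plugin installed yet, the master still shows NotReady):
[root@k8s-master k8s]# kubectl get nodes
NAME         STATUS     ROLES                  AGE   VERSION
k8s-master   NotReady   control-plane,master   2m    v1.21.2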
As the "You should now deploy a pod network to the cluster" prompt says, the next step is to install a container network plugin using one of the listed options, and then join the nodes to the Master to form the cluster.
There are many network plugins; the official documentation describes them in detail: https://kubernetes.io/zh/docs/concepts/cluster-administration/addons/
Install the network plugin
Flannel is used in this deployment.
Download the network plugin manifest
[root@k8s-master k8s]# wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Apply the network plugin manifest
[root@k8s-master k8s]# kubectl apply -f kube-flannel.yml
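To confirm the plugin is coming up, watch for the flannel DaemonSet Pods in kube-system (Pod names and timings below are illustrative):
[root@k8s-master k8s]# kubectl get pods -n kube-system | grep flannel
kube-flannel-ds-xdjl5   1/1   Running   0   90s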
Joining Nodes to the Cluster
Join the other nodes to the cluster using the token information printed at the end of initialization.
[root@k8s-node01 ~]# kubeadm join 192.168.124.135:6443 --token 72g17q.e8w5h1vkrs1vd4go \
--discovery-token-ca-cert-hash sha256:18004cf996b2774851ab9a7e071e427598c8645ccb01d8ff057dab6ca9ae5701
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Check node status and services
[root@k8s-master ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 31h v1.21.2
k8s-node01 Ready <none> 30h v1.21.2
k8s-node02 Ready <none> 30h v1.21.2
k8s-node03 Ready <none> 30h v1.21.2
Check the status of the kube-system components
[root@k8s-master ~]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-7kxcw 1/1 Running 1 31h
coredns-558bd4d5db-z9vtc 1/1 Running 1 31h
etcd-k8s-master 1/1 Running 1 31h
kube-apiserver-k8s-master 1/1 Running 1 31h
kube-controller-manager-k8s-master 1/1 Running 1 31h
kube-flannel-ds-r7hld 1/1 Running 12 30h
kube-flannel-ds-wj6gd 1/1 Running 12 30h
kube-flannel-ds-xc65s 1/1 Running 13 30h
kube-flannel-ds-xdjl5 1/1 Running 1 30h
kube-proxy-56mpr 1/1 Running 1 30h
kube-proxy-czm9n 1/1 Running 1 31h
kube-proxy-gm69f 1/1 Running 1 30h
kube-proxy-p5dr9 1/1 Running 1 30h
kube-scheduler-k8s-master 1/1 Running 1 31h
Verify that the cluster works
First check the current Pods
[root@k8s-master k8s]# kubectl get pods
No resources found in default namespace.
Create a Deployment named web from the Nginx image
[root@k8s-master ~]# kubectl create deployment web --image=nginx
Expose the web Deployment's port 80 on the hosts so it can be reached from outside
[root@k8s-master ~]# kubectl expose deployment web --port=80 --target-port=80 --type=NodePort
[root@k8s-master ~]# kubectl get pod
NAME READY STATUS RESTARTS AGE
web-96d5df5c8-czkrq 1/1 Running 1 30h
Check the current services
[root@k8s-master ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 31h
web NodePort 10.98.61.137 <none> 80:31394/TCP 30h
The service can now be reached from a browser at any node's IP and the assigned NodePort.
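For example, using the NodePort shown above (a sketch; substitute your own node IP and port):
[root@k8s-master ~]# curl -I http://192.168.124.153:31394
HTTP/1.1 200 OK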
Tokens generated by kubeadm expire after 24 hours by default, so when another node needs to join the cluster later, a new token must be created. The commands are as follows:
Either run the single command
kubeadm token create --print-join-command
or perform the steps one by one
1. Generate a new token
kubeadm token create
2. List the generated tokens
kubeadm token list
3. Compute the sha256 hash of the CA certificate
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'
Assemble the pieces into the full command in the following format:
kubeadm join <master-ip>:<port> --token <token> --discovery-token-ca-cert-hash sha256:<hash>
Common Errors
Deployment stage
If DNS is not configured or is mis-configured, fetching the Docker and K8s repositories fails with "Could not resolve host".
An upgrade of container-selinux may be required; make sure the CentOS-Base.repo source is still present, and add it back if it is missing:
wget -O /etc/yum.repos.d/CentOS-Base.repo http://mirrors.aliyun.com/repo/Centos-7.repo
/proc/sys/net/ipv4/ip_forward was not set to 1.
During the Master's preflight checks, the kubelet service had not been enabled to start at boot.
Because no version was specified when installing the K8s components, the newest version in the YUM repository was installed; the tool version specified during initialization was lower than the installed kubelet version and therefore not supported.
If no image registry is specified, the K8s initialization tries to pull from registries hosted abroad, which requires a working proxy; otherwise the result is roughly half an hour of waiting followed by an error.
If the firewall was only stopped but not permanently disabled during deployment, it comes back after a machine reboot and this kind of error can appear.
Initialization stage
Some warnings appear while initializing the Master node; the common ones and their causes are listed below.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
Cause:
The official documentation recommends configuring the container runtime and the kubelet to use systemd as the cgroup driver, which makes the system more stable. For Docker this means setting the native.cgroupdriver=systemd option.
Method 1
Edit the Docker configuration file /etc/docker/daemon.json (modify the key if it exists, add it otherwise)
[root@k8s-master ~]# vim /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart docker
Method 2
Edit /usr/lib/systemd/system/docker.service and add the same systemd cgroup driver setting to the Docker start command (ExecStart) there.
After editing, reload and restart Docker as in the steps above for the change to take effect.
Finally, verify the result with docker info | grep Cgroup.
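After restarting Docker, the driver should read systemd, for example:
[root@k8s-master ~]# docker info | grep -i cgroup
 Cgroup Driver: systemd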
[WARNING Swap]: running with swap on is not supported. Please disable swap
Cause:
swapoff -a was not run to disable the swap partition during deployment.
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
Cause:
The kubelet service has not been enabled to start at boot.
Node join stage
After all the work on the Master is finished, if a worker node still shows an abnormal status and never manages to join, the specific cause needs to be tracked down.
First check the status; here, for example, one node is NotReady:
[root@k8s-master k8s]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 16m v1.21.2
k8s-node01 Not Ready <none> 11m v1.21.2
Next, find out which Pod is unhealthy, using the commands listed at the top of this article:
[root@k8s-master k8s]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-hfxrf 1/1 Running 0 17m
coredns-558bd4d5db-l6kc6 1/1 Running 0 17m
etcd-k8s-master 1/1 Running 0 17m
kube-apiserver-k8s-master 1/1 Running 0 17m
kube-controller-manager-k8s-master 1/1 Running 0 17m
kube-flannel-ds-7jsks 0/1 Error 2 12m
kube-flannel-ds-wkh5c 1/1 Running 0 12m
kube-proxy-4lmzv 0/1 ImagePullBackOff 0 12m
kube-proxy-86sdc 1/1 Running 0 17m
kube-scheduler-k8s-master 1/1 Running 0 17m
Describe the failing Pod directly to see the reason:
[root@k8s-master k8s]# kubectl describe pod kube-flannel-ds-7jsks -n kube-system
The output is long, so only part of it is shown; the most recent events at the bottom are what matter. They show the node still trying to pull the images it needs from k8s.gcr.io, so this is again a network problem. A straightforward fix is to docker save the image on the Master, copy it to the node and docker load it there, or to pull it from a third-party registry; a sketch of both routes follows the events.
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned kube-system/kube-flannel-ds-7jsks to k8s-node01
Warning FailedCreatePodSandBox 12m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 9m24s (x9 over 12m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 4m54s (x3 over 5m36s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 2m57s (x10 over 7m32s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 108.177.97.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 2m36s (x3 over 8m18s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 108s (x2 over 2m11s) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.204.82:443: connect: connection timed out
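A sketch of the two workarounds mentioned above, using the pause image from the events as an example (image names and node names are taken from this walkthrough and may differ on your cluster; if the Master pulled through the Aliyun mirror, its local tag may be registry.aliyuncs.com/google_containers/pause:3.4.1 rather than the k8s.gcr.io name):
Save the image on the Master and load it on the node:
[root@k8s-master k8s]# docker save k8s.gcr.io/pause:3.4.1 -o pause-3.4.1.tar
[root@k8s-master k8s]# scp pause-3.4.1.tar root@k8s-node01:/root/
[root@k8s-node01 ~]# docker load -i pause-3.4.1.tar
Or pull it from a third-party registry on the node and retag it to the name the kubelet expects:
[root@k8s-node01 ~]# docker pull registry.aliyuncs.com/google_containers/pause:3.4.1
[root@k8s-node01 ~]# docker tag registry.aliyuncs.com/google_containers/pause:3.4.1 k8s.gcr.io/pause:3.4.1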
Checking the node status again, it is now fine, but the Pods still have problems.
[root@k8s-master k8s]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 38m v1.21.2
k8s-node01 Ready <none> 33m v1.21.2
The Pod status has changed from Error to Init; the steps from here are the same as before: keep looking at the failure reasons and supply whatever images are still missing.
[root@k8s-master k8s]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-hfxrf 1/1 Running 0 15m
coredns-558bd4d5db-l6kc6 1/1 Running 0 15m
etcd-k8s-master 1/1 Running 0 16m
kube-apiserver-k8s-master 1/1 Running 0 16m
kube-controller-manager-k8s-master 1/1 Running 0 16m
kube-flannel-ds-7jsks 0/1 Init:0/1 0 11m
kube-flannel-ds-wkh5c 1/1 Running 0 11m
kube-proxy-4lmzv 0/1 ContainerCreating 0 11m
kube-proxy-86sdc 1/1 Running 0 15m
kube-scheduler-k8s-master 1/1 Running 0 16m
If a Pod shows ErrImagePull or ImagePullBackOff, handle it the same way. A different case is when the events show the image is already present but the Pod keeps restarting with status CrashLoopBackOff; then we need to find out why the Pod cannot run.
[root@k8s-master k8s]# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-558bd4d5db-hfxrf 1/1 Running 0 49m
coredns-558bd4d5db-l6kc6 1/1 Running 0 49m
etcd-k8s-master 1/1 Running 0 49m
kube-apiserver-k8s-master 1/1 Running 0 49m
kube-controller-manager-k8s-master 1/1 Running 0 49m
kube-flannel-ds-7jsks 0/1 CrashLoopBackOff 11 44m
kube-flannel-ds-wkh5c 1/1 Running 0 45m
kube-proxy-4lmzv 1/1 Running 0 44m
kube-proxy-86sdc 1/1 Running 0 49m
kube-scheduler-k8s-master 1/1 Running 0 49m
Describe the Pod first; the events show the required image is already on the machine and the container keeps being restarted.
[root@k8s-master k8s]# kubectl describe pod kube-flannel-ds-7jsks -n kube-system
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 44m default-scheduler Successfully assigned kube-system/kube-flannel-ds-7jsks to k8s-node01
Warning FailedCreatePodSandBox 44m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 41m (x9 over 44m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 37m (x3 over 37m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.203.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 35m (x10 over 39m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 108.177.97.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 34m (x3 over 40m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 64.233.189.82:443: connect: connection timed out
Warning FailedCreatePodSandBox 33m (x2 over 34m) kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed pulling image "k8s.gcr.io/pause:3.4.1": Error response from daemon: Get https://k8s.gcr.io/v2/: dial tcp 74.125.204.82:443: connect: connection timed out
Normal Pulling 33m kubelet Pulling image "quay.io/coreos/flannel:v0.14.0"
Normal Pulled 33m kubelet Successfully pulled image "quay.io/coreos/flannel:v0.14.0" in 20.016925073s
Normal Created 33m kubelet Created container install-cni
Normal Started 33m kubelet Started container install-cni
Normal Created 33m kubelet Created container kube-flannel
Normal Started 33m kubelet Started container kube-flannel
Normal Pulled 33m (x2 over 33m) kubelet Container image "quay.io/coreos/flannel:v0.14.0" already present on machine
Warning BackOff 18s (x150 over 33m) kubelet Back-off restarting failed container
Then look at the Pod's error log. In this case it is a network connectivity problem: running iptables -F on Node01 and restarting the kubelet service brought the Pod to Running (the exact commands are shown after the log below). Some rules did exist in iptables beforehand; restarting the service regenerates the firewall rules that K8s needs for its own communication.
[root@k8s-master k8s]# kubectl logs kube-flannel-ds-7jsks -n kube-system
I0803 09:33:24.404804 1 main.go:520] Determining IP address of default interface
I0803 09:33:24.406369 1 main.go:533] Using interface with name eth0 and address 192.168.192.232
I0803 09:33:24.406446 1 main.go:550] Defaulting external address to interface address (192.168.192.232)
W0803 09:33:24.406597 1 client_config.go:608] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
E0803 09:33:27.516391 1 main.go:251] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-7jsks': Get "https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-7jsks": dial tcp 10.96.0.1:443: connect: connection timed out
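The fix described above, as run on the node (a sketch; flushing iptables removes all current rules, so consider the backup commands below first):
[root@k8s-node01 ~]# iptables -F
[root@k8s-node01 ~]# systemctl restart kubelet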
In addition, before flushing the rules you may want to back up the current iptables rules; the backup and restore commands are:
Back up:
[root@k8s-node01 k8s]# iptables-save > ACL.txt
Restore:
[root@k8s-node01 k8s]# iptables-restore < ACL.txt