Introduction

Today I rebooted the machine hosting the Kubernetes cluster's master node. After it came back up, every kubectl command failed with the following error:

[root@k8s-master ~]# kubectl get pod 
The connection to the server 10.25.78.100:6443 was refused - did you specify the right host or port?
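
A "connection refused" on port 6443 means nothing is even listening on the API server port, as opposed to a TLS or kubeconfig problem. A quick probe makes that explicit (the address below is this cluster's, taken from the error message; adjust for your own node):

```shell
# Probe the kube-apiserver health endpoint; 10.25.78.100:6443 comes from the
# error above. "API server unreachable" means nothing is listening on the port,
# not a certificate or authentication failure.
curl -ks --max-time 3 https://10.25.78.100:6443/healthz || echo "API server unreachable"
```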

Problem Diagnosis

  • Checking kubelet's status showed that it had not started. My first guess was that the service simply wasn't enabled to start at boot, so I tried restarting it, but that didn't help either.
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Fri 2021-07-30 23:37:51 EDT; 8s ago
     Docs: https://kubernetes.io/docs/
  Process: 9658 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $K
 Main PID: 9658 (code=exited, status=255)
[root@k8s-master ~]# systemctl restart kubelet

[root@k8s-master ~]# 
[root@k8s-master ~]# systemctl status kublet
Unit kublet.service could not be found.
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Fri 2021-07-30 23:49:52 EDT; 12s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 12946 (kubelet)
   Memory: 34.6M
   CGroup: /system.slice/kubelet.service
           └─12946 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi...

Jul 30 23:50:04 k8s-master kubelet[12946]: W0730 23:50:04.248717   12946 pod_container_deletor.go:79] Con...ners
Jul 30 23:50:04 k8s-master kubelet[12946]: W0730 23:50:04.248742   12946 pod_container_deletor.go:79] Con...ners
Jul 30 23:50:04 k8s-master kubelet[12946]: I0730 23:50:04.248771   12946 topology_manager.go:219] [topolo...259a
Jul 30 23:50:04 k8s-master kubelet[12946]: E0730 23:50:04.251614   12946 remote_runtime.go:291] RemoveContain...
Jul 30 23:50:04 k8s-master kubelet[12946]: W0730 23:50:04.251656   12946 pod_container_deletor.go:52] [pod_co...
Jul 30 23:50:04 k8s-master kubelet[12946]: I0730 23:50:04.251669   12946 topology_manager.go:219] [topolo...259a
Jul 30 23:50:04 k8s-master kubelet[12946]: E0730 23:50:04.254117   12946 remote_runtime.go:291] RemoveContain...
Jul 30 23:50:04 k8s-master kubelet[12946]: W0730 23:50:04.254139   12946 pod_container_deletor.go:52] [pod_co...
Jul 30 23:50:04 k8s-master kubelet[12946]: W0730 23:50:04.270111   12946 pod_container_deletor.go:79] Con...ners
Jul 30 23:50:04 k8s-master kubelet[12946]: W0730 23:50:04.270140   12946 pod_container_deletor.go:79] Con...ners
Hint: Some lines were ellipsized, use -l to show in full.
[root@k8s-master ~]# lsof -i:6443

The empty lsof output confirmed that nothing was listening on port 6443, i.e. kube-apiserver had not come up.

  • Next I inspected kubelet's startup log, and a careful read revealed the culprit.
[root@k8s-master ~]# kubelet
I0730 23:44:00.971086   11346 server.go:411] Version: v1.19.0
W0730 23:44:00.971471   11346 server.go:553] standalone mode, no API client
W0730 23:44:00.976798   11346 container_manager_linux.go:951] CPUAccounting not enabled for pid: 11346
W0730 23:44:00.976818   11346 container_manager_linux.go:954] MemoryAccounting not enabled for pid: 11346
W0730 23:44:05.984989   11346 nvidia.go:61] NVIDIA GPU metrics will not be available: no NVIDIA devices found
W0730 23:44:05.987762   11346 server.go:468] No api server defined - no events will be sent to API server.
I0730 23:44:05.987781   11346 server.go:640] --cgroups-per-qos enabled, but --cgroup-root was not specified.  def
I0730 23:44:05.988056   11346 container_manager_linux.go:276] container manager verified user specified cgroup-ro
I0730 23:44:05.988068   11346 container_manager_linux.go:281] Creating Container Manager object based on Node ConpRoot:/ CgroupDriver:cgroupfs KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{erved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantiy:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:n:100ms ExperimentalTopologyManagerPolicy:none}
I0730 23:44:05.988185   11346 topology_manager.go:126] [topologymanager] Creating topology manager with none poli
I0730 23:44:05.988192   11346 container_manager_linux.go:311] [topologymanager] Initializing Topology Manager wit
I0730 23:44:05.988197   11346 container_manager_linux.go:316] Creating device plugin manager: true
I0730 23:44:05.988272   11346 client.go:77] Connecting to docker on unix:///var/run/docker.sock
I0730 23:44:05.988284   11346 client.go:94] Start docker client with request timeout=2m0s
W0730 23:44:05.989460   11346 docker_service.go:564] Hairpin mode set to "promiscuous-bridge" but kubenet is not 
I0730 23:44:05.989478   11346 docker_service.go:241] Hairpin mode set to "hairpin-veth"
I0730 23:44:06.039779   11346 docker_service.go:256] Docker cri networking managed by kubernetes.io/no-op
I0730 23:44:06.047353   11346 docker_service.go:261] Docker Info: &{ID:PYK3:FCYP:QWQM:XNXR:FFOX:I3AE:Y5TV:VPHT:OYrlay2 DriverStatus:[[Backing Filesystem xfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Pelf journald json-file logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTNfIptables:true BridgeNfIP6tables:true Debug:false NFd:24 OomKillDisable:true NGoroutines:46 SystemTime:2021-07-3.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://indb/docker HTTPProxy: HTTPSProxy: NoProxy: Name:k8s-master Labels:[] ExperimentalBuild:false ServerVersion:18.06.3-D: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<32818eb9de8e72413e616e86e Expected:468a545b9edcd5932818eb9de8e72413e616e86e} RuncCommit:{ID:a592beb5bc4c4092b1b1becurityOptions:[name=seccomp,profile=default] ProductLicense: Warnings:[]}
F0730 23:44:06.047449   11346 server.go:265] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"
goroutine 1 [running]:
k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc00012a001, 0xc000194400, 0xaa, 0xfc)
	/workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/k
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x6cf6140, 0xc000000003, 0x0, 0x0, 0xc0007581c0, 0x6b4
	/workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/k
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).printDepth(0x6cf6140, 0xc000000003, 0x0, 0x0, 0x1, 0xc000c47c
	/workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/k
k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).print(...)
	/workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/k
k8s.io/kubernetes/vendor/k8s.io/klog/v2.Fatal(...)
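
Running kubelet by hand works, but on a systemd-managed node the same fatal line can usually be filtered straight out of the journal without stopping the service. As a sketch, the grep below is applied to the fatal line captured above:

```shell
# On the node you would typically run:
#   journalctl -u kubelet --no-pager | grep -i 'cgroup driver'
# Here the same grep is applied to the fatal line from the transcript above.
log_line='F0730 23:44:06.047449 11346 server.go:265] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "cgroupfs" is different from docker cgroup driver: "systemd"'
echo "$log_line" | grep -o 'kubelet cgroup driver: "[a-z]*" is different from docker cgroup driver: "[a-z]*"'
```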

The log makes the cause clear: kubelet and Docker were configured with different cgroup drivers, which is why kubelet kept failing to start. Thinking back, I remembered that earlier I had copied the daemon.json from another machine straight onto this one and then restarted Docker.
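
A mismatch like this can be confirmed by reading the two drivers and comparing them. The sample values below reproduce this incident; the commented-out commands show where the real values would come from (the kubelet config path is the kubeadm default and is an assumption for your node):

```shell
# On a live node the two values would typically come from:
#   docker_driver=$(docker info --format '{{.CgroupDriver}}')
#   kubelet_driver=$(grep -oP 'cgroupDriver:\s*\K\w+' /var/lib/kubelet/config.yaml)
# (older kubeadm setups may instead pass --cgroup-driver via
#  /var/lib/kubelet/kubeadm-flags.env)
docker_driver="systemd"      # sample value reproducing this incident
kubelet_driver="cgroupfs"    # sample value reproducing this incident
if [ "$docker_driver" != "$kubelet_driver" ]; then
  echo "MISMATCH: docker=$docker_driver kubelet=$kubelet_driver"
else
  echo "OK: both use $docker_driver"
fi
```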

Solution

Once the cause was known, the fix was quick: change the cgroupdriver in Docker's daemon.json to "cgroupfs" so it matches kubelet, then restart docker and kubelet.
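
The daemon.json change is a one-key edit. The sketch below writes a minimal example to a temporary path and checks it; the real file lives at /etc/docker/daemon.json, and any keys it already contains (registry mirrors, log options, and so on) must be preserved:

```shell
# Minimal daemon.json setting Docker's cgroup driver to cgroupfs to match
# kubelet. Written to /tmp for illustration; the real file is
# /etc/docker/daemon.json.
cat > /tmp/daemon.json <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
grep -o 'native.cgroupdriver=[a-z]*' /tmp/daemon.json
```

After editing the real file, `systemctl daemon-reload && systemctl restart docker` applies the change, as in the transcript below. (The more common recommendation is to switch kubelet to "systemd" instead, but matching Docker to kubelet, as done here, resolves the mismatch just as well.)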

[root@k8s-master ~]# vim /etc/docker/daemon.json 
[root@k8s-master ~]# systemctl daemon-reload
[root@k8s-master ~]# systemctl restart docker
[root@k8s-master ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Fri 2021-07-30 23:49:52 EDT; 3min 40s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 12946 (kubelet)
   Memory: 44.3M
   CGroup: /system.slice/kubelet.service
           └─12946 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfi...

Jul 30 23:53:31 k8s-master kubelet[12946]: E0730 23:53:31.693133   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:31 k8s-master kubelet[12946]: E0730 23:53:31.894138   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:31.995064   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.004510   12946 reflector.go:127] k8s.io/kubernetes/...
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.095211   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.195348   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.295445   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.354219   12946 controller.go:136] failed to ens...used
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.395548   12946 kubelet.go:2183] node "k8s-maste...ound
Jul 30 23:53:32 k8s-master kubelet[12946]: E0730 23:53:32.496241   12946 kubelet.go:2183] node "k8s-maste...ound
Hint: Some lines were ellipsized, use -l to show in full.
[root@k8s-master ~]# lsof -i:6443
COMMAND     PID USER   FD   TYPE  DEVICE SIZE/OFF NODE NAME
kubelet   12946 root   42u  IPv4 1374996      0t0  TCP k8s-master:46238->k8s-master:sun-sr-https (ESTABLISHED)
kube-cont 21866 root    9u  IPv4 1375527      0t0  TCP k8s-master:46382->k8s-master:sun-sr-https (ESTABLISHED)
kube-apis 22233 root    7u  IPv6 1373550      0t0  TCP *:sun-sr-https (LISTEN)
kube-apis 22233 root   80u  IPv6 1373772      0t0  TCP k8s-master:sun-sr-https->k8s-master:46056 (ESTABLISHED)
kube-apis 22233 root   82u  IPv6 1373774      0t0  TCP k8s-master:sun-sr-https->k8s-master:46074 (ESTABLISHED)
kube-apis 22233 root   95u  IPv6 1373787      0t0  TCP k8s-master:sun-sr-https->k8s-master:46238 (ESTABLISHED)
kube-apis 22233 root   96u  IPv6 1373788      0t0  TCP k8s-master:sun-sr-https->k8s-worker-1:53280 (ESTABLISHED)
kube-apis 22233 root   97u  IPv6 1373789      0t0  TCP k8s-master:sun-sr-https->k8s-master:46382 (ESTABLISHED)
kube-apis 22233 root   98u  IPv6 1375528      0t0  TCP localhost:55808->localhost:sun-sr-https (ESTABLISHED)
kube-apis 22233 root  112u  IPv6 1373791      0t0  TCP localhost:sun-sr-https->localhost:55808 (ESTABLISHED)
kube-sche 22275 root    9u  IPv4 1374926      0t0  TCP k8s-master:46056->k8s-master:sun-sr-https (ESTABLISHED)
kube-sche 22275 root   11u  IPv4 1375350      0t0  TCP k8s-master:46074->k8s-master:sun-sr-https (ESTABLISHED)
[root@k8s-master ~]# kubectl get pod -A 
NAMESPACE       NAME                                        READY   STATUS              RESTARTS   AGE
cattle-system   cattle-cluster-agent-7457d78678-rp2jl       0/1     ContainerCreating   0          16h
eplat           echo-5fb748ddfc-sttlw                       0/1     ContainerCreating   0          16h
ingress-nginx   ingress-nginx-admission-create-79wvr        0/1     Completed           0          25h
ingress-nginx   ingress-nginx-admission-patch-h87fc         0/1     Completed           1          25h
ingress-nginx   ingress-nginx-controller-5b6f645865-b2zfv   0/1     ContainerCreating   0          16h
kube-system     calico-kube-controllers-7f4f5bf95d-hcrcn    0/1     CrashLoopBackOff    4          21h
kube-system     calico-node-jpbkh                           0/1     Running             2          21h
kube-system     calico-node-lf4jm                           0/1     Completed           0          21h
kube-system     coredns-f9fd979d6-bm9qf                     0/1     CrashLoopBackOff    7          26h
kube-system     coredns-f9fd979d6-v6tt8                     0/1     Completed           0          26h
kube-system     etcd-k8s-master                             1/1     Running             3          26h
kube-system     kube-apiserver-k8s-master                   1/1     Running             5          25h
kube-system     kube-controller-manager-k8s-master          1/1     Running             5          25h
kube-system     kube-proxy-2kj5x                            1/1     Running             2          26h
kube-system     kube-proxy-p54mc                            0/1     Error               0          26h
kube-system     kube-scheduler-k8s-master                   1/1     Running             8          25h
[root@k8s-master ~]# kubectl get node -o wide
NAME           STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION          CONTAINER-RUNTIME
k8s-master     Ready    master   26h   v1.19.0   10.25.78.100   <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.3
k8s-worker-1   Ready    <none>   26h   v1.19.0   10.25.78.50    <none>        CentOS Linux 7 (Core)   3.10.0-693.el7.x86_64   docker://18.6.3
