failed to run Kubelet: misconfiguration: kubelet cgroup driver: “systemd” is different from docker cgroup driver: “cgroupfs”
One of the master nodes in the k8s cluster suddenly went NotReady, so troubleshooting started right away...
1. First, check the pods in the kube-system namespace
kubectl get pod -n kube-system
Two calico-kube-controllers pods showed up, one Terminating and one Running. Describe the one stuck in Terminating:
kubectl describe pod -n kube-system calico-kube-controllers-6b8f6f78dc-skt7m
The describe output showed the pod had been rescheduled and, with the cluster short on resources, could not shut down cleanly, so force-delete it first:
kubectl delete pod [pod name] --force --grace-period=0 -n [namespace]
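In this case, with the pod name and namespace taken from the describe command above, that works out to:
kubectl delete pod calico-kube-controllers-6b8f6f78dc-skt7m --force --grace-period=0 -n kube-system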
2. Check the kubelet status
systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: activating (auto-restart) (Result: exit-code) since 二 2020-09-29 08:56:39 CST; 811ms ago
Docs: https://kubernetes.io/docs/
Process: 84443 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=255)
Main PID: 84443 (code=exited, status=255)
9月 29 08:56:39 p40 kubelet[84443]: io.copyBuffer(0x4be5a60, 0xc0012143f0, 0x4becda0, 0xc00020cc28, 0x0, 0x0, 0x0, 0xc001018220, 0x4bebc40, 0xc001214000)
9月 29 08:56:39 p40 kubelet[84443]: /usr/local/go/src/io/io.go:395 +0x2ff
9月 29 08:56:39 p40 kubelet[84443]: io.Copy(...)
9月 29 08:56:39 p40 kubelet[84443]: /usr/local/go/src/io/io.go:368
9月 29 08:56:39 p40 kubelet[84443]: os/exec.(*Cmd).writerDescriptor.func1(0x12a05f200, 0x0)
9月 29 08:56:39 p40 kubelet[84443]: /usr/local/go/src/os/exec/exec.go:311 +0x65
9月 29 08:56:39 p40 kubelet[84443]: os/exec.(*Cmd).Start.func1(0xc000c4a000, 0xc001240340)
9月 29 08:56:39 p40 kubelet[84443]: /usr/local/go/src/os/exec/exec.go:441 +0x27
9月 29 08:56:39 p40 kubelet[84443]: created by os/exec.(*Cmd).Start
9月 29 08:56:39 p40 kubelet[84443]: /usr/local/go/src/os/exec/exec.go:440 +0x629
kubelet keeps crashing and restarting, so the problem is probably something lower down. Keep digging...
3. Check the kubelet logs at the system level
journalctl -f -u kubelet
Let it run for a while, then stop it with Ctrl + C.
Only part of the output is shown:
9月 30 08:35:17 p40 systemd[1]: Started kubelet: The Kubernetes Node Agent.
9月 30 08:35:17 p40 kubelet[62754]: I0930 08:35:17.809913 62754 server.go:411] Version: v1.19.0
9月 30 08:35:17 p40 kubelet[62754]: I0930 08:35:17.811115 62754 server.go:831] Client rotation is on, will bootstrap in background
9月 30 08:35:17 p40 kubelet[62754]: I0930 08:35:17.833431 62754 certificate_store.go:130] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-current.pem".
9月 30 08:35:17 p40 kubelet[62754]: I0930 08:35:17.834516 62754 dynamic_cafile_content.go:167] Starting client-ca-bundle::/etc/kubernetes/pki/ca.crt
9月 30 08:35:17 p40 kubelet[62754]: I0930 08:35:17.998634 62754 server.go:640] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
9月 30 08:35:17 p40 kubelet[62754]: I0930 08:35:17.999421 62754 container_manager_linux.go:276] container manager verified user specified cgroup-root exists: []
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:17.999463 62754 container_manager_linux.go:281] Creating Container Manager object based on Node Config: {RuntimeCgroupsName: SystemCgroupsName: KubeletCgroupsName: ContainerRuntime:docker CgroupsPerQOS:true CgroupRoot:/ CgroupDriver:systemd KubeletRootDir:/var/lib/kubelet ProtectKernelDefaults:false NodeAllocatableConfig:{KubeReservedCgroupName: SystemReservedCgroupName: ReservedSystemCPUs: EnforceNodeAllocatable:map[pods:{}] KubeReserved:map[] SystemReserved:map[] HardEvictionThresholds:[{Signal:memory.available Operator:LessThan Value:{Quantity:100Mi Percentage:0} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.1} GracePeriod:0s MinReclaim:<nil>} {Signal:nodefs.inodesFree Operator:LessThan Value:{Quantity:<nil> Percentage:0.05} GracePeriod:0s MinReclaim:<nil>} {Signal:imagefs.available Operator:LessThan Value:{Quantity:<nil> Percentage:0.15} GracePeriod:0s MinReclaim:<nil>}]} QOSReserved:map[] ExperimentalCPUManagerPolicy:none ExperimentalCPUManagerReconcilePeriod:10s ExperimentalPodPidsLimit:-1 EnforceCPULimits:true CPUCFSQuotaPeriod:100ms ExperimentalTopologyManagerPolicy:none}
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:17.999743 62754 topology_manager.go:126] [topologymanager] Creating topology manager with none policy
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:17.999762 62754 container_manager_linux.go:311] [topologymanager] Initializing Topology Manager with none policy
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:17.999774 62754 container_manager_linux.go:316] Creating device plugin manager: true
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:17.999916 62754 client.go:77] Connecting to docker on unix:///var/run/docker.sock
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:17.999938 62754 client.go:94] Start docker client with request timeout=2m0s
9月 30 08:35:18 p40 kubelet[62754]: W0930 08:35:18.013687 62754 docker_service.go:564] Hairpin mode set to "promiscuous-bridge" but kubenet is not enabled, falling back to "hairpin-veth"
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:18.013797 62754 docker_service.go:241] Hairpin mode set to "hairpin-veth"
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:18.058874 62754 docker_service.go:256] Docker cri networking managed by cni
9月 30 08:35:18 p40 kubelet[62754]: I0930 08:35:18.070844 62754 docker_service.go:261] Docker Info: &{ID:SOOH:LVGQ:H3V2:4SBH:YZA2:VSSP:O374:XNEV:T6E5:X2ED:PM6H:U2CZ Containers:85 ContainersRunning:0 ContainersPaused:0 ContainersStopped:85 Images:49 Driver:overlay2 DriverStatus:[[Backing Filesystem extfs] [Supports d_type true] [Native Overlay Diff true]] SystemStatus:[] Plugins:{Volume:[local] Network:[bridge host ipvlan macvlan null overlay] Authorization:[] Log:[awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog]} MemoryLimit:true SwapLimit:true KernelMemory:true KernelMemoryTCP:true CPUCfsPeriod:true CPUCfsQuota:true CPUShares:true CPUSet:true PidsLimit:true IPv4Forwarding:true BridgeNfIptables:true BridgeNfIP6tables:true Debug:false NFd:22 OomKillDisable:true NGoroutines:35 SystemTime:2020-09-30T08:35:18.059835927+08:00 LoggingDriver:json-file CgroupDriver:cgroupfs NEventsListener:0 KernelVersion:3.10.0-1062.el7.x86_64 OperatingSystem:CentOS Linux 7 (Core) OSType:linux Architecture:x86_64 IndexServerAddress:https://index.docker.io/v1/ RegistryConfig:0xc0013018f0 NCPU:64 MemTotal:67000582144 GenericResources:[] DockerRootDir:/home/gyserver/data/docker_store/docker HTTPProxy: HTTPSProxy: NoProxy: Name:p40 Labels:[] ExperimentalBuild:false ServerVersion:19.03.12 ClusterStore: ClusterAdvertise: Runtimes:map[nvidia:{Path:nvidia-container-runtime Args:[]} runc:{Path:runc Args:[]}] DefaultRuntime:runc Swarm:{NodeID: NodeAddr: LocalNodeState:inactive ControlAvailable:false Error: RemoteManagers:[] Nodes:0 Managers:0 Cluster:<nil> Warnings:[]} LiveRestoreEnabled:false Isolation: InitBinary:docker-init ContainerdCommit:{ID:b34a5c8af56e510852c35414db4c1f4fa6172339 Expected:b34a5c8af56e510852c35414db4c1f4fa6172339} RuncCommit:{ID:3e425f80a8c931f88e6d94a8c831b9d5aa481657 Expected:3e425f80a8c931f88e6d94a8c831b9d5aa481657} InitCommit:{ID:fec3683 Expected:fec3683} SecurityOptions:[name=seccomp,profile=default] ProductLicense: Warnings:[]}
9月 30 08:35:18 p40 kubelet[62754]: F0930 08:35:18.071016 62754 server.go:265] failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
9月 30 08:35:18 p40 kubelet[62754]: goroutine 1 [running]:
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.stacks(0xc000010001, 0xc000538c00, 0xaa, 0xfc)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:996 +0xb9
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).output(0x6cf6140, 0xc000000003, 0x0, 0x0, 0xc001301960, 0x6b4854c, 0x9, 0x109, 0x0)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:945 +0x191
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).printDepth(0x6cf6140, 0xc000000003, 0x0, 0x0, 0x1, 0xc00181fc80, 0x1, 0x1)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:718 +0x165
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).print(...)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:703
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.Fatal(...)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1436
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/cmd/kubelet/app.NewKubeletCommand.func1(0xc0008af8c0, 0xc00004e2b0, 0x5, 0x5)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/cmd/kubelet/app/server.go:265 +0x63e
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).execute(0xc0008af8c0, 0xc00004e2b0, 0x5, 0x5, 0xc0008af8c0, 0xc00004e2b0)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:846 +0x2c2
9月 30 08:35:18 p40 systemd[1]: kubelet.service: main process exited, code=exited, status=255/n/a
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).ExecuteC(0xc0008af8c0, 0x16396891c53017c1, 0x6cf5c60, 0x409b05)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:950 +0x375
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/github.com/spf13/cobra.(*Command).Execute(...)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/github.com/spf13/cobra/command.go:887
9月 30 08:35:18 p40 kubelet[62754]: main.main()
9月 30 08:35:18 p40 kubelet[62754]: _output/dockerized/go/src/k8s.io/kubernetes/cmd/kubelet/kubelet.go:41 +0xe5
9月 30 08:35:18 p40 kubelet[62754]: goroutine 6 [chan receive]:
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/klog/v2.(*loggingT).flushDaemon(0x6cf6140)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:1131 +0x8b
9月 30 08:35:18 p40 kubelet[62754]: created by k8s.io/kubernetes/vendor/k8s.io/klog/v2.init.0
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/klog/v2/klog.go:416 +0xd8
9月 30 08:35:18 p40 kubelet[62754]: goroutine 165 [select]:
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.WaitFor(0xc00097e000, 0xc000974040, 0xc00051a0c0, 0x0, 0x0)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:539 +0x11d
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.PollUntil(0xdf8475800, 0xc000974040, 0xc000135560, 0x0, 0x0)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:492 +0xc5
9月 30 08:35:18 p40 kubelet[62754]: k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait.PollImmediateUntil(0xdf8475800, 0xc000974040, 0xc000135560, 0xb, 0xc000115f48)
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:511 +0xb3
9月 30 08:35:18 p40 systemd[1]: Unit kubelet.service entered failed state.
9月 30 08:35:18 p40 kubelet[62754]: created by k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates.(*DynamicFileCAContent).Run
9月 30 08:35:18 p40 kubelet[62754]: /workspace/anago-v1.19.0-rc.4.197+594f888e19d8da/src/k8s.io/kubernetes/_output/dockerized/go/src/k8s.io/kubernetes/vendor/k8s.io/apiserver/pkg/server/dynamiccertificates/dynamic_cafile_content.go:174 +0x2b3
9月 30 08:35:18 p40 kubelet[62754]: goroutine 164 [sync.Cond.Wait]:
9月 30 08:35:18 p40 kubelet[62754]: runtime.goparkunlock(...)
9月 30 08:35:18 p40 kubelet[62754]: /usr/local/go/src/runtime/proc.go:312
9月 30 08:35:18 p40 kubelet[62754]: sync.runtime_notifyListWait(0xc000a89c50, 0xc000000000)
9月 30 08:35:18 p40 kubelet[62754]: /usr/local/go/src/runtime/sema.go:513 +0xf8
9月 30 08:35:18 p40 kubelet[62754]: sync.(*Cond).Wait(0xc000a89c40)
9月 30 08:35:18 p40 kubelet[62754]: /usr/local/go/src/sync/cond.go:56 +0x9d
Reading through it line by line, this is the line that matters:
failed to run Kubelet: misconfiguration: kubelet cgroup driver: "systemd" is different from docker cgroup driver: "cgroupfs"
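In hindsight, the same line could have been found faster by filtering the journal instead of reading it all, for example:
journalctl -u kubelet --no-pager | grep -i "cgroup driver"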
The cause is now obvious: a misconfiguration. The kubelet cgroup driver "systemd" does not match the docker cgroup driver "cgroupfs".
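Before touching anything, the mismatch can be confirmed from both sides. The kubelet config path below is the kubeadm default (it also appears later in the unit's --config flag), and the cgroupDriver field assumes a standard kubeadm-generated config:
docker info 2>/dev/null | grep -i "cgroup driver"    # here: Cgroup Driver: cgroupfs
grep cgroupDriver /var/lib/kubelet/config.yaml       # here: cgroupDriver: systemd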
Quickly check /etc/docker/daemon.json:
cat /etc/docker/daemon.json
{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
This was configured correctly before; it looks like someone reinstalled nvidia-docker and overwrote the file, which is what caused the problem. Annoying.
Reconfigure it, adding "exec-opts": ["native.cgroupdriver=systemd"] so that docker uses the same cgroup driver as kubelet:
vim /etc/docker/daemon.json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  },
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2",
  "storage-opts": [
    "overlay2.override_kernel_check=true"
  ]
}
After the change, restart docker:
systemctl restart docker
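kubelet is already stuck in an auto-restart loop, so it should recover on its own once docker comes back with the right driver. As an optional extra check, confirm docker's driver and restart kubelet explicitly:
docker info 2>/dev/null | grep -i "cgroup driver"    # should now report: Cgroup Driver: systemd
systemctl restart kubelet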
Check kubelet.service again:
systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 三 2020-09-30 08:54:35 CST; 1min 3s ago
Docs: https://kubernetes.io/docs/
Main PID: 80462 (kubelet)
Tasks: 65
Memory: 71.9M
CGroup: /system.slice/kubelet.service
└─80462 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network-plugin=cni --p...
9月 30 08:55:22 p40 kubelet[80462]: I0930 08:55:22.176298 80462 reconciler.go:319] Volume detached for volume "config-volume" (UniqueName: "kubernetes.io/configmap/9e218809-f8c8-415d-8f... DevicePath ""
9月 30 08:55:22 p40 kubelet[80462]: I0930 08:55:22.176317 80462 reconciler.go:319] Volume detached for volume "prometheus-token-hh8bp" (UniqueName: "kubernetes.io/secret/9e218809-f8c8-4... DevicePath ""
9月 30 08:55:24 p40 kubelet[80462]: E0930 08:55:24.121234 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:26 p40 kubelet[80462]: E0930 08:55:26.128108 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:28 p40 kubelet[80462]: E0930 08:55:28.121500 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:30 p40 kubelet[80462]: E0930 08:55:30.126501 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:32 p40 kubelet[80462]: E0930 08:55:32.118160 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:34 p40 kubelet[80462]: E0930 08:55:34.126984 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:36 p40 kubelet[80462]: E0930 08:55:36.118055 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
9月 30 08:55:38 p40 kubelet[80462]: E0930 08:55:38.125897 80462 kubelet_volumes.go:154] orphaned pod "9e218809-f8c8-415d-8ffb-1b6711da1797" found, but volume paths are still present on ...y to see them.
Hint: Some lines were ellipsized, use -l to show in full.
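The repeated "orphaned pod" warnings mean kubelet still finds volume directories on disk for pod UID 9e218809-f8c8-415d-8ffb-1b6711da1797, which no longer exists in the API. They do not affect node readiness; if they keep appearing, a common manual cleanup (use with care, and only after confirming the pod is really gone) is to remove the leftover directory:
ls /var/lib/kubelet/pods/                                    # confirm which UIDs are left over
rm -rf /var/lib/kubelet/pods/9e218809-f8c8-415d-8ffb-1b6711da1797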
Perfect!
Check the k8s node status:
kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
node-1   Ready    master   25d   v1.19.0
node-2   Ready    master   25d   v1.19.0
p40      Ready    master   25d   v1.19.0
The cause was simple; the troubleshooting approach is what matters.