k8s pod: insufficient file permissions
Two clusters on the same cloud platform, with the same OS version, k8s version, docker version, k8s images, and configuration. Cluster A deployed without issue; on cluster B the network components are abnormal after deployment: file operations inside the containers fail with permission denied and the containers exit. The only difference found so far is that a business user (uid 1000) had already been created on cluster B. Could that have an effect?
Running the same image standalone with docker run, file operations inside the container work fine, so the problem appears to be on the k8s side. SELinux was suspected, but it is already disabled.
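To narrow down whether the uid 1000 business user or SELinux is really involved, a minimal comparison between the two clusters could look like this (the pod name is only an example taken from the listing below):
# Confirm SELinux is actually off on every node
getenforce
sestatus | grep -i mode
# Check whether uid 1000 exists on the node and who owns the kubelet/docker data directories
getent passwd 1000
ls -ldn /var/lib/kubelet /var/lib/docker
# Compare the securityContext the pods actually receive on each cluster
kubectl -n ingress-controller get pod ingress-nginx-controller-dz4sb \
  -o jsonpath='{.spec.securityContext}{"\n"}{.spec.containers[0].securityContext}{"\n"}'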
Cluster B (the faulty, newly deployed cluster) component status
[root@kfzx-yyfwq01 ~]# uname -a
Linux kfzx-yyfwq01 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
[root@kfzx-yyfwq01 ~]# kubectl get pod -A
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
default rocketmq-operator-6f54dbb5d8-xbmmn 0/1 CrashLoopBackOff 185 16h 10.244.41.135 worker1
ingress-controller ingress-nginx-admission-create-tfdwt 0/1 Completed 0 20h 10.244.61.2 worker2
ingress-controller ingress-nginx-admission-patch-gxfm4 0/1 Completed 2 20h 10.244.61.1 worker2
ingress-controller ingress-nginx-controller-dz4sb 0/1 CrashLoopBackOff 138 11h 192.168.9.47 worker3
ingress-controller ingress-nginx-controller-nnsrv 0/1 CrashLoopBackOff 138 11h 192.168.9.45 worker1
ingress-controller ingress-nginx-controller-pzhzg 0/1 CrashLoopBackOff 138 11h 192.168.9.46 worker2
kube-system calico-kube-controllers-6ff5c664c4-5jk68 0/1 CrashLoopBackOff 219 13h 10.244.61.7 worker2
kube-system calico-node-8qbps 1/1 Running 0 20h 192.168.9.45 worker1
kube-system calico-node-bwwhn 1/1 Running 3 20h 192.168.9.46 worker2
kube-system calico-node-xr285 1/1 Running 3 20h 192.168.9.47 worker3
kube-system calico-typha-6576ff658-5wtqq 1/1 Running 1 20h 192.168.9.46 worker2
kube-system calico-typha-6576ff658-626vr 1/1 Running 1 20h 192.168.9.47 worker3
kube-system calicoctl-hdfp2 1/1 Running 1 20h 192.168.9.46 worker2
kube-system calicoctl-l9wk5 1/1 Running 0 20h 192.168.9.45 worker1
kube-system calicoctl-m75qj 1/1 Running 1 20h 192.168.9.47 worker3
[root@kfzx-yyfwq01 ~]# kdesc ingress-nginx-controller|grep Image
Image: registry.aliyuncs.com/kubeadm-ha/ingress-nginx_controller:v0.47.0
Image ID: docker-pullable://registry.aliyuncs.com/kubeadm-ha/ingress-nginx_controller@sha256:a1e4efc107be0bb78f32eaec37bef17d7a0c81bec8066cdf2572508d21351d0b
[root@kfzx-yyfwq01 ~]# kdesc calico-kube-controllers|grep Image
Image: registry.aliyuncs.com/kubeadm-ha/calico_kube-controllers:v3.19.1
Image ID: docker-pullable://registry.aliyuncs.com/kubeadm-ha/calico_kube-controllers@sha256:904458fe1bd56f995ef76e2c4d9a6831c506cc80f79e8fc0182dc059b1db25a4
Cluster A (the healthy cluster) component status
[root@ytzn-saas06 ~]# uname -a
Linux ytzn-saas06 3.10.0-1160.el7.x86_64 #1 SMP Mon Oct 19 16:18:59 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
# kubectl get pod -A
ingress-controller ingress-nginx-controller-4k7kq 1/1 Running 0 105d 192.168.9.73 worker1
ingress-controller ingress-nginx-controller-cz6jv 1/1 Running 0 105d 192.168.9.75 worker3
ingress-controller ingress-nginx-controller-pzgb4 1/1 Running 0 105d 192.168.9.74 worker2
kube-system calico-kube-controllers-6d75fbc96d-cgq4g 1/1 Running 6 105d 10.244.52.10 master2
kube-system calico-node-2nllh 1/1 Running 0 105d 192.168.9.73 worker1
kube-system calico-node-8jr77 1/1 Running 1 105d 192.168.9.70 master1
kube-system calico-node-hqb6b 1/1 Running 21 105d 192.168.9.75 worker3
kube-system calico-node-kcg44 1/1 Running 0 105d 192.168.9.71 master2
kube-system calico-node-nlnmd 1/1 Running 0 105d 192.168.9.72 master3
kube-system calico-node-thzw2 1/1 Running 0 105d 192.168.9.74 worker2
kube-system calico-typha-6576ff658-8jr9n 1/1 Running 0 33d 192.168.9.75 worker3
kube-system calico-typha-6576ff658-d5h8z 1/1 Running 0 105d 192.168.9.74 worker2
kube-system calicoctl-8jzgt 1/1 Running 1 105d 192.168.9.70 master1
kube-system calicoctl-hqp24 1/1 Running 0 105d 192.168.9.73 worker1
[root@ytzn-saas06 ~]# kdesc ingress-nginx-controller-4k7kq|grep Image
Image: registry.aliyuncs.com/kubeadm-ha/ingress-nginx_controller:v0.47.0
Image ID: docker-pullable://registry.aliyuncs.com/kubeadm-ha/ingress-nginx_controller@sha256:a1e4efc107be0bb78f32eaec37bef17d7a0c81bec8066cdf2572508d21351d0b
[root@ytzn-saas06 ~]# kdesc calico-kube-controllers|grep Image
Image: registry.aliyuncs.com/kubeadm-ha/calico_kube-controllers:v3.19.1
Image ID: docker-pullable://registry.aliyuncs.com/kubeadm-ha/calico_kube-controllers@sha256:904458fe1bd56f995ef76e2c4d9a6831c506cc80f79e8fc0182dc059b1db25a4
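Both clusters pull exactly the same image digests, so the user baked into the images cannot be the difference; that is easy to confirm on any node (empty output means the image runs as root):
docker image inspect --format '{{.Config.User}}' \
  registry.aliyuncs.com/kubeadm-ha/ingress-nginx_controller:v0.47.0
docker image inspect --format '{{.Config.User}}' \
  registry.aliyuncs.com/kubeadm-ha/calico_kube-controllers:v3.19.1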
ingress-nginx-controller logs
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v0.47.0
Build: 7201e37633485d1f14dbe9cd7b22dd380df00a07
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.20.1
-------------------------------------------------------------------------------
I0329 00:31:51.771283 7 flags.go:208] "Watching for Ingress" class="nginx"
W0329 00:31:51.771341 7 flags.go:213] Ingresses with an empty class will also be processed by this Ingress controller
W0329 00:31:51.771634 7 client_config.go:614] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0329 00:31:51.771862 7 main.go:241] "Creating API client" host="https://10.244.64.1:443"
I0329 00:31:51.789762 7 main.go:285] "Running in Kubernetes cluster" major="1" minor="21" git="v1.21.5" state="clean" commit="aea7bbadd2fc0cd689de94a54e5b7b758869d691" platform="linux/amd64"
F0329 00:31:51.962345 7 ssl.go:389] unexpected error storing fake SSL Cert: could not create PEM certificate file /etc/ingress-controller/ssl/default-fake-certificate.pem: open /etc/ingress-controller/ssl/default-fake-certificate.pem: permission denied
goroutine 1 [running]:
k8s.io/klog/v2.stacks(0xc00000e001, 0xc0001b4200, 0x103, 0x1e1)
k8s.io/klog/v2@v2.4.0/klog.go:1026 +0xb9
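The fatal error is the controller failing to create its fake default certificate under /etc/ingress-controller/ssl. The v0.47.0 image runs as a non-root www-data user (uid 101), so anything that leaves that path owned by root and non-writable produces exactly this crash. One way to inspect it from the node while the pod crash-loops (the container id is a placeholder):
# Find the exited controller container, then check its user and what is mounted at the ssl path
docker ps -a | grep ingress-nginx_controller
docker inspect --format 'user={{.Config.User}} mounts={{json .Mounts}}' <container-id>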
calico-kube-controllers logs
2023-03-29 00:31:29.141 [INFO][1] main.go 92: Loaded configuration from environment config=&config.Config{LogLevel:"info", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", DatastoreType:"kubernetes"}
W0329 00:31:29.142583 1 client_config.go:615] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2023-03-29 00:31:29.143 [INFO][1] main.go 113: Ensuring Calico datastore is initialized
2023-03-29 00:31:29.157 [INFO][1] main.go 153: Getting initial config snapshot from datastore
2023-03-29 00:31:29.173 [INFO][1] main.go 156: Got initial config snapshot
2023-03-29 00:31:29.173 [INFO][1] watchersyncer.go 89: Start called
2023-03-29 00:31:29.173 [INFO][1] main.go 173: Starting status report routine
2023-03-29 00:31:29.173 [INFO][1] main.go 182: Starting Prometheus metrics server on port 9094
2023-03-29 00:31:29.173 [INFO][1] main.go 418: Starting controller ControllerType="Node"
2023-03-29 00:31:29.173 [INFO][1] node_controller.go 143: Starting Node controller
2023-03-29 00:31:29.173 [INFO][1] watchersyncer.go 127: Sending status update Status=wait-for-ready
2023-03-29 00:31:29.173 [ERROR][1] status.go 138: Failed to write readiness file: open /status/status.json: permission denied
2023-03-29 00:31:29.173 [WARNING][1] status.go 66: Failed to write status error=open /status/status.json: permission denied
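calico-kube-controllers hits the same class of error when writing its readiness file /status/status.json. Because the pod restarts too quickly to exec into, one option is to repeat the write by hand with docker run on the faulty node, mirroring the standalone test mentioned above (a sketch; it assumes the image ships a shell, which that manual test suggests):
docker run --rm --entrypoint sh \
  registry.aliyuncs.com/kubeadm-ha/calico_kube-controllers:v3.19.1 \
  -c 'id; ls -ld /status; touch /status/status.json && echo write ok'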
Both logs point to the same symptom: the process cannot write a file because of insufficient permissions.
Solution
The fault disappeared after reinstalling the k8s cluster (note: the version was changed from 1.21.5 to 1.21.14); the root cause remains unknown.
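A quick sanity check after the reinstall, to confirm the cluster really moved to 1.21.14 and the previously crashing components have settled:
kubectl version --short
kubectl get nodes -o wide
kubectl get pod -A | grep -Ev 'Running|Completed'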