k8s (Part 5): 10 Commonly Used Cluster Debugging Commands
kubectl version --short
kubectl cluster-info
kubectl get componentstatus
kubectl api-resources -o wide --sort-by name
kubectl get events -A
kubectl get nodes -o wide
kubectl get pods -A -o wide
kubectl describe pod
kubectl logs
kubectl exec -it
1、kubectl version --short
# kubectl version --short
Client Version: v1.21.0
Server Version: v1.21.0
Use this command to check the client and server versions you are running. It helps when searching for known bugs, reading changelogs, and spotting version compatibility problems between components.
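Note that the --short flag has since been deprecated and removed; on newer kubectl releases plain kubectl version prints the short form by default. A minimal sketch for comparing the client and server versions in a script, assuming jq is available on the workstation:
# kubectl version -o json | jq -r '.clientVersion.gitVersion, .serverVersion.gitVersion'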
2、kubectl cluster-info
# kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
CoreDNS is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
KubeDNSUpstream is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/kube-dns-upstream:dns/proxy
kubernetes-dashboard is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Metrics-server is running at https://127.0.0.1:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
Next we want to know where the cluster is running and whether CoreDNS is up.
From this example we can see that it is a local cluster (the control plane answers on 127.0.0.1), that the kubernetes-dashboard is running, and that metrics-server, the resource-metrics collector, is installed.
You can also log in to the dashboard to inspect the cluster further.
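As the last line of the output hints, kubectl cluster-info dump produces a far more detailed snapshot of cluster state. A sketch of dumping everything to files for offline analysis (the directory path is just an example):
# kubectl cluster-info dump --all-namespaces --output-directory=/tmp/cluster-dump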
3、kubectl get componentstatus
# kubectl get componentstatus
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME STATUS MESSAGE ERROR
controller-manager Healthy ok
scheduler Healthy ok
etcd-2 Healthy {"health":"true"}
etcd-1 Healthy {"health":"true"}
etcd-0 Healthy {"health":"true"}
This command shows whether the scheduler, controller manager, and etcd members are healthy.
Another health check is kubectl get --raw '/healthz?verbose':
# kubectl get --raw '/healthz?verbose'
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
healthz check passed
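Since ComponentStatus is deprecated (as the warning above shows), the livez and readyz endpoints available on v1.16+ API servers are the forward-compatible way to get the same per-check detail:
# kubectl get --raw '/livez?verbose'
# kubectl get --raw '/readyz?verbose'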
4、kubectl api-resources -o wide --sort-by name
# kubectl api-resources -o wide --sort-by name
NAME SHORTNAMES APIVERSION NAMESPACED KIND VERBS
alertmanagerconfigs monitoring.coreos.com/v1alpha1 true AlertmanagerConfig [delete deletecollection get list patch create update watch]
alertmanagers monitoring.coreos.com/v1 true Alertmanager [delete deletecollection get list patch create update watch]
apiservices apiregistration.k8s.io/v1 false APIService [create delete deletecollection get list patch update watch]
bindings v1 true Binding [create]
certificatesigningrequests csr certificates.k8s.io/v1 false CertificateSigningRequest [create delete deletecollection get list patch update watch]
clusterrolebindings rbac.authorization.k8s.io/v1 false ClusterRoleBinding [create delete deletecollection get list patch update watch]
clusterroles rbac.authorization.k8s.io/v1 false ClusterRole [create delete deletecollection get list patch update watch]
componentstatuses cs v1 false ComponentStatus [get list]
configmaps cm v1 true ConfigMap [create delete deletecollection get list patch update watch]
controllerrevisions apps/v1 true ControllerRevision [create delete deletecollection get list patch update watch]
cronjobs cj batch/v1 true CronJob [create delete deletecollection get list patch update watch]
csidrivers storage.k8s.io/v1 false CSIDriver [create delete deletecollection get list patch update watch]
csinodes storage.k8s.io/v1 false CSINode [create delete deletecollection get list patch update watch]
csistoragecapacities storage.k8s.io/v1beta1 true CSIStorageCapacity [create delete deletecollection get list patch update watch]
customresourcedefinitions crd,crds apiextensions.k8s.io/v1 false CustomResourceDefinition [create delete deletecollection get list patch update watch]
daemonsets ds apps/v1 true DaemonSet [create delete deletecollection get list patch update watch]
deployments deploy apps/v1 true Deployment [create delete deletecollection get list patch update watch]
endpoints ep v1 true Endpoints [create delete deletecollection get list patch update watch]
endpointslices discovery.k8s.io/v1 true EndpointSlice [create delete deletecollection get list patch update watch]
events ev v1 true Event [create delete deletecollection get list patch update watch]
events ev events.k8s.io/v1 true Event [create delete deletecollection get list patch update watch]
flowschemas flowcontrol.apiserver.k8s.io/v1beta1 false FlowSchema [create delete deletecollection get list patch update watch]
horizontalpodautoscalers hpa autoscaling/v1 true HorizontalPodAutoscaler [create delete deletecollection get list patch update watch]
ingressclasses networking.k8s.io/v1 false IngressClass [create delete deletecollection get list patch update watch]
ingresses ing networking.k8s.io/v1 true Ingress [create delete deletecollection get list patch update watch]
ingresses ing extensions/v1beta1 true Ingress [create delete deletecollection get list patch update watch]
jobs batch/v1 true Job [create delete deletecollection get list patch update watch]
leases coordination.k8s.io/v1 true Lease [create delete deletecollection get list patch update watch]
limitranges limits v1 true LimitRange [create delete deletecollection get list patch update watch]
localsubjectaccessreviews authorization.k8s.io/v1 true LocalSubjectAccessReview [create]
mutatingwebhookconfigurations admissionregistration.k8s.io/v1 false MutatingWebhookConfiguration [create delete deletecollection get list patch update watch]
namespaces ns v1 false Namespace [create delete get list patch update watch]
networkpolicies netpol networking.k8s.io/v1 true NetworkPolicy [create delete deletecollection get list patch update watch]
nodes no v1 false Node [create delete deletecollection get list patch update watch]
nodes metrics.k8s.io/v1beta1 false NodeMetrics [get list]
persistentvolumeclaims pvc v1 true PersistentVolumeClaim [create delete deletecollection get list patch update watch]
persistentvolumes pv v1 false PersistentVolume [create delete deletecollection get list patch update watch]
poddisruptionbudgets pdb policy/v1 true PodDisruptionBudget [create delete deletecollection get list patch update watch]
podmonitors monitoring.coreos.com/v1 true PodMonitor [delete deletecollection get list patch create update watch]
pods po v1 true Pod [create delete deletecollection get list patch update watch]
pods metrics.k8s.io/v1beta1 true PodMetrics [get list]
podsecuritypolicies psp policy/v1beta1 false PodSecurityPolicy [create delete deletecollection get list patch update watch]
podtemplates v1 true PodTemplate [create delete deletecollection get list patch update watch]
priorityclasses pc scheduling.k8s.io/v1 false PriorityClass [create delete deletecollection get list patch update watch]
prioritylevelconfigurations flowcontrol.apiserver.k8s.io/v1beta1 false PriorityLevelConfiguration [create delete deletecollection get list patch update watch]
probes monitoring.coreos.com/v1 true Probe [delete deletecollection get list patch create update watch]
prometheuses monitoring.coreos.com/v1 true Prometheus [delete deletecollection get list patch create update watch]
prometheusrules monitoring.coreos.com/v1 true PrometheusRule [delete deletecollection get list patch create update watch]
replicasets rs apps/v1 true ReplicaSet [create delete deletecollection get list patch update watch]
replicationcontrollers rc v1 true ReplicationController [create delete deletecollection get list patch update watch]
resourcequotas quota v1 true ResourceQuota [create delete deletecollection get list patch update watch]
rolebindings rbac.authorization.k8s.io/v1 true RoleBinding [create delete deletecollection get list patch update watch]
roles rbac.authorization.k8s.io/v1 true Role [create delete deletecollection get list patch update watch]
runtimeclasses node.k8s.io/v1 false RuntimeClass [create delete deletecollection get list patch update watch]
secrets v1 true Secret [create delete deletecollection get list patch update watch]
selfsubjectaccessreviews authorization.k8s.io/v1 false SelfSubjectAccessReview [create]
selfsubjectrulesreviews authorization.k8s.io/v1 false SelfSubjectRulesReview [create]
serviceaccounts sa v1 true ServiceAccount [create delete deletecollection get list patch update watch]
servicemonitors monitoring.coreos.com/v1 true ServiceMonitor [delete deletecollection get list patch create update watch]
services svc v1 true Service [create delete get list patch update watch]
statefulsets sts apps/v1 true StatefulSet [create delete deletecollection get list patch update watch]
storageclasses sc storage.k8s.io/v1 false StorageClass [create delete deletecollection get list patch update watch]
subjectaccessreviews authorization.k8s.io/v1 false SubjectAccessReview [create]
thanosrulers monitoring.coreos.com/v1 true ThanosRuler [delete deletecollection get list patch create update watch]
tokenreviews authentication.k8s.io/v1 false TokenReview [create]
validatingwebhookconfigurations admissionregistration.k8s.io/v1 false ValidatingWebhookConfiguration [create delete deletecollection get list patch update watch]
volumeattachments storage.k8s.io/v1 false VolumeAttachment [create delete deletecollection get list patch update watch]
--sort-by name sorts the output by resource name.
-o wide shows the verbs available on each resource (create delete deletecollection get list patch update watch).
This command lists every resource type served by the cluster, including any installed custom resources, together with the API version of each one.
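When you are hunting for one particular resource or API group, the listing can be narrowed; two illustrative variants (the monitoring.coreos.com group is taken from the output above):
# kubectl api-resources --api-group=monitoring.coreos.com -o wide
# kubectl api-resources --namespaced=true --verbs=list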
5、kubectl get events -A
# kubectl get events -A
NAMESPACE LAST SEEN TYPE REASON OBJECT MESSAGE
default 75m Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 51m Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 28m Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.100 Node 10.2.33.100 event: Registered Node 10.2.33.100 in Controller
default 75m Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 51m Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 28m Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.101 Node 10.2.33.101 event: Registered Node 10.2.33.101 in Controller
default 75m Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 51m Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 28m Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.94 Node 10.2.33.94 event: Registered Node 10.2.33.94 in Controller
default 75m Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 51m Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 28m Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.95 Node 10.2.33.95 event: Registered Node 10.2.33.95 in Controller
default 75m Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 51m Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 28m Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.96 Node 10.2.33.96 event: Registered Node 10.2.33.96 in Controller
default 75m Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 51m Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 28m Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.97 Node 10.2.33.97 event: Registered Node 10.2.33.97 in Controller
default 75m Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 51m Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 28m Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.98 Node 10.2.33.98 event: Registered Node 10.2.33.98 in Controller
default 75m Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
default 51m Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
default 28m Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
default 5m38s Normal RegisteredNode node/10.2.33.99 Node 10.2.33.99 event: Registered Node 10.2.33.99 in Controller
kube-system 75m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_6caa59c9-8c2e-435d-99d4-d59f46704b5b became leader
kube-system 52m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5d324cbb-3752-4e07-a742-1dcdf8781ef6 became leader
kube-system 34m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5bfac8da-8530-407b-a715-b42f24c3070f became leader
kube-system 28m Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5facaf45-85c8-4750-b9da-42c933a8ff38 became leader
kube-system 5m47s Normal LeaderElection endpoints/k8s-sigs.io-nfs-subdir-external-provisioner nfs-client-provisioner-5db449f657-97jsh_5bdd527d-effe-49ce-8164-973aeb18417d became leader
kube-system 75m Normal LeaderElection lease/kube-controller-manager localhost.localdomain_3e4b128c-a93c-4ec8-b0bb-ee2cbead1b15 became leader
kube-system 52m Normal LeaderElection lease/kube-controller-manager localhost.localdomain_b915f124-2b75-41b3-8925-856538c2dc83 became leader
kube-system 28m Normal LeaderElection lease/kube-controller-manager localhost.localdomain_5170c3fa-e369-4962-96e1-6624ea248a65 became leader
kube-system 5m50s Normal LeaderElection lease/kube-controller-manager localhost.localdomain_04429131-7c12-440a-8adb-592ce6f8f933 became leader
kube-system 75m Normal LeaderElection lease/kube-scheduler localhost.localdomain_dad4a51f-4954-4a78-9fbd-2147778743ef became leader
kube-system 52m Normal LeaderElection lease/kube-scheduler localhost.localdomain_130dba72-4f71-44b5-aed1-7f1677c6d243 became leader
kube-system 35m Normal LeaderElection lease/kube-scheduler localhost.localdomain_30d37560-8849-4128-a713-137a686d2a97 became leader
kube-system 29m Normal LeaderElection lease/kube-scheduler localhost.localdomain_e0fb2509-c923-4b12-9002-4cfb6d53833d became leader
kube-system 5m47s Normal LeaderElection lease/kube-scheduler localhost.localdomain_5ae8cd66-a2bc-48b1-abba-a62c3fe2a2ff became leader
kube-system 6m5s Normal Pulled pod/nfs-client-provisioner-5db449f657-97jsh Container image "easzlab/nfs-subdir-external-provisioner:v4.0.1" already present on machine
kube-system 6m5s Normal Created pod/nfs-client-provisioner-5db449f657-97jsh Created container nfs-client-provisioner
kube-system 6m5s Normal Started pod/nfs-client-provisioner-5db449f657-97jsh Started container nfs-client-provisioner
kube-system 29m Warning BackOff pod/nfs-client-provisioner-5db449f657-97jsh Back-off restarting failed container
Once you know what is running in the cluster, look at recent failures: the event stream tells you what happened around the time of a fault. To look at a single namespace, use -n instead of -A.
In the output, focus on the TYPE, REASON, and OBJECT columns; these three fields quickly narrow down where to look.
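Normal events such as the RegisteredNode entries above are usually noise; a sketch of filtering the stream down to warnings and ordering it by time:
# kubectl get events -A --field-selector type=Warning
# kubectl get events -n kube-system --sort-by=.metadata.creationTimestamp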
6、kubectl get nodes -o wide
# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
10.2.33.100 Ready,SchedulingDisabled master 40d v1.21.0 10.2.33.100 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.101 Ready,SchedulingDisabled master 39d v1.21.0 10.2.33.101 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.94 Ready node 39d v1.21.0 10.2.33.94 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.95 Ready node 40d v1.21.0 10.2.33.95 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.96 Ready node 40d v1.21.0 10.2.33.96 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.97 Ready node 40d v1.21.0 10.2.33.97 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.98 Ready,SchedulingDisabled master 40d v1.21.0 10.2.33.98 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
10.2.33.99 Ready,SchedulingDisabled master 40d v1.21.0 10.2.33.99 <none> CentOS Linux 7 (Core) 3.10.0-862.el7.x86_64 docker://20.10.5
This shows the status of every node. With -o wide you also get details such as the internal/external IP, OS image, kernel version, and container runtime. First check that every node is Ready.
Look at each node's age to see whether there is any correlation between status and age; perhaps only new nodes are misbehaving because something in the node image changed. The VERSION column helps you quickly spot kubelet version skew, and whether there are known bugs caused by a version mismatch between the kubelet and the API server.
The internal IP is useful when you spot an address outside your subnet: a node may have come up with an incorrect static IP, leaving your CNI unable to route traffic to the workloads.
The OS image, kernel version, and container runtime are all important indicators of differences that can cause problems; you may be hitting an issue only on a particular OS or runtime. This information helps you zero in on likely causes and know where to look deeper in the logs.
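When one node stands out, describe it to see its conditions (Ready, MemoryPressure, DiskPressure, PIDPressure) and recent events; and since metrics-server is installed in this cluster, kubectl top can compare actual resource usage across nodes. The node name below is just an example taken from the table above:
# kubectl describe node 10.2.33.94
# kubectl top nodes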
7、kubectl get pods -A -o wide
# kubectl get pods -A -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default test-pod 0/1 Completed 0 28d 172.20.6.3 10.2.33.94 <none> <none>
ingress-nginx nginx-ingress-controller-6f4c78bb57-fshh7 1/1 Running 5 40d 172.20.4.3 10.2.33.96 <none> <none>
kube-system coredns-74c56d8f8d-k8ggm 1/1 Running 0 40d 172.20.4.2 10.2.33.96 <none> <none>
kube-system dashboard-metrics-scraper-856586f554-fft8q 1/1 Running 1 40d 172.20.5.2 10.2.33.97 <none> <none>
kube-system kube-flannel-ds-amd64-7fmqs 1/1 Running 0 39d 10.2.33.94 10.2.33.94 <none> <none>
kube-system kube-flannel-ds-amd64-cvt77 1/1 Running 0 40d 10.2.33.97 10.2.33.97 <none> <none>
kube-system kube-flannel-ds-amd64-d5rzz 1/1 Running 0 40d 10.2.33.100 10.2.33.100 <none> <none>
kube-system kube-flannel-ds-amd64-gncjz 1/1 Running 0 39d 10.2.33.101 10.2.33.101 <none> <none>
kube-system kube-flannel-ds-amd64-jfkx2 1/1 Running 0 40d 10.2.33.96 10.2.33.96 <none> <none>
kube-system kube-flannel-ds-amd64-mltxw 1/1 Running 0 40d 10.2.33.95 10.2.33.95 <none> <none>
kube-system kube-flannel-ds-amd64-vlghf 1/1 Running 0 40d 10.2.33.99 10.2.33.99 <none> <none>
kube-system kube-flannel-ds-amd64-xmzz7 1/1 Running 0 40d 10.2.33.98 10.2.33.98 <none> <none>
kube-system kubernetes-dashboard-c4ff5556c-pcmtw 1/1 Running 31 40d 172.20.5.3 10.2.33.97 <none> <none>
kube-system metrics-server-8568cf894b-4925q 1/1 Running 0 40d 172.20.3.2 10.2.33.95 <none> <none>
kube-system nfs-client-provisioner-5db449f657-97jsh 1/1 Running 714 28d 172.20.6.2 10.2.33.94 <none> <none>
kube-system node-local-dns-4bcdm 1/1 Running 0 40d 10.2.33.98 10.2.33.98 <none> <none>
kube-system node-local-dns-bq5j5 1/1 Running 0 40d 10.2.33.97 10.2.33.97 <none> <none>
kube-system node-local-dns-d6xr5 1/1 Running 0 40d 10.2.33.100 10.2.33.100 <none> <none>
kube-system node-local-dns-hlc7t 1/1 Running 0 39d 10.2.33.101 10.2.33.101 <none> <none>
kube-system node-local-dns-k9lqg 1/1 Running 0 40d 10.2.33.96 10.2.33.96 <none> <none>
kube-system node-local-dns-ntf59 1/1 Running 0 40d 10.2.33.99 10.2.33.99 <none> <none>
kube-system node-local-dns-qs6rw 1/1 Running 0 39d 10.2.33.94 10.2.33.94 <none> <none>
kube-system node-local-dns-qxt7m 1/1 Running 0 40d 10.2.33.95 10.2.33.95 <none> <none>
monitor alertmanager-prometheus-kube-prometheus-alertmanager-0 2/2 Running 0 18d 172.20.5.5 10.2.33.97 <none> <none>
monitor prometheus-grafana-55c5f574d9-sgvr4 2/2 Running 0 18d 172.20.6.5 10.2.33.94 <none> <none>
monitor prometheus-kube-prometheus-operator-5f6774b747-zvffc 1/1 Running 0 18d 172.20.6.6 10.2.33.94 <none> <none>
monitor prometheus-kube-state-metrics-5f89586745-lfwr2 1/1 Running 0 18d 172.20.3.5 10.2.33.95 <none> <none>
monitor prometheus-prometheus-kube-prometheus-prometheus-0 2/2 Running 0 18d 172.20.4.4 10.2.33.96 <none> <none>
monitor prometheus-prometheus-node-exporter-2cccr 1/1 Running 0 18d 10.2.33.95 10.2.33.95 <none> <none>
monitor prometheus-prometheus-node-exporter-66r8q 1/1 Running 0 18d 10.2.33.98 10.2.33.98 <none> <none>
monitor prometheus-prometheus-node-exporter-86x9l 1/1 Running 0 18d 10.2.33.100 10.2.33.100 <none> <none>
monitor prometheus-prometheus-node-exporter-f8mpk 1/1 Running 0 18d 10.2.33.94 10.2.33.94 <none> <none>
monitor prometheus-prometheus-node-exporter-g8mng 1/1 Running 0 18d 10.2.33.99 10.2.33.99 <none> <none>
monitor prometheus-prometheus-node-exporter-k5r2j 1/1 Running 0 18d 10.2.33.101 10.2.33.101 <none> <none>
monitor prometheus-prometheus-node-exporter-pjbl5 1/1 Running 0 18d 10.2.33.97 10.2.33.97 <none> <none>
monitor prometheus-prometheus-node-exporter-s7z8c 1/1 Running 0 18d 10.2.33.96 10.2.33.96 <none> <none>
test-tengine plat-tengine-649d486499-w68bx 1/1 Running 0 40d 172.20.3.4 10.2.33.95 <none> <none>
-A lists pods in all namespaces, and -o wide adds details such as the pod IP and the node each pod runs on. To inspect a single namespace, use -n instead.
Use the STATUS and RESTARTS columns to pinpoint which namespace or node the problem is on.
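On a busy cluster it helps to list only the unhealthy pods, or to sort by restart count so crash-loopers such as the nfs-client-provisioner above (714 restarts) stand out; a rough sketch:
# kubectl get pods -A --field-selector=status.phase!=Running,status.phase!=Succeeded
# kubectl get pods -A --sort-by='.status.containerStatuses[0].restartCount'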
8、kubectl describe pod plat-tengine-649d486499-w68bx -n test-tengine
# kubectl describe pod plat-tengine-649d486499-w68bx -n test-tengine
Name: plat-tengine-649d486499-w68bx
Namespace: test-tengine
Priority: 0
Node: 10.2.33.95/10.2.33.95
Start Time: Wed, 20 Oct 2021 17:54:50 +0800
Labels: app=tengine-labels
pod-template-hash=649d486499
Annotations: <none>
Status: Running
IP: 172.20.3.4
IPs:
IP: 172.20.3.4
Controlled By: ReplicaSet/plat-tengine-649d486499
Containers:
plat-tengine:
Container ID: docker://7f1240b861a15d7011ab8a40285a46441179657c8946e1900be259e16c38e080
Image: registry.tengine.tv/zxltest/tengine:25
Image ID: docker-pullable://registry.tengine.tv/zxltest/tengine@sha256:441d952bcf039c6921b0f860ae1bc86159b9ef8a2368f7964b7f88d643f82e5f
Port: 8108/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 20 Oct 2021 17:54:52 +0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/etc/localtime from plat-time (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hdrgk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
plat-time:
Type: HostPath (bare host directory volume)
Path: /usr/share/zoneinfo/Asia/Shanghai
HostPathType:
kube-api-access-hdrgk:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events: <none>
Now that you know which pod and which namespace have the problem, describe that pod directly to see its full details.
Check for error information, especially in the Events section at the bottom, and address the problem accordingly.
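The Events section is usually the most informative part of describe output (look for reasons such as FailedScheduling, ImagePullBackOff, or OOMKilled). A quick way to jump straight to it, assuming about 20 lines of context is enough:
# kubectl describe pod plat-tengine-649d486499-w68bx -n test-tengine | grep -A 20 Events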
9、 kubectl logs -f plat-tengine-649d486499-w68bx -n test-tengine
# kubectl logs -f plat-tengine-649d486499-w68bx -n test-tengine
2021-11-28 11:25:47,540 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:48,543 INFO spawned: 'tengine' with pid 19128
2021-11-28 11:25:49,554 INFO success: tengine entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-11-28 11:25:51,055 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:52,058 INFO spawned: 'tengine' with pid 19129
2021-11-28 11:25:53,069 INFO success: tengine entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2021-11-28 11:25:54,570 INFO exited: tengine (exit status 1; not expected)
2021-11-28 11:25:55,574 INFO spawned: 'tengine' with pid 19130
Where describe tells you what happened to the pod at the Kubernetes level, logs show what the application inside the pod is actually doing.
You can pipe the output through grep to filter out irrelevant lines or to search for a specific event.
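A few variations that are often useful: logs from the previous container instance after a restart, a specific container in a multi-container pod, or only recent lines. The container name plat-tengine is taken from the describe output above:
# kubectl logs plat-tengine-649d486499-w68bx -n test-tengine --previous
# kubectl logs plat-tengine-649d486499-w68bx -n test-tengine -c plat-tengine --tail=100 --since=1h
# kubectl logs -f plat-tengine-649d486499-w68bx -n test-tengine | grep -i error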
10、kubectl exec -it plat-tengine-649d486499-w68bx -n test-tengine /bin/bash
# kubectl exec -it plat-tengine-649d486499-w68bx -n test-tengine /bin/bash
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
[root@plat-tengine-649d486499-w68bx /]# free
total used free shared buff/cache available
Mem: 16257696 778408 10021860 279332 5457428 14775780
Swap: 0 0 0
# ps -ef
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Oct20 ? 01:21:38 /usr/bin/python /usr/bin/supervisord -n -c /etc/supervisord.conf
root 10 1 0 Oct20 ? 00:00:00 nginx: master process /data/server/tengine/bin/nginx -p /data/server/tengine -c /data/server/tengine/conf/tengin
www 11 10 0 Oct20 ? 00:15:08 nginx: worker process
www 12 10 0 Oct20 ? 00:00:00 nginx: worker process
www 13 10 0 Oct20 ? 00:15:15 nginx: worker process
www 14 10 0 Oct20 ? 00:15:15 nginx: worker process
www 15 10 0 Oct20 ? 00:15:34 nginx: worker process
www 16 10 0 Oct20 ? 00:15:09 nginx: worker process
www 17 10 0 Oct20 ? 00:15:18 nginx: worker process
www 18 10 0 Oct20 ? 00:15:29 nginx: worker process
root 22139 0 0 18:31 pts/0 00:00:00 /bin/bash
root 22170 1 0 18:32 ? 00:00:00 /data/server/tengine/bin/nginx -p /data/server/tengine -c /data/server/tengine/conf/tengine.conf -s start
root 22171 22139 0 18:32 pts/0 00:00:00 ps -ef
If the logs do not reveal the problem, the last resort is to get a shell inside the container and inspect the processes and service logs there to pin down the specific issue.
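As the deprecation warning above says, the current syntax separates the command with --; a few illustrative forms (sh is a common fallback for minimal images that have no bash):
# kubectl exec -it plat-tengine-649d486499-w68bx -n test-tengine -- /bin/bash
# kubectl exec plat-tengine-649d486499-w68bx -n test-tengine -- ps -ef
# kubectl exec -it plat-tengine-649d486499-w68bx -n test-tengine -- sh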