k8s部署metrics server资源监控及日志查看
如果状态为True且能够返回数据,说明MetricsServer运行正常。可以看到,本次部署失败了。结合上面的输出分析,应该是镜像拉取失败了,需要将yaml文件中的镜像下载地址替换为国内的镜像仓库地址。参数,告诉metricsserver不验证kubelet提供的https证书。kubelet组件使用systemd管理服务,查看日志的命令为。原因是我们下载的yaml文件是最新的,是基于。其他k8s
k8s部署metrics server资源监控及日志查看
查看资源的常用命令
kubectl get
查看资源信息
kubectl get <资源类型> <资源名称>
kubectl get <资源类型> <资源名称> -o wide #显示详细信息
kubectl get <资源类型> <资源名称> -o yaml #导出yaml文件配置
例如
kubectl get node
kubectl get node k8s-master
查看节点标签信息
#给节点打标签
[root@k8s-master ~]# kubectl label node k8s-node1 node-role.kubernetes.io/node1=
[root@k8s-master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 33h v1.23.0
k8s-node1 Ready node1 33h v1.23.0
k8s-node2 Ready <none> 33h v1.23.0
[root@k8s-master ~]# kubectl get node k8s-master --show-labels
几个常用缩写
kubectl get cs # 查看control-manager和scheduler组件状态
kubectl get po # 相当于kubectl get pods
kubectl get svc # 相当于kubectl get service
查看集群中所有API资源信息
kubectl api-resources
kubectl describe
查看资源详细描述
kubectl describe <资源类型> <资源名称>
可以把kubectl get
与kubectl describe
结合使用:
[root@k8s-master ~]# kubectl get po
NAME READY STATUS RESTARTS AGE
web-demo1-5ff6d576bb-5292c 1/1 Running 0 4h4m
web-demo1-5ff6d576bb-92gqk 1/1 Running 0 4h4m
web-demo1-5ff6d576bb-d26cf 1/1 Running 0 4h4m
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get pod web-demo1-5ff6d576bb-5292c -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
web-demo1-5ff6d576bb-5292c 1/1 Running 0 4h10m 10.244.169.134 k8s-node2 <none> <none>
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl describe pod web-demo1-5ff6d576bb-5292c
Name: ...
Namespace: ...
Node: ...
Start Time: ...
Labels: ...
IP: ...
COntainers: ...
Conditions: ...
Volumes: ...
Tolerations: ...
[root@k8s-master ~]# kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane,master 34h v1.23.0
k8s-node1 Ready node1 34h v1.23.0
k8s-node2 Ready <none> 34h v1.23.0
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl get node k8s-node2 -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
k8s-node2 Ready <none> 34h v1.23.0 192.168.136.98 <none> CentOS Linux 7 (Core) 3.10.0-1160.el7.x86_64 docker://20.10.17
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl describe node k8s-node2
Name: ...
Roles: ...
Labels: ...
Taints: ...
Conditions: ...
Addresses: ...
Capacity:
cpu: ...
memory: ...
System Info: ...
PodCIDR: ...
Events: ...
使用kubectl top
命令可以监控集群资源利用率,但是需要先安装Metrics Server,否则会报错。
[root@k8s-master ~]# kubectl top node
error: Metrics API not available
[root@k8s-master ~]# kubectl top pod
error: Metrics API not available
Metrics Server部署
通过yaml文件部署metrics-server
下载yaml配置文件:
wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
修改配置文件components.yaml
,添加--kubelet-insecure-tls
参数,告诉metrics server不验证kubelet提供的https证书。
containers:
- args:
- --cert-dir=/tmp
- --secure-port=4443
- --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
- --kubelet-use-node-status-port
- --kubelet-insecure-tls
部署Metrics Server
kubectl apply -f components.yaml
检查是否部署成功
[root@k8s-master ~]# kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io kube-system/metrics-server False (MissingEndpoints) 8m27s
[root@k8s-master ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request
如果状态为True且能够返回数据,说明Metrics Server运行正常。可以看到,本次部署失败了。
部署报错排查
查看镜像部署状态:
kubectl get po -n kube-system
NAME READY STATUS RESTARTS AGE
...
metrics-server-574849569f-svt2v 0/1 ImagePullBackOff 0 8m8s
查看Pod日志:
[root@k8s-master ~]# kubectl logs metrics-server-574849569f-svt2v -n kube-system
Error from server (BadRequest): container "metrics-server" in pod "metrics-server-574849569f-svt2v" is waiting to start: trying and failing to pull image
结合上面的输出分析,应该是镜像拉取失败了,需要将yaml文件中的镜像下载地址替换为国内的镜像仓库地址。
删除当前出错的Metrics Server部署:
kubectl delete -f components.yaml
替换镜像下载地址
替换镜像下载地址为
image: registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0
重新部署
kubectl apply -f components.yaml
安装成功,Pod状态变为Running,但是一直没有Ready。
[root@k8s-master ~]# kubectl get deployment,po -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/calico-kube-controllers 1/1 1 1 35h
deployment.apps/coredns 2/2 2 2 35h
deployment.apps/metrics-server 0/1 1 0 87s
NAME READY STATUS RESTARTS AGE
...
pod/metrics-server-798c598bb8-rv827 0/1 Running 0 87s
检查Metrics Server的Pod日志:
[root@k8s-master ~]# kubectl logs metrics-server-798c598bb8-rv827 -n kube-system
E0724 14:26:19.149545 1 scraper.go:139] "Failed to scrape node" err="GET \"https://192.168.x.x:10250/stats/summary?only_cpu_and_memory=true\": bad status code \"403 Forbidden\"" node="k8s-node1"
[root@k8s-master ~]# kubectl describe pod metrics-server-798c598bb8-rv827 -n kube-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 6m25s default-scheduler Successfully assigned kube-system/metrics-server-798c598bb8-rv827 to k8s-node2
Normal Pulling 6m24s kubelet Pulling image "registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0"
Normal Pulled 5m51s kubelet Successfully pulled image "registry.cn-shenzhen.aliyuncs.com/zengfengjin/metrics-server:v0.5.0" in 32.739844006s
Normal Created 5m51s kubelet Created container metrics-server
Normal Started 5m51s kubelet Started container metrics-server
Warning Unhealthy 75s (x29 over 5m25s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
注意到以下GET请求被拒绝(403 Forbidden)
GET \"https://192.168.x.x:10250/stats
原因是我们下载的yaml文件是最新的,是基于0.6.x写的,而镜像下载地址被手动改成了0.5.x。在Metrics Server中,0.5.x的配置文件中需要的权限配置与0.6.x不一样。0.5.x中需要对
nodes/stats
的访问权限,但是0.6.x中改成了nodes/metrics
。
删除当前出错的Metrics Server部署:
kubectl delete -f components.yaml
metrics-server资源访问权限修改
检查配置文件中metrics-server角色权限:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
verbs:
- get
- apiGroups:
- ""
resources:
- pods
- nodes
verbs:
- get
- list
- watch
增加访问nodes/stats
的权限,修改为
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
labels:
k8s-app: metrics-server
name: system:metrics-server
rules:
- apiGroups:
- ""
resources:
- nodes/metrics
verbs:
- get
- apiGroups:
- ""
resources:
- nodes/stats
- pods
- nodes
verbs:
- get
- list
- watch
重新部署
kubectl apply -f components.yaml
[root@k8s-master ~]# kubectl get deployment,po -n kube-system
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/calico-kube-controllers 1/1 1 1 35h
deployment.apps/coredns 2/2 2 2 36h
deployment.apps/metrics-server 1/1 1 1 37s
NAME READY STATUS RESTARTS AGE
pod/metrics-server-798c598bb8-j69cc 1/1 Running 0 36s
检查部署是否成功
[root@k8s-master ~]# kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io kube-system/metrics-server True 80s
[root@k8s-master ~]# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes
{"kind":"NodeMetricsList","apiVersion":"metrics.k8s.io/v1beta1","metadata":{},"items":[{"metadata":{"name":"k8s-master","creationTimestamp":"2022-07-24T15:06:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-master","kubernetes.io/os":"linux","node-role.kubernetes.io/control-plane":"","node-role.kubernetes.io/master":"","node.kubernetes.io/exclude-from-external-load-balancers":""}},"timestamp":"2022-07-24T15:06:11Z","window":"10s","usage":{"cpu":"201981233n","memory":"1244120Ki"}},{"metadata":{"name":"k8s-node1","creationTimestamp":"2022-07-24T15:06:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-node1","kubernetes.io/os":"linux","node-role.kubernetes.io/node1":""}},"timestamp":"2022-07-24T15:06:14Z","window":"20s","usage":{"cpu":"86504224n","memory":"846880Ki"}},{"metadata":{"name":"k8s-node2","creationTimestamp":"2022-07-24T15:06:31Z","labels":{"beta.kubernetes.io/arch":"amd64","beta.kubernetes.io/os":"linux","kubernetes.io/arch":"amd64","kubernetes.io/hostname":"k8s-node2","kubernetes.io/os":"linux"}},"timestamp":"2022-07-24T15:06:10Z","window":"10s","usage":{"cpu":"77667531n","memory":"883624Ki"}}]}
查看资源监控
[root@k8s-master ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k8s-master 208m 10% 1217Mi 70%
k8s-node1 83m 4% 829Mi 48%
k8s-node2 80m 4% 864Mi 50%
[root@k8s-master ~]#
[root@k8s-master ~]# kubectl top pod
NAME CPU(cores) MEMORY(bytes)
web-demo1-5ff6d576bb-5292c 1m 5Mi
web-demo1-5ff6d576bb-92gqk 1m 7Mi
web-demo1-5ff6d576bb-d26cf 1m 3Mi
查看日志的常用命令
kubelet日志
kubelet组件使用systemd管理服务,查看日志的命令为
journalctl -u kubelet -f
或者查看系统日志
tail -f /var/log/messages
Pod日志
其他k8s组件采用容器部署,查看日志的命令为
kubectl get po -n <命名空间>
kubectl logs <Pod名称> -n <命名空间>
kubectl logs <Pod名称> -n <命名空间> -f #实时查看
标准输出在宿主机的路径为
/var/lib/docker/containers/<container-id>/<container-id>-json.log
查看容器ID的方法为
kubectl get pods -o wide
#到Pod所在节点查看容器ID
docker ps | grep web-demo1-5ff6d576bb-92gqk
也可以进入容器终端日志目录查看日志
kubectl exec -it <Pod名称> --bash
参考文章
【1】https://stackoverflow.com/questions/70362216/getting-error-while-implementing-metric-server-inside-the-kubernetes
【2】https://github.com/kubernetes-sigs/metrics-server/releases
更多推荐
所有评论(0)