Kubernetes with Prometheus: the default Grafana dashboards show "No data", for example on panels that query this recorded metric:
node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
Querying with PromQL in the Prometheus UI shows that the metric data lacks the container, image, name, namespace and pod labels.
The raw cAdvisor output confirms that these labels are empty:
curl -k -H "Authorization: Bearer $TOKEN" https://10.6.128.7:10250/metrics/cadvisor
container_cpu_load_average_10s{container="",id="/",image="",name="",namespace="",pod=""} 0 1666834382282
container_cpu_load_average_10s{container="",id="/docker/5678922ca0bd7afc30b75ffa4ae5fb96298170c3f58a47ae335940b20cd6fa7b",image="",name="",namespace="",pod=""} 0 1666834372644
container_cpu_load_average_10s{container="",id="/kubepods",image="",name="",namespace="",pod=""} 0 1666834372281
container_cpu_load_average_10s{container="",id="/kubepods/besteffort",image="",name="",namespace="",pod=""} 0 1666834378893
container_cpu_load_average_10s{container="",id="/kubepods/besteffort/pod25a7ff7b-7058-4015-8f35-62b2b2a07035",i
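To check many samples like the ones above without reading them by eye, a small script can list which of the expected labels are empty on each line. This is only a minimal sketch: the helper names `parse_labels` and `empty_labels` are made up for illustration, and the regular expression handles the simple label syntax seen in this output rather than every corner of the exposition format.

```python
import re

# Matches one label pair such as id="/kubepods" inside the braces.
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_labels(line):
    """Return the labels of one exposition-format sample line as a dict."""
    start, end = line.find("{"), line.rfind("}")
    if start == -1 or end == -1:
        return {}
    return dict(LABEL_RE.findall(line[start + 1:end]))


def empty_labels(line, wanted=("container", "image", "name", "namespace", "pod")):
    """List the wanted labels that are absent or empty on this sample."""
    labels = parse_labels(line)
    return [name for name in wanted if not labels.get(name)]


# One of the sample lines from the cAdvisor output above.
sample = (
    'container_cpu_load_average_10s{container="",id="/kubepods",'
    'image="",name="",namespace="",pod=""} 0 1666834372281'
)
print(empty_labels(sample))
```

For a healthy container series the function returns an empty list; for the broken output shown here it names all five labels.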
Related reports:
[BUG, RKE1, Monitoring V2] RKE1 1.24 seems to be omitting relevant cadvisor container labels and metric series that break Monitoring V2 dashboards · Issue #38934 · rancher/rancher · GitHub
GitHub - fe-ax/cadvisor-k8s-fix: When using Rancher monitoring with Kubernetes 1.24 cAdvisor doesn't work properly due to the dockershim removal
Here is a summary of the fix:
1,
Delete the existing kubelet ServiceMonitor manifest, or delete the kubelet ServiceMonitor from the cluster.
2,
Add a PrometheusRule resource:
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  labels:
    app.kubernetes.io/name: kube-prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
  name: kubernetes-additional-rules
  namespace: monitoring
spec:
  groups:
  - name: kube_pod_container_resource_usage
    interval: 30s # rule evaluation interval
    rules:
    - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
      expr: sum by (namespace, pod, container) (irate(container_cpu_usage_seconds_total{job="cadvisor"}[5m]))
  - name: k8s.rules
    rules:
    - record: node_namespace_pod_container:container_cpu_usage_seconds_total:sum_irate
      expr: |
        sum by (cluster, namespace, pod, container) (
          irate(container_cpu_usage_seconds_total{job="cadvisor", metrics_path="/metrics", image!=""}[5m])
        )
    - record: node_namespace_pod_container:container_memory_working_set_bytes
      expr: |
        container_memory_working_set_bytes{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: node_namespace_pod_container:container_memory_rss
      expr: |
        container_memory_rss{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: node_namespace_pod_container:container_memory_cache
      expr: |
        container_memory_cache{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: node_namespace_pod_container:container_memory_swap
      expr: |
        container_memory_swap{job="cadvisor", metrics_path="/metrics", image!=""}
    - record: cluster:namespace:pod_memory:active:kube_pod_container_resource_requests
      expr: |
        kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
        * on (namespace, pod, cluster) group_left() max by (namespace, pod) (
          kube_pod_status_phase{phase=~"Pending|Running"} == 1
        )
    - record: namespace_memory:kube_pod_container_resource_requests:sum
      expr: |
        sum by (namespace, cluster) (
          kube_pod_container_resource_requests{resource="memory",job="kube-state-metrics"}
          * on(namespace, pod, cluster) group_left() max by (namespace, pod) (
            kube_pod_status_phase{phase=~"Pending|Running"} == 1
          )
        )
      labels:
        workload_type: deployment # Adjust this label as needed per workload type
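To make the CPU recording rule less opaque, the sketch below models what `sum by (namespace, pod, container) (irate(...[5m]))` computes: `irate` takes the last two samples of each series in the window and divides their difference by the elapsed seconds, and `sum by` then adds the rates of all series that share the grouping labels. It is a simplified illustration, not Prometheus code. The function names, the sample data and the `cpu` label values are hypothetical, and details such as staleness handling are left out.

```python
from collections import defaultdict


def irate(samples):
    """Per-second rate from the last two (timestamp_seconds, value) samples."""
    if len(samples) < 2:
        return None
    (t1, v1), (t2, v2) = samples[-2], samples[-1]
    if t2 <= t1:
        return None
    # A counter that went down is treated as having been reset to zero.
    delta = v2 if v2 < v1 else v2 - v1
    return delta / (t2 - t1)


def sum_by(series, keys):
    """Aggregate [(labels, samples)] into {key_tuple: summed irate}."""
    totals = defaultdict(float)
    for labels, samples in series:
        rate = irate(samples)
        if rate is None:
            continue
        totals[tuple(labels.get(k, "") for k in keys)] += rate
    return dict(totals)


# Hypothetical data: two CPU series of the same container, scraped 30 s apart.
series = [
    ({"namespace": "demo", "pod": "web-0", "container": "app", "cpu": "cpu00"},
     [(0, 10.0), (30, 13.0)]),
    ({"namespace": "demo", "pod": "web-0", "container": "app", "cpu": "cpu01"},
     [(0, 4.0), (30, 10.0)]),
]
result = sum_by(series, ("namespace", "pod", "container"))
print(result)
```

The sketch also shows why the dashboards were empty in the first place: when namespace, pod and container are all empty, every series collapses into a single group with empty keys, and no per-pod panel can match it.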
3,
Deploy a standalone cAdvisor in the cluster:
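apiVersion: v1
kind: Namespace
metadata:
  name: cadvisor
  labels:
    # Set the Pod Security level to 'privileged', because cAdvisor needs elevated privileges.
    # Note: the resources below are created in the monitoring namespace, not in this one,
    # so this label does not apply to them.
    pod-security.kubernetes.io/enforce: privileged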
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  clusterIP: None
  ports:
  - name: http
    port: 8080
    protocol: TCP
    targetPort: http
  selector:
    app: cadvisor
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  annotations:
    seccomp.security.alpha.kubernetes.io/pod: docker/default
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cadvisor
      name: cadvisor
  template:
    metadata:
      annotations:
        scheduler.alpha.kubernetes.io/critical-pod: ""
      labels:
        app: cadvisor
        name: cadvisor
    spec:
      automountServiceAccountToken: false
      containers:
      - args:
        - --housekeeping_interval=10s
        - --max_housekeeping_interval=15s
        - --event_storage_event_limit=default=0
        - --event_storage_age_limit=default=0
        - --enable_metrics=app,cpu,disk,diskIO,memory,network,process
        - --docker_only
        - --store_container_labels=false
        - --whitelisted_container_labels=io.kubernetes.container.name,io.kubernetes.pod.name,io.kubernetes.pod.namespace
        image: zcube/cadvisor:v0.45.0
        name: cadvisor
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        resources:
          limits:
            cpu: 1
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 256Mi
        securityContext:
          privileged: true
        volumeMounts:
        - mountPath: /dev
          name: dev
        - mountPath: /rootfs
          name: rootfs
          readOnly: true
        - mountPath: /var/run
          name: var-run
          readOnly: true
        - mountPath: /sys
          name: sys
          readOnly: true
        - mountPath: /var/lib/docker
          name: docker
          readOnly: true
        - mountPath: /dev/disk
          name: disk
          readOnly: true
        - mountPath: /run/containerd
          name: containerd
          readOnly: true
        - mountPath: /var/lib/containerd
          name: containerd-var
          readOnly: true
      priorityClassName: system-node-critical
      serviceAccountName: cadvisor
      terminationGracePeriodSeconds: 30
      tolerations:
      - key: CriticalAddonsOnly
        operator: Exists
      - effect: NoSchedule
        operator: Exists
      - effect: NoExecute
        operator: Exists
      volumes:
      - hostPath:
          path: /dev
        name: dev
      - hostPath:
          path: /
        name: rootfs
      - hostPath:
          path: /var/run
        name: var-run
      - hostPath:
          path: /sys
        name: sys
      - hostPath:
          path: /var/lib/docker
        name: docker
      - hostPath:
          path: /dev/disk
        name: disk
      - hostPath:
          path: /var/lib/containerd
          type: ""
        name: containerd-var
      - hostPath:
          path: /run/containerd
          type: ""
        name: containerd
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: cadvisor
  name: cadvisor
  namespace: monitoring
spec:
  endpoints:
  - honorLabels: true
    metricRelabelings:
    - sourceLabels:
      - container_label_io_kubernetes_pod_name
      targetLabel: pod
    - sourceLabels:
      - container_label_io_kubernetes_pod_namespace
      targetLabel: namespace
    - sourceLabels:
      - container_label_io_kubernetes_container_name
      targetLabel: container
    - replacement: ""
      targetLabel: cluster
    path: /metrics
    port: http
    relabelings:
    - sourceLabels:
      - __metrics_path__
      targetLabel: metrics_path
  namespaceSelector:
    matchNames:
    - monitoring
  selector:
    matchLabels:
      app: cadvisor
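The link between the `--whitelisted_container_labels` flag and the `metricRelabelings` above may not be obvious, so here is a rough model of it. It assumes, as the source label names in the ServiceMonitor suggest, that cAdvisor exposes each whitelisted container label with a `container_label_` prefix and with characters that are not valid in a metric label replaced by underscores. It also assumes the default relabeling action (`replace`), under which an empty result removes the target label. The function names and sample values are hypothetical.

```python
import re


def cadvisor_label_name(container_label):
    """Model how a whitelisted container label shows up as a metric label."""
    return "container_label_" + re.sub(r"[^a-zA-Z0-9_]", "_", container_label)


# (source label, target label) pairs taken from the metricRelabelings above.
RELABELINGS = [
    (cadvisor_label_name("io.kubernetes.pod.name"), "pod"),
    (cadvisor_label_name("io.kubernetes.pod.namespace"), "namespace"),
    (cadvisor_label_name("io.kubernetes.container.name"), "container"),
]


def apply_relabelings(labels):
    """Simplified model of the default 'replace' action on one series."""
    out = dict(labels)
    for source, target in RELABELINGS:
        value = out.get(source, "")
        if value:
            out[target] = value
        else:
            out.pop(target, None)  # an empty value leaves the label absent
    out.pop("cluster", None)  # replacement: "" leaves no cluster label
    return out


# Hypothetical series as scraped from the standalone cAdvisor.
scraped = {
    "__name__": "container_cpu_usage_seconds_total",
    "container_label_io_kubernetes_pod_name": "web-0",
    "container_label_io_kubernetes_pod_namespace": "demo",
    "container_label_io_kubernetes_container_name": "app",
}
relabeled = apply_relabelings(scraped)
print(relabeled["namespace"], relabeled["pod"], relabeled["container"])
```

After this step the series carries the namespace, pod and container labels that the recording rules in step 2 group by.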
Then wait for Prometheus to load the new configuration and check the dashboards.
Done!