Project scenario

I am following 博哥's k8s course,
Level 28: k8s monitoring in practice with Prometheus (Part 5),
using Prometheus to monitor the ingress-nginx service.


Resource configuration

My ingress-controller is deployed in its own namespace, test-ingress-controller.
The ServiceMonitor I created is configured as follows:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: ingress-nginx
  name: nginx-ingress-scraping
  namespace: test-ingress-controller
spec:
  endpoints:
  - interval: 30s
    path: /metrics
    port: metrics
  jobLabel: app
  namespaceSelector:
    matchNames:
    - test-ingress-controller
  selector:
    matchLabels:
      app: ingress-nginx

Check that the selector matches the Service:

# kubectl -n test-ingress-controller get service -l app=ingress-nginx
NAME               TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                    AGE
nginx-ingress-lb   ClusterIP   10.68.219.193   <none>        80/TCP,443/TCP,10254/TCP   12d
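
The label match above only shows that the Service is selected. Because port: metrics in the ServiceMonitor refers to a Service port by name, it may also be worth confirming that one of the three ports is actually named metrics. A minimal sketch; the helper name list_service_ports is made up for illustration:

```shell
# Hypothetical helper: print "<port name><TAB><port number>" for each port of a Service.
# Usage: list_service_ports NAMESPACE SERVICE
list_service_ports() {
  kubectl -n "$1" get service "$2" \
    -o jsonpath='{range .spec.ports[*]}{.name}{"\t"}{.port}{"\n"}{end}'
}
```

For the setup here it would be called as: list_service_ports test-ingress-controller nginx-ingress-lb. If no line starts with metrics, the endpoint would likely produce no targets even once RBAC is fixed.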

Problem description

The newly created ServiceMonitor does not appear in the Prometheus targets:
serviceMonitor/test-ingress-controller/nginx-ingress-scraping


Root cause analysis

Check the logs:

# kubectl -n monitoring logs prometheus-k8s-0
ts=2024-06-28T16:42:40.014Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User "system:serviceaccount:monitoring:prometheus-k8s" cannot list resource "pods" in API group "" in the namespace "test-ingress-controller"

ts=2024-06-28T16:43:10.772Z caller=klog.go:108 level=warn component=k8s_client_runtime func=Warningf msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: failed to list *v1.Endpoints: endpoints is forbidden: User "system:serviceaccount:monitoring:prometheus-k8s" cannot list resource "endpoints" in API group "" in the namespace "test-ingress-controller"

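The logs show that the prometheus-k8s service account hit a permission problem when trying to list the pods and endpoints resources in the test-ingress-controller namespace.
By default, the prometheus-k8s ClusterRole only grants get on nodes/metrics and on the /metrics non-resource URL, so the service account has no list or watch permission on these resources in this namespace.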
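
The missing permissions can also be checked directly, without reading the Prometheus logs, by impersonating the service account with kubectl auth can-i. A small sketch; the helper name check_sa_access is hypothetical, and running it assumes your own user is allowed to impersonate service accounts:

```shell
# Hypothetical helper: ask the API server whether a ServiceAccount may
# get/list/watch services, endpoints and pods in a given namespace.
# Usage: check_sa_access TARGET_NAMESPACE SA_NAMESPACE SA_NAME
check_sa_access() {
  target_ns="$1"
  subject="system:serviceaccount:$2:$3"
  for resource in services endpoints pods; do
    for verb in get list watch; do
      printf '%s %s: ' "$verb" "$resource"
      # Prints "yes" or "no" for each verb/resource pair.
      kubectl auth can-i "$verb" "$resource" -n "$target_ns" --as="$subject"
    done
  done
}
```

Here it would be called as: check_sa_access test-ingress-controller monitoring prometheus-k8s. Given the log messages above, list on pods and endpoints would be expected to come back as no before the fix.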

A similar error appears in Level 22 of 博哥's course,
Level 22: an in-depth look at RBAC role-based access control in K8s.

# kubectl --kubeconfig ./kube_config/test-kubeconfig-a.kube.conf get all
Error from server (Forbidden): horizontalpodautoscalers.autoscaling is forbidden: User "system:serviceaccount:test-kubeconfig-a:test-kubeconfig-a-user" cannot list resource "horizontalpodautoscalers" in API group "autoscaling" in the namespace "test-kubeconfig-a"

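This says that the HPA (horizontal pod autoscaler) resources cannot be viewed:
the service account test-kubeconfig-a-user is not authorized to list the horizontalpodautoscalers resource type of the autoscaling API group within the test-kubeconfig-a namespace.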


Solution

Option 1: Role

Add a Role and a RoleBinding in the test-ingress-controller namespace, so that a Role with sufficient permissions is bound to the ServiceAccount prometheus-k8s in the monitoring namespace.

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: test-ingress-controller-prometheus-k8s
  namespace: test-ingress-controller # namespace of the ingress-controller
rules:
- apiGroups: [""] # "" means the core API group
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- apiGroups: ["extensions", "networking.k8s.io"]
  resources: ["ingresses"]
  verbs: ["get", "list", "watch"]


---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: test-ingress-controller-prometheus-k8s-binding
  namespace: test-ingress-controller # same namespace as the Role
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: monitoring # namespace of the Prometheus service account
roleRef:
  kind: Role
  name: test-ingress-controller-prometheus-k8s
  apiGroup: rbac.authorization.k8s.io
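
For comparison, the core-group part of the Role and the RoleBinding above can be previewed from the command line with kubectl create role and kubectl create rolebinding. This is only a sketch: the helper name preview_prometheus_rbac is hypothetical, the ingresses rule is left out, and --dry-run=client prints the manifests without creating anything.

```shell
# Hypothetical helper: print a Role and RoleBinding equivalent to the
# core-group rule above, without creating them (client-side dry run).
# Usage: preview_prometheus_rbac TARGET_NAMESPACE
preview_prometheus_rbac() {
  target_ns="$1"
  role_name="test-ingress-controller-prometheus-k8s"
  kubectl -n "$target_ns" create role "$role_name" \
    --verb=get,list,watch \
    --resource=services,endpoints,pods \
    --dry-run=client -o yaml
  echo '---'
  kubectl -n "$target_ns" create rolebinding "${role_name}-binding" \
    --role="$role_name" \
    --serviceaccount=monitoring:prometheus-k8s \
    --dry-run=client -o yaml
}
```

Calling preview_prometheus_rbac test-ingress-controller prints two manifests that can be compared with the hand-written YAML before applying either.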

Restart the Prometheus pods:

# kubectl -n monitoring delete pod prometheus-k8s-0 && kubectl -n monitoring delete pod prometheus-k8s-1

After a short wait, the target shows up:
serviceMonitor/test-ingress-controller/nginx-ingress-scraping/0 (2/2 up)

Option 2: ClusterRole

The default rules:

rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

Modified rules (a ClusterRole is cluster-scoped, so no namespace flag is needed):

# kubectl edit clusterrole prometheus-k8s
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

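I have not tried this approach. Assuming the ClusterRole is bound through a ClusterRoleBinding, it would grant these permissions in every namespace rather than only in test-ingress-controller.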
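
Since this option is untested, a check along the following lines could confirm whether the edited ClusterRole has the intended effect. It is a sketch that assumes the ClusterRole is bound to the service account through a ClusterRoleBinding and that your user may impersonate service accounts; the helper name check_cluster_wide_access is hypothetical.

```shell
# Hypothetical helper: ask whether a ServiceAccount may list services,
# endpoints and pods across all namespaces.
# Usage: check_cluster_wide_access SA_NAMESPACE SA_NAME
check_cluster_wide_access() {
  subject="system:serviceaccount:$1:$2"
  for resource in services endpoints pods; do
    printf 'list %s in all namespaces: ' "$resource"
    # Prints "yes" or "no".
    kubectl auth can-i list "$resource" --all-namespaces --as="$subject"
  done
}
```

Here it would be called as: check_cluster_wide_access monitoring prometheus-k8s.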
