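Background

In practice, many companies deploy Prometheus outside the cluster, and some even use a single instance to monitor several Kubernetes clusters directly. This is generally not recommended: the volume of data Prometheus has to scrape becomes very large, and it consumes a lot of resources. The more commonly recommended approaches are:

  1. Monitor each cluster with its own Prometheus instance, then aggregate the results with a tool such as Grafana, using each Prometheus as a data source;
  2. Run one well-resourced central Prometheus, deploy an instance in each cluster, and have each instance push its data to the central Prometheus.

Even so, monitoring an external Kubernetes cluster with Prometheus is still a very real requirement.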

Setup steps

Create the user and grant permissions

The following file must be applied on the Kubernetes cluster that Prometheus will monitor:

$ cat prometheus-rbac.yaml


apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus
  namespace: tools
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - "extensions"
  # Ingress is served from networking.k8s.io on current clusters
  - "networking.k8s.io"
  resources:
  - ingresses
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: prometheus
  namespace: tools

There is a pitfall here: after creating the ServiceAccount (sa), inspecting it with the command below shows no token:

$ kubectl describe secret prometheus -n tools

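The reason: Kubernetes only creates a Secret automatically for a ServiceAccount on clusters running version 1.23 or earlier. On later versions you have to create the Secret yourself and bind it to the ServiceAccount through the kubernetes.io/service-account.name annotation: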

apiVersion: v1
kind: Secret
type: kubernetes.io/service-account-token
metadata:
  name: prometheus
  namespace: tools
  annotations:
    kubernetes.io/service-account.name: "prometheus"

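Apply the manifests. The session below shows prometheus-rbac.yaml; on newer clusters, apply the Secret manifest the same way: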

[root@iZ2ze1ut8g7ndn5d2soajcZ ~]# kubectl apply -f prometheus-rbac.yaml
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created

Commands to delete these resources:

kubectl delete clusterrolebinding prometheus
kubectl delete serviceaccount prometheus -n tools
kubectl delete clusterrole prometheus

Get the token

$ kubectl describe secrets  prometheus -n tools

Name:         prometheus-token-whlv8
Namespace:    tools
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: prometheus
              kubernetes.io/service-account.uid: dcf1bf81-4636-4511-9332-293a320c3d60

Type:  kubernetes.io/service-account-token

Data
====
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6IjdSMGhJMnhFaThNSjg3SFFsRGJ1bUljR0lMbWZCR0lGWUw3SjN3WVhPT1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJ0b29scyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLXdobHY4Iiwia3ViZXJuZXRlcy5pby9zZXJ291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkY2YxYmY4MS00Mi0yOTNhMzIwYzNkNjAiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6dG9vbHM6cHJvbWV0aGV1cyJ9.zFLLIddFuk5CfjEyFWCcNguzzmutllhtYtfuybuQdx47lQ1R_iUdMUifhySICMVJ_XcPBx1wSNVRzbikQ3DRVp4RfwxJH1vWpvX0msHa_aDzQrniEwOcg9zMNTzczJq3L8d8VengSb1_Lpri4Qnk23XlfFj2f3zgmG91nzgW276nCF4cWZfIRlHYoHgkWipqJak_GdII7dIpBpEIdy9F98uKeDwQ-meMZnBF-_KqAiQkKnsswITJV-Wn3Aofbxygqh6q1dCKJ1SrU7DMqpSKmgPFiuPSb4qxg
ca.crt:     1180 bytes
namespace:  5 bytes

Create k8s.token

Create the k8s.token file in the Prometheus container and paste the token into it:

[root@monitoring prometheus]# pwd
/opt/prometheus
[root@monitoring prometheus]# vim k8s.token 
[root@monitoring prometheus]# cat k8s.token 
eyJhbGciOiJSUzI1NiIsImtpZCI6IjdSMGhJMnhFaThNSjg3SFFsRGJ1bUljR0lMbWZCR0lGWUw3SjN3WVhPT1UifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJ0b29scyIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJwcm9tZXRoZXVzLXRva2VuLXdobHY4Iiwia3ViZXJuZXRlcy5pby9zZXJ291bnQubmFtZSI6InByb21ldGhldXMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC51aWQiOiJkY2YxYmY4MS00Mi0yOTNhMzIwYzNkNjAiLCJzdWIiOiJzeXN0ZW06c2VydmljZWFjY291bnQ6dG9vbHM6cHJvbWV0aGV1cyJ9.zFLLIddFuk5CfjEyFWCcNguzzmutllhtYtfuybuQdx47lQ1R_iUdMUifhySICMVJ_XcPBx1wSNVRzbikQ3DRVp4RfwxJH1vWpvX0msHa_aDzQrniEwOcg9zMNTzczJq3L8d8VengSb1_Lpri4Qnk23XlfFj2f3zgmG91nzgW276nCF4cWZfIRlHYoHgkWipqJak_GdII7dIpBpEIdy9F98uKeDwQ-meMZnBF-_KqAiQkKnsswITJV-Wn3Aofbxygqh6q1dCKJ1SrU7DMqpSKmgPFiuPSb4qxg
[root@monitoring prometheus]# 
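Before pointing Prometheus at the file, it can be worth checking that what was pasted really is a service account token for the expected account. The following is a minimal sketch, not part of the original setup: it decodes the token's payload without verifying the signature and reads the `sub` claim. The helper names `decode_jwt_payload` and `make_demo_token` are hypothetical, and the demo token is built locally so the snippet runs without a cluster.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Return the payload claims of a JWT without verifying its signature."""
    parts = token.strip().split(".")
    if len(parts) != 3:
        raise ValueError("not a JWT: expected header.payload.signature")
    payload = parts[1]
    # JWTs use unpadded URL-safe base64; restore the padding before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def make_demo_token(claims: dict) -> str:
    """Build an unsigned, illustrative token (hypothetical helper for the demo)."""
    def enc(obj) -> str:
        raw = json.dumps(obj).encode()
        return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()
    return f"{enc({'alg': 'none'})}.{enc(claims)}.signature"


# Assumption: the token belongs to ServiceAccount "prometheus" in namespace "tools".
demo = make_demo_token({"sub": "system:serviceaccount:tools:prometheus"})
claims = decode_jwt_payload(demo + "\n")  # a trailing newline from the editor is tolerated
print(claims["sub"])
```

Against the real file, the call would look like `decode_jwt_payload(open("/opt/prometheus/k8s.token").read())`; a `sub` other than `system:serviceaccount:tools:prometheus`, or a `ValueError`, suggests the wrong or a truncated token was pasted.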

Write prometheus-server.yml

global:
  evaluation_interval: 1m
  scrape_interval: 1m
  scrape_timeout: 10s

scrape_configs:
- job_name: prometheus
  static_configs:
  - targets:
    - localhost:9090
- job_name: "metrics-data"
  scrape_interval: 15s
  scrape_timeout: 15s
  metrics_path: '/metrics'
  # Targets come from file-based service discovery
  file_sd_configs:
  - files:
    - prometheus-metrics.yml



# API server discovery
- job_name: "alik3-apiservers-monitor"
  kubernetes_sd_configs:
  - role: endpoints
    api_server: https://xx.xx.7.xx:6443
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
  scheme: https
  tls_config:
    insecure_skip_verify: true
  bearer_token_file: /opt/prometheus/k8s.token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace,__meta_kubernetes_service_name,__meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https

# Node discovery
- job_name: "alik3-nodes-monitor"
  scheme: https
  tls_config:
     insecure_skip_verify: true
  bearer_token_file: /opt/prometheus/k8s.token
  kubernetes_sd_configs:
  - role: node
    api_server: https://xx.xxx.xxx:xx
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
  relabel_configs:
  - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
    regex: "(.*)"
    replacement: "${1}"
    action: replace
    target_label: LOC
  - source_labels: [__meta_kubernetes_node_label_failure_domain_beta_kubernetes_io_region]
    regex: "(.*)"
    replacement: "NODE"
    action: replace
    target_label: Type
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)

# Pods in the specified namespaces
- job_name: "alik3-发现指定namespace的所有pod"
  kubernetes_sd_configs:
  - role: pod
    api_server: https://xx.xx.7.xx:xxx
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
    namespaces:
      names:
      - kube-system
      - business
  relabel_configs:
  - action: keep
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape
  - action: drop
    regex: true
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scrape_slow
  - action: replace
    regex: (https?)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_scheme
    target_label: __scheme__
  - action: replace
    regex: (.+)
    source_labels:
    - __meta_kubernetes_pod_annotation_prometheus_io_path
    target_label: __metrics_path__
  - action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    source_labels:
    - __address__
    - __meta_kubernetes_pod_annotation_prometheus_io_port
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_annotation_prometheus_io_param_(.+)
    replacement: __param_$1
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - action: replace
    source_labels:
    - __meta_kubernetes_namespace
    target_label: namespace
  - action: replace
    source_labels:
    - __meta_kubernetes_pod_name
    target_label: pod
  - action: drop
    regex: Pending|Succeeded|Failed|Completed
    source_labels:
    - __meta_kubernetes_pod_phase

# Pods matching the specified discovery conditions
- job_name: "alik3-指定发现条件的pod"
  kubernetes_sd_configs:
  - role: pod
    api_server: https://xx.xx.7.xx:xxx
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /opt/prometheus/k8s.token
  relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name

For configuration reference and further details, see: Prometheus theory and practice

Prometheus configuration explained

Restart the Prometheus service

[Figure: Prometheus after the restart]

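Pitfalls

Problem 1: context deadline exceeded

Get "https://192.xx.xx.xx:5444/metrics": context deadline exceeded

Fix: the port may not be open. Open it, or point the job at a different port.

Another possibility is that Prometheus and the cluster sit on different networks. Through the API server address, Prometheus can still discover the targets for alik3-apiservers-monitor, alik3-nodes-monitor, alik3-发现指定namespace的所有pod and alik3-指定发现条件的pod, but the targets themselves are scraped at their cluster-internal addresses. If Prometheus has no network route to the monitored cluster, the scrape times out with exactly this error.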
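A quick way to tell the two causes apart is to test, from the Prometheus host, whether a plain TCP connection to a discovered target can be opened at all. The helper below is a minimal sketch using only the standard library; the name `can_connect` and the default timeout are assumptions, and the host and port to test are whatever the targets page reports for the failing target.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


# Usage (placeholders, fill in the address shown for the failing target):
#   can_connect("<target-ip>", <target-port>)
```

If this returns False for a pod or node address while the API server address works, discovery is fine and the problem is routing or a closed port between Prometheus and the cluster network.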

