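Prometheus relabel_configs:
Relabeling in Prometheus is a powerful feature: before a target is scraped, it can dynamically rewrite the target's metadata labels, adding new labels or overwriting existing ones.
Once Prometheus has loaded a target, the target instance carries a set of metadata labels. The default ones are:
__address__: the target's address, in <host>:<port> format
__scheme__: the scheme used to scrape the target, HTTP or HTTPS
__metrics_path__: the path on the target from which metrics are scraped

These labels tell Prometheus how to fetch monitoring data from a target. The ones listed above are defaults; custom labels can also be added to a target by defining relabeling rules in the configuration file.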

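relabel_configs fields in detail:
action: the relabeling action to perform. Several actions are supported:

replace: matches regex against the source label values and writes replacement (which may reference the captured groups) to the target label
keep: scrapes only the targets whose source_labels match regex; targets that do not match are dropped
drop: does not scrape the targets whose source_labels match regex; matching targets are dropped
labeldrop: removes every label whose name matches regex
labelkeep: removes every label whose name does not match regex
labelmap: copies the value of every label whose name matches regex to a new label name given by replacement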


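source_labels: the source labels, i.e. labels that exist before this relabeling step; their values are joined with a separator (";" by default) and matched against regex
target_label: the label that the action writes its result to
regex: the regular expression matched against the joined source label values
replacement: the value written to target_label; it can reference capture groups from regex, for example $1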
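
To make the interplay of these fields concrete, here is a minimal Python sketch of how a replace rule could be evaluated against one target's labels. It is an approximation for illustration only: the function name relabel_replace is hypothetical, Python's re module stands in for the RE2 engine that Prometheus uses, and details such as label validation are left out.

```python
import re

def relabel_replace(labels, source_labels, target_label,
                    regex="(.*)", replacement="$1", separator=";"):
    """Approximate the `replace` action for a single target's label set."""
    # Join the source label values; a missing label counts as an empty string.
    value = separator.join(labels.get(name, "") for name in source_labels)
    # Prometheus anchors the regex at both ends, hence fullmatch.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: the label set is left unchanged
    # Convert Prometheus-style $1 references to Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result

example = relabel_replace(
    {"__meta_kubernetes_namespace": "default"},
    source_labels=["__meta_kubernetes_namespace"],
    target_label="kubernetes_namespace",
)
print(example["kubernetes_namespace"])  # default
```

The defaults used in the sketch (regex (.*), replacement $1, separator ;) mirror the defaults Prometheus applies when those fields are omitted, which is why a rule with only source_labels and target_label simply copies a value.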

1. Configuring endpoints service discovery in Prometheus: monitoring the apiserver component

The apiserver is the core component of Kubernetes, so monitoring it is essential. It can be reached directly through the kubernetes Service:
kubectl get svc
NAME         TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)   AGE
kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP   2d5h

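The Service above is the in-cluster address of the apiserver. To discover the endpoints behind a Service automatically, use kubernetes_sd_configs with role set to endpoints; it is enough to add an endpoints-type discovery job to the ConfigMap: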

   -  job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https


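# Parameter explanation
action: keep # keeps only the targets that match regex; all other targets are dropped
regex: default;kubernetes;https # matches the kubernetes Service in the default namespace whose endpoint port is named https

These values can be checked with kubectl describe svc kubernetes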
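
The keep rule above can be pictured as a filter over the discovered targets. The following Python sketch is only an illustration under assumptions: keep_targets is a hypothetical helper, the second target (kube-dns) is made-up sample data, and Python's re module approximates Prometheus's anchored RE2 matching.

```python
import re

def keep_targets(targets, source_labels, regex, separator=";"):
    """Return only the targets whose joined source label values match regex."""
    kept = []
    for labels in targets:
        # Values are joined in the order given by source_labels.
        value = separator.join(labels.get(name, "") for name in source_labels)
        if re.fullmatch(regex, value):
            kept.append(labels)
    return kept

SOURCE_LABELS = [
    "__meta_kubernetes_namespace",
    "__meta_kubernetes_service_name",
    "__meta_kubernetes_endpoint_port_name",
]

# Sample discovered targets (illustrative values only).
discovered = [
    {"__meta_kubernetes_namespace": "default",
     "__meta_kubernetes_service_name": "kubernetes",
     "__meta_kubernetes_endpoint_port_name": "https"},
    {"__meta_kubernetes_namespace": "kube-system",
     "__meta_kubernetes_service_name": "kube-dns",
     "__meta_kubernetes_endpoint_port_name": "metrics"},
]

kept = keep_targets(discovered, SOURCE_LABELS, "default;kubernetes;https")
print([t["__meta_kubernetes_service_name"] for t in kept])
```

Because the three values are joined with ";" before matching, the order of the entries in source_labels has to line up with the order of the parts in regex.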

Note: steps to apply changes made to prometheus-cfg.yaml
kubectl apply -f prometheus-cfg.yaml
kubectl delete -f prometheus-deploy.yaml 
kubectl apply -f prometheus-deploy.yaml

2. Overview of the apiserver metrics that Prometheus monitors

Source: https://github.com/signalfx/integrations/blob/master/signalfx-agent/agent_docs/monitors/kubernetes-apiserver.md

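The apiserver is the entry point of a k8s cluster: every request comes in through it, so monitoring apiserver metrics is a good way to judge the health of the cluster.
Reference: https://www.jianshu.com/p/02917b280ebe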
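1) apiserver_request_count
apiserver_request_count is the number of apiserver requests (it is listed as deprecated below; apiserver_request_total is its replacement)
(1) Per-second request rate over the last 1 minute for each Kubernetes resource:
sum(rate(apiserver_request_count[1m])) by (resource, subresource, verb)
The request verbs are WATCH, LIST, POST, PATCH, PUT, GET, DELETE and CONNECT
(2) Ratio of the 5xx error rate to the total request rate over 1 minute:
rate(apiserver_request_count{code=~"^(?:5..)$"}[1m]) / rate(apiserver_request_count[1m])

2) apiserver_request_total
apiserver_request_total counts requests to the service: where they came from, which service they went to, which operation was performed, and whether it succeeded.
(1) Rate of successful requests (status code 2xx) over 5 minutes:
sum(rate(apiserver_request_total{job="kubernetes-apiserver",code=~"2.."}[5m]))
(2) Rate of failed requests (status code 4xx or 5xx) over 5 minutes:
sum(rate(apiserver_request_total{job="kubernetes-apiserver",code=~"[45].."}[5m]))

3) apiserver_request_duration_seconds
apiserver_request_duration_seconds is the request latency in seconds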



I. apiserver_admission_controller group
All of the metrics below belong to the apiserver_admission_controller group
apiserver_admission_controller_admission_duration_seconds (cumulative)
Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). (sum)
apiserver_admission_controller_admission_duration_seconds_bucket (cumulative)
Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). (bucket)
apiserver_admission_controller_admission_duration_seconds_count (cumulative)
Admission controller latency histogram in seconds, identified by name and broken out for each operation and API resource and type (validate or admit). (count)
apiserver_admission_controller_admission_latencies_milliseconds (cumulative)
(Deprecated) Admission controller latency histogram in milliseconds, identified by name and broken out for each operation and API resource and type (validate or admit). (sum)
apiserver_admission_controller_admission_latencies_milliseconds_bucket (cumulative)
(Deprecated) Admission controller latency histogram in milliseconds, identified by name and broken out for each operation and API resource and type (validate or admit). (bucket)
apiserver_admission_controller_admission_latencies_milliseconds_count (cumulative)
(Deprecated) Admission controller latency histogram in milliseconds, identified by name and broken out for each operation and API resource and type (validate or admit). (count)

II. apiserver_admission_step_admission group
All of the metrics below belong to the apiserver_admission_step_admission group
apiserver_admission_step_admission_duration_seconds (cumulative)
Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). (sum)
apiserver_admission_step_admission_duration_seconds_bucket (cumulative)
Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). (bucket)
apiserver_admission_step_admission_duration_seconds_count (cumulative)
Admission sub-step latency histogram in seconds, broken out for each operation and API resource and step type (validate or admit). (count)
apiserver_admission_step_admission_duration_seconds_summary (cumulative)
Admission sub-step latency summary in seconds, broken out for each operation and API resource and step type (validate or admit). (sum)
apiserver_admission_step_admission_duration_seconds_summary_count (cumulative)
Admission sub-step latency summary in seconds, broken out for each operation and API resource and step type (validate or admit). (count)
apiserver_admission_step_admission_duration_seconds_summary_quantile (gauge)
Admission sub-step latency summary in seconds, broken out for each operation and API resource and step type (validate or admit). (quantized)
apiserver_admission_step_admission_latencies_milliseconds (cumulative)
(Deprecated) Admission sub-step latency histogram in milliseconds, broken out for each operation and API resource and step type (validate or admit). (sum)
apiserver_admission_step_admission_latencies_milliseconds_bucket (cumulative)
(Deprecated) Admission sub-step latency histogram in milliseconds, broken out for each operation and API resource and step type (validate or admit). (bucket)
apiserver_admission_step_admission_latencies_milliseconds_count (cumulative)
(Deprecated) Admission sub-step latency histogram in milliseconds, broken out for each operation and API resource and step type (validate or admit). (count)
apiserver_admission_step_admission_latencies_milliseconds_summary (cumulative)
(Deprecated) Admission sub-step latency summary in milliseconds, broken out for each operation and API resource and step type (validate or admit). (sum)
apiserver_admission_step_admission_latencies_milliseconds_summary_count (cumulative)
(Deprecated) Admission sub-step latency summary in milliseconds, broken out for each operation and API resource and step type (validate or admit). (count)
apiserver_admission_step_admission_latencies_milliseconds_summary_quantile (gauge)
(Deprecated) Admission sub-step latency summary in milliseconds, broken out for each operation and API resource and step type (validate or admit). (quantized)
III. apiserver_audit group
All of the metrics below belong to the apiserver_audit group
apiserver_audit_event_total (cumulative)
Counter of audit events generated and sent to the audit backend.
apiserver_audit_requests_rejected_total (cumulative)
Counter of apiserver requests rejected due to an error in audit logging backend.
IV. apiserver_client group
All of the metrics below belong to the apiserver_client group
apiserver_client_certificate_expiration_seconds (cumulative)
Distribution of the remaining lifetime on the certificate used to authenticate a request. (sum)
apiserver_client_certificate_expiration_seconds_bucket (cumulative)
Distribution of the remaining lifetime on the certificate used to authenticate a request. (bucket)
apiserver_client_certificate_expiration_seconds_count (cumulative)
Distribution of the remaining lifetime on the certificate used to authenticate a request. (count)
V. apiserver_request group
All of the metrics below belong to the apiserver_request group
apiserver_request_count (cumulative)
(Deprecated) Counter of apiserver requests broken out for each verb, group, version, resource, scope, component, client, and HTTP response contentType and code.
apiserver_request_duration_seconds (cumulative)
Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. (sum)
apiserver_request_duration_seconds_bucket (cumulative)
Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. (bucket)
apiserver_request_duration_seconds_count (cumulative)
Response latency distribution in seconds for each verb, dry run value, group, version, resource, subresource, scope and component. (count)
apiserver_request_latencies (cumulative)
(Deprecated) Response latency distribution in microseconds for each verb, group, version, resource, subresource, scope and component. (sum)
apiserver_request_latencies_bucket (cumulative)
(Deprecated) Response latency distribution in microseconds for each verb, group, version, resource, subresource, scope and component. (bucket)
apiserver_request_latencies_count (cumulative)
(Deprecated) Response latency distribution in microseconds for each verb, group, version, resource, subresource, scope and component. (count)
apiserver_request_latencies_summary (cumulative)
(Deprecated) Response latency summary in microseconds for each verb, group, version, resource, subresource, scope and component. (sum)
apiserver_request_latencies_summary_count (cumulative)
(Deprecated) Response latency summary in microseconds for each verb, group, version, resource, subresource, scope and component. (count)
apiserver_request_latencies_summary_quantile (gauge)
(Deprecated) Response latency summary in microseconds for each verb, group, version, resource, subresource, scope and component. (quantized)
apiserver_request_total (cumulative)
Counter of apiserver requests broken out for each verb, dry run value, group, version, resource, scope, component, client, and HTTP response contentType and code.
VI. apiserver_response group
All of the metrics below belong to the apiserver_response group
apiserver_response_sizes (cumulative)
Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. (sum)
apiserver_response_sizes_bucket (cumulative)
Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. (bucket)
apiserver_response_sizes_count (cumulative)
Response size distribution in bytes for each group, version, verb, resource, subresource, scope and component. (count)
VII. apiserver_storage group
All of the metrics below belong to the apiserver_storage group
apiserver_storage_data_key_generation_duration_seconds (cumulative)
Latencies in seconds of data encryption key(DEK) generation operations. (sum)
apiserver_storage_data_key_generation_duration_seconds_bucket (cumulative)
Latencies in seconds of data encryption key(DEK) generation operations. (bucket)
apiserver_storage_data_key_generation_duration_seconds_count (cumulative)
Latencies in seconds of data encryption key(DEK) generation operations. (count)
apiserver_storage_data_key_generation_failures_total (cumulative)
Total number of failed data encryption key(DEK) generation operations.
apiserver_storage_data_key_generation_latencies_microseconds (cumulative)
(Deprecated) Latencies in microseconds of data encryption key(DEK) generation operations. (sum)
apiserver_storage_data_key_generation_latencies_microseconds_bucket (cumulative)
(Deprecated) Latencies in microseconds of data encryption key(DEK) generation operations. (bucket)
apiserver_storage_data_key_generation_latencies_microseconds_count (cumulative)
(Deprecated) Latencies in microseconds of data encryption key(DEK) generation operations. (count)
apiserver_storage_envelope_transformation_cache_misses_total (cumulative)
Total number of cache misses while accessing key decryption key(KEK).

3. Configuring endpoints service discovery in Prometheus: monitoring Services

   -  job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name


The apiserver is in effect a special Service; the job above discovers ordinary Services.
Services are filtered here: only those annotated with prometheus.io/scrape: "true" are kept.
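
The job above also contains a labelmap rule, which copies Kubernetes Service labels onto the target. As a rough Python sketch of that behavior (apply_labelmap is a hypothetical helper, the sample labels are invented, and Python's re module approximates Prometheus's anchored RE2 matching):

```python
import re

def apply_labelmap(labels, regex, replacement="$1"):
    """Copy every label whose name matches regex to a name built from replacement."""
    result = dict(labels)
    # Convert Prometheus-style $1 references to Python's \g<1> form.
    template = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
    for name, value in labels.items():
        match = re.fullmatch(regex, name)
        if match:
            # The original label is kept; only a copy under the new name is added.
            result[match.expand(template)] = value
    return result

# Sample target labels (illustrative values only).
target = {
    "__address__": "10.244.1.7:6379",
    "__meta_kubernetes_service_label_app": "redis",
}
mapped = apply_labelmap(target, r"__meta_kubernetes_service_label_(.+)")
print(mapped["app"])  # redis
```

Labels that start with a double underscore are discarded once relabeling has finished, so only the copied label (app in this sample) ends up on the scraped series.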

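# 1. Parameter explanation
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
  action: keep
  regex: true

This rule keeps only the Services whose __meta_kubernetes_service_annotation_prometheus_io_scrape label is true. In other words, the Service must carry the annotation prometheus.io/scrape: "true"; only Services with this annotation are discovered automatically by Prometheus.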

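# 2. Parameter explanation
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
  action: replace
  target_label: __address__
  regex: ([^:]+)(?::\d+)?;(\d+)
  replacement: $1:$2
# Sets the port to scrape from the prometheus.io/port annotation; some Services expose several ports (redis, for example). Without the annotation, the port of the discovered Kubernetes Service endpoint is used.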
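
The address rewrite is easier to follow with concrete values. The Python sketch below applies the same regular expression to a few sample inputs; rewrite_address is a hypothetical helper, the IP addresses and ports are made up, and Python's re module stands in for Prometheus's RE2 engine.

```python
import re

# Same pattern as in the rule: host, optional ":port", then ";" and the annotated port.
ADDRESS_REGEX = r"([^:]+)(?::\d+)?;(\d+)"

def rewrite_address(address, port_annotation):
    """Return host:annotated_port, or the address unchanged if the rule does not match."""
    # __address__ and the prometheus.io/port annotation value are joined with ";".
    joined = f"{address};{port_annotation}"
    match = re.fullmatch(ADDRESS_REGEX, joined)
    if match is None:
        return address
    return f"{match.group(1)}:{match.group(2)}"

with_port = rewrite_address("10.244.1.7:6379", "9121")   # discovered port is replaced
without_port = rewrite_address("10.244.1.7", "9121")     # port is appended
no_annotation = rewrite_address("10.244.1.7:6379", "")   # rule does not match
print(with_port, without_port, no_annotation)
```

When the annotation is missing, the joined value ends in ";" with no digits after it, the pattern does not match, and the originally discovered address stays in place.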



# 3. Parameter explanation
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
  action: replace
  target_label: __scheme__
  regex: (https?)
# If the scheme is https, a certificate and token must also be configured
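
Since relabel regexes are anchored at both ends, the exact pattern used for the scheme matters. A small Python check, offered as an approximation (scheme_matches is a hypothetical helper and Python's re module stands in for RE2), shows that (https?) accepts both http and https, while a pattern such as (http?) would not accept https:

```python
import re

def scheme_matches(regex, value):
    """True if the whole value matches regex, as with Prometheus's anchored matching."""
    return re.fullmatch(regex, value) is not None

# (https?) makes the trailing "s" optional.
accepts_http = scheme_matches("(https?)", "http")
accepts_https = scheme_matches("(https?)", "https")
rejects_other = not scheme_matches("(https?)", "ftp")

# (http?) makes the "p" optional instead, so "https" is not matched.
typo_accepts_https = scheme_matches("(http?)", "https")

print(accepts_http, accepts_https, rejects_other, typo_accepts_https)
```

With a pattern that does not match, the replace rule is skipped and __scheme__ keeps the value from the job's scheme setting.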

 

 

 
