Prometheus: monitoring resources via Kubernetes service discovery and kube-state-metrics
What is kube-state-metrics?
kube-state-metrics listens to the API Server and generates state metrics for resource objects such as Deployments, Nodes and Pods. Note that kube-state-metrics only exposes these metrics; it does not store them, so we use Prometheus to scrape and store the data. It focuses on business-level metadata about Deployments, Pods, replica status and so on: how many replicas were scheduled and how many are currently available, how many Pods are running/stopped/terminated, how many times a Pod has restarted, how many Jobs are running, and similar questions.
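Once Prometheus has scraped these metrics, those questions map directly to PromQL queries. An illustrative sketch (metric names as exposed by kube-state-metrics; a Deployment named nginx is used purely as an example):

# desired vs. currently available replicas of the nginx Deployment
kube_deployment_spec_replicas{deployment="nginx"}
kube_deployment_status_replicas_available{deployment="nginx"}
# number of Pods in each phase (Running/Pending/Succeeded/Failed/Unknown)
sum(kube_pod_status_phase) by (phase)
# container restart counts
kube_pod_container_status_restarts_total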
Kubernetes contains many kinds of resources, such as Pods and Services. To monitor them, deploy kube-state-metrics:
[root@k8s-master ~]# kubectl get pod,svc
NAME READY STATUS RESTARTS AGE
pod/nginx-6799fc88d8-drb2s 1/1 Running 3 74d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 74d
service/nginx NodePort 10.99.50.2 <none> 80:31332/TCP 74d
Installing the kube-state-metrics component
1. Deploy kube-state-metrics
2. Import a Grafana dashboard
kube-state-metrics collects the state of the resources in the cluster and exposes it in the Prometheus data model; you can query the Pod's IP directly to see the metrics.
kube-state-metrics-rbac.yaml
[root@master prometheus]# cat kube-state-metrics-rbac.yaml
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
This manifest creates a ServiceAccount and a ClusterRole, and the ClusterRoleBinding binds the ServiceAccount to the ClusterRole.
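Applying the RBAC manifest (assuming the file name used above) creates the objects listed below:

kubectl apply -f kube-state-metrics-rbac.yaml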
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
[root@master prometheus]# cat kube-state-metrics-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.9.0
        ports:
        - containerPort: 8080
The Service carries the annotation prometheus.io/scrape: 'true', so it will be picked up by service discovery:
[root@master prometheus]# cat kube-state-metrics-svc.yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
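The Deployment and Service are applied the same way (again assuming the file names shown above):

kubectl apply -f kube-state-metrics-deploy.yaml
kubectl apply -f kube-state-metrics-svc.yaml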
[root@k8s-master ~]# kubectl get pod -n kube-system -o wide
kube-state-metrics-5d8b77488c-l5sbr 2/2 Running 0 2m19s 10.244.169.139 k8s-node2 <none> <none>
[root@master prometheus]# kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 3d21h
kube-state-metrics ClusterIP 10.233.59.237 <none> 8080/TCP 8s
[root@master prometheus]# kubectl get svc coredns -n kube-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"coredns"},"name":"coredns","namespace":"kube-system"},"spec":{"clusterIP":"10.233.0.3","ports":[{"name":"dns","port":53,"protocol":"UDP"},{"name":"dns-tcp","port":53,"protocol":"TCP"},{"name":"metrics","port":9153,"protocol":"TCP"}],"selector":{"k8s-app":"kube-dns"}}}
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"
This Pod collects the state of the resources in the cluster and exposes it in the Prometheus data model:
[root@k8s-master ~]# curl 10.244.169.139:8080/metrics | head -n 20
# HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_labels gauge
# HELP kube_certificatesigningrequest_created Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
# HELP kube_certificatesigningrequest_condition The number of each certificatesigningrequest condition
# TYPE kube_certificatesigningrequest_condition gauge
# HELP kube_certificatesigningrequest_cert_length Length of the issued cert
# TYPE kube_certificatesigningrequest_cert_length gauge
# HELP kube_configmap_info Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.19"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-state-metrics-config"} 1
Once Prometheus scrapes this data, these resources can be monitored normally.
Now configure Prometheus.
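The configuration below references the bearer token file /usr/local/prometheus/token.k8s. This article does not show how it was generated; a minimal sketch, assuming a ServiceAccount named prometheus already exists in kube-system and is bound to a ClusterRole with cluster-wide read (and nodes/proxy) permissions (both the name and its RBAC are assumptions), is to dump that ServiceAccount's token secret, which works on pre-1.24 clusters such as the 1.19 cluster used here:

# extract the ServiceAccount token and save it where prometheus.yml expects it
kubectl -n kube-system get secret \
  $(kubectl -n kube-system get sa prometheus -o jsonpath='{.secrets[0].name}') \
  -o jsonpath='{.data.token}' | base64 -d > /usr/local/prometheus/token.k8s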
# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
    static_configs:
    - targets: ['localhost:9090']
  - job_name: kubernetes-nodes-cadvisor
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.179.102:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    # Turn each node label (.*) into a new label, keeping its value
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    # Rewrite NodeIP:10250 to APIServerIP:6443
    - action: replace
      regex: (.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: 192.168.179.102:6443
    # The real metrics endpoint is https://NodeIP:10250/metrics/cadvisor, which only the API server can reach,
    # so relabel the path to scrape it through the API server proxy
    - action: replace
      source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      regex: (.*)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
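To verify that the API server proxy path works, a node's cAdvisor metrics can be fetched manually with the same token (a sketch; k8s-node2 is the node name from the pod listing above):

curl -sk -H "Authorization: Bearer $(cat /usr/local/prometheus/token.k8s)" \
  https://192.168.179.102:6443/api/v1/nodes/k8s-node2/proxy/metrics/cadvisor | head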
The job above scrapes cAdvisor; the jobs below monitor the state of Kubernetes resources.
  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
    - role: endpoints
      api_server: https://192.168.179.102:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    # Drop Services that do not carry the prometheus.io/scrape annotation
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scrape
    # Override the scrape scheme from the annotation
    - action: replace
      regex: (https?)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scheme
      target_label: __scheme__
    # Override the metrics URL path from the annotation
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_path
      target_label: __metrics_path__
    # Rewrite the target address port from the annotation
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
      target_label: __address__
    # Turn each Kubernetes Service label (.+) into a new label, keeping its value
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    # Add a namespace label
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    # Add a Service name label
    - action: replace
      source_labels:
      - __meta_kubernetes_service_name
      target_label: kubernetes_service_name
  - job_name: kubernetes-pods
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.179.102:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    # Drop Pods that do not carry the prometheus.io/scrape annotation
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_scrape
    # Override the metrics URL path from the annotation
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_path
      target_label: __metrics_path__
    # Rewrite the target address port from the annotation
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_pod_annotation_prometheus_io_port
      target_label: __address__
    # Turn each Kubernetes Pod label (.+) into a new label, keeping its value
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    # Add a namespace label
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    # Add a Pod name label
    - action: replace
      source_labels:
      - __meta_kubernetes_pod_name
      target_label: kubernetes_pod_name
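For the kubernetes-pods job, only Pods whose template carries the scrape annotation are kept. An illustrative Pod template snippet (not from this article, just showing the annotation keys the job matches):

  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
        prometheus.io/path: "/metrics"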
The service discovery role is endpoints: Pods are pulled into monitoring through the Endpoints objects behind each Service.
kubernetes_sd_configs:
- role: endpoints
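As a worked example of the address rewrite in the kubernetes-service-endpoints job, take the coredns Service shown earlier: its metrics port annotation is 9153, so an endpoint address on port 53 is rewritten to the metrics port (the endpoint address below is the coredns Pod IP used later):

__address__ = 10.244.36.73:53
__meta_kubernetes_service_annotation_prometheus_io_port = 9153
# after "replacement: $1:$2" the scrape target becomes
__address__ = 10.244.36.73:9153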
All of these target addresses are reachable from the Kubernetes nodes:
[root@k8s-master ~]# curl http://10.244.36.73:9153/metrics
# HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built.
# TYPE coredns_build_info gauge
coredns_build_info{goversion="go1.14.4",revision="f59c03d",version="1.7.0"} 1
# HELP coredns_cache_entries The number of elements in the cache.
# TYPE coredns_cache_entries gauge
coredns_cache_entries{server="dns://:53",type="denial"} 1
coredns_cache_entries{server="dns://:53",type="success"} 0
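After reloading Prometheus, the discovered targets can be checked in the web UI under Status -> Targets, or via the HTTP API (assuming Prometheus listens on localhost:9090):

curl -s http://localhost:9090/api/v1/targets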
The Prometheus host must be able to reach the Pod network, because these target addresses are Pod IPs. In my setup Prometheus runs outside the cluster and cannot reach the Pod network, so the targets show as down. If Prometheus is deployed inside the Kubernetes cluster this problem does not exist, since the networks are interconnected; otherwise the appropriate routing rules have to be configured.
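A minimal sketch of such a routing rule on the Prometheus host, assuming the Pod CIDR is 10.244.0.0/16 and a cluster node reachable at 192.168.179.102 can forward traffic into the Pod network (both values are assumptions based on the addresses above, and this only works with a CNI setup that routes Pod traffic through the node):

# on the Prometheus host: send Pod-network traffic via a cluster node
ip route add 10.244.0.0/16 via 192.168.179.102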
That is the overall collection approach; finally, use Grafana to visualize the data.