kube-state-metrics是什么?


kube-state-metrics通过监听API Server生成有关资源对象的状态指标,比如Deployment、Node、Pod,需要注意的是kube-state-metrics只是简单的提供一个metrics数据,并不会存储这些指标数据,所以我们可以使用Prometheus来抓取这些数据然后存储,主要关注的是业务相关的一些元数据,比如Deployment、Pod、副本状态等;调度了多少个replicas?现在可用的有几个?多少个Pod是running/stopped/terminated状态?Pod重启了多少次?我有多少job在运行中。

在k8s当中有很多的资源,比如Pod,Service,要监控这些要部署kube-state-metrics

[root@k8s-master ~]# kubectl get pod,svc
NAME                         READY   STATUS    RESTARTS   AGE
pod/nginx-6799fc88d8-drb2s   1/1     Running   3          74d

NAME                 TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)        AGE
service/kubernetes   ClusterIP   10.96.0.1    <none>        443/TCP        74d
service/nginx        NodePort    10.99.50.2   <none>        80:31332/TCP   74d

 

 

 安装kube-state-metrics组件


1、部署kube-state-metrics

2、Grafana导入仪表盘

这个会采集k8s里面资源的状态,并且以普罗米修斯数据模型暴露出来,可以直接访问这个pod的IP

kube-state-metrics-rbac.yaml  

[root@master prometheus]# cat kube-state-metrics-rbac.yaml 
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system

这里会创建一个serviceaccount,还有一个clusterrole被创建了,这个sa会通过clusterrolebinding绑定clusterrole。 

serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
[root@master prometheus]# cat kube-state-metrics-deploy.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: quay.io/coreos/kube-state-metrics:v1.9.0
        ports:
        - containerPort: 8080

在service当中有这个参数prometheus.io/scrape: 'true'那么会被服务发现到 

[root@master prometheus]# cat kube-state-metrics-svc.yaml 
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  ports:
  - name: kube-state-metrics
    port: 8080
    protocol: TCP
  selector:
    app: kube-state-metrics
[root@k8s-master ~]# kubectl get pod -n kube-system -o wide
kube-state-metrics-5d8b77488c-l5sbr        2/2     Running   0          2m19s   10.244.169.139    k8s-node2    <none>           <none>
[root@master prometheus]# kubectl get svc -n kube-system
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                  AGE
coredns              ClusterIP   10.233.0.3      <none>        53/UDP,53/TCP,9153/TCP   3d21h
kube-state-metrics   ClusterIP   10.233.59.237   <none>        8080/TCP                 8s
[root@master prometheus]# kubectl get svc coredns -n kube-system -o yaml
apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/port":"9153","prometheus.io/scrape":"true"},"labels":{"addonmanager.kubernetes.io/mode":"Reconcile","k8s-app":"kube-dns","kubernetes.io/cluster-service":"true","kubernetes.io/name":"coredns"},"name":"coredns","namespace":"kube-system"},"spec":{"clusterIP":"10.233.0.3","ports":[{"name":"dns","port":53,"protocol":"UDP"},{"name":"dns-tcp","port":53,"protocol":"TCP"},{"name":"metrics","port":9153,"protocol":"TCP"}],"selector":{"k8s-app":"kube-dns"}}}
    prometheus.io/port: "9153"
    prometheus.io/scrape: "true"

 这个pod会采集k8s当中资源信息,并且以普罗米修斯数据模型暴露出来

[root@k8s-master ~]# curl 10.244.169.139:8080/metrics  | head -n 20
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0# HELP kube_certificatesigningrequest_labels Kubernetes labels converted to Prometheus labels.
# TYPE kube_certificatesigningrequest_labels gauge
# HELP kube_certificatesigningrequest_created Unix creation timestamp
# TYPE kube_certificatesigningrequest_created gauge
# HELP kube_certificatesigningrequest_condition The number of each certificatesigningrequest condition
# TYPE kube_certificatesigningrequest_condition gauge
# HELP kube_certificatesigningrequest_cert_length Length of the issued cert
# TYPE kube_certificatesigningrequest_cert_length gauge
# HELP kube_configmap_info Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.19"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-state-metrics-config"} 1

可以看到数据被普罗米修斯给采集到就可以被正常的监控到

现在配置普罗米修斯

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  # - "first_rules.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: kubernetes-nodes-cadvisor
    metrics_path: /metrics
    scheme: https
    kubernetes_sd_configs:
    - role: node
      api_server: https://192.168.179.102:6443
      bearer_token_file: /usr/local/prometheus/token.k8s 
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s 
    tls_config:
      insecure_skip_verify: true
    relabel_configs:
    # 将标签(.*)作为新标签名,原有值不变
    - action: labelmap
      regex: __meta_kubernetes_node_label_(.*)
    # 修改NodeIP:10250为APIServerIP:6443
    - action: replace
      regex: (.*)
      source_labels: ["__address__"]
      target_label: __address__
      replacement: 192.168.31.61:6443
    # 实际访问指标接口 https://NodeIP:10250/metrics/cadvisor 这个接口只能APISERVER访问,故此重新标记标签使用APISERVER代理访问
    - action: replace
      source_labels: [__meta_kubernetes_node_name]
      target_label: __metrics_path__
      regex: (.*)
      replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor 



3上面是配置cadvisor的,下面这里是配置监控k8s资源状态的


  - job_name: kubernetes-service-endpoints
    kubernetes_sd_configs:
    - role: endpoints
      api_server: https://192.168.179.102:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    # Service没配置注解prometheus.io/scrape的不采集
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scrape
    # 重命名采集目标协议
    - action: replace
      regex: (https?)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_scheme
      target_label: __scheme__
    # 重命名采集目标指标URL路径
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_service_annotation_prometheus_io_path
      target_label: __metrics_path__
    # 重命名采集目标地址
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_service_annotation_prometheus_io_port
      target_label: __address__
    # 将K8s标签(.*)作为新标签名,原有值不变
    - action: labelmap
      regex: __meta_kubernetes_service_label_(.+)
    # 生成命名空间标签
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    # 生成Service名称标签
    - action: replace
      source_labels:
      - __meta_kubernetes_service_name
      target_label: kubernetes_service_name

  - job_name: kubernetes-pods
    kubernetes_sd_configs:
    - role: pod
      api_server: https://192.168.179.102:6443
      bearer_token_file: /usr/local/prometheus/token.k8s
      tls_config:
        insecure_skip_verify: true
    bearer_token_file: /usr/local/prometheus/token.k8s
    tls_config:
      insecure_skip_verify: true
    # 重命名采集目标协议
    relabel_configs:
    - action: keep
      regex: true
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_scrape
    # 重命名采集目标指标URL路径
    - action: replace
      regex: (.+)
      source_labels:
      - __meta_kubernetes_pod_annotation_prometheus_io_path
      target_label: __metrics_path__
    # 重命名采集目标地址
    - action: replace
      regex: ([^:]+)(?::\d+)?;(\d+)
      replacement: $1:$2
      source_labels:
      - __address__
      - __meta_kubernetes_pod_annotation_prometheus_io_port
      target_label: __address__
    # 将K8s标签(.*)作为新标签名,原有值不变
    - action: labelmap
      regex: __meta_kubernetes_pod_label_(.+)
    # 生成命名空间标签
    - action: replace
      source_labels:
      - __meta_kubernetes_namespace
      target_label: kubernetes_namespace
    # 生成Service名称标签
    - action: replace
      source_labels:
      - __meta_kubernetes_pod_name
      target_label: kubernetes_pod_name

服务发现类型为endpoints,从endpoints获取pod纳入监控

    kubernetes_sd_configs:
    - role: endpoints

在k8上这几个地址都可以在k8s节点上访问 

​[root@k8s-master ~]# curl http://10.244.36.73:9153/metrics
# HELP coredns_build_info A metric with a constant '1' value labeled by version, revision, and goversion from which CoreDNS was built.
# TYPE coredns_build_info gauge
coredns_build_info{goversion="go1.14.4",revision="f59c03d",version="1.7.0"} 1
# HELP coredns_cache_entries The number of elements in the cache.
# TYPE coredns_cache_entries gauge
coredns_cache_entries{server="dns://:53",type="denial"} 1
coredns_cache_entries{server="dns://:53",type="success"} 0

​

 这里普罗米修斯的网络要和K8s的节点在同一个网络,因为这个ip是pod的ip,我这里由于普罗米修斯和k8s pod不在一个网络,所以显示down,如果普罗米修斯部署在k8s就不存在这个问题

如果在K8s集群当中部署普罗米修斯就不存在这个问题,因为网络互通,这个需要配置相关的路由规则实现的

采集思路就是上面这些,最后用grafana去展示你的数据即可 

Logo

K8S/Kubernetes社区为您提供最前沿的新闻资讯和知识内容

更多推荐