For the complete table of contents of the Kubernetes实录 (Kubernetes Field Notes) series, see: Kubernetes实录-目录

The previous post covered the monitoring requirements in my environment and the solution architecture. This post documents, under that design, the deployment and configuration of the various metrics endpoints inside the Kubernetes cluster from which the Prometheus subsystem collects monitoring data.

1. Collecting metrics from cAdvisor

cAdvisor is the agent in the Kubernetes ecosystem for collecting container monitoring data. It is already integrated into the kubelet, so it does not need to be deployed separately.

Before Kubernetes 1.7.3, cAdvisor metrics were part of the kubelet's own metrics and could be fetched from port 4194 on each node.
Since Kubernetes 1.7.3, cAdvisor metrics have been split out of the kubelet metrics, so Prometheus scrapes them as two separate jobs. Many articles online still claim that each node exposes port 4194 for cAdvisor metrics, but recent kubelet versions no longer open that port; the metrics can only be reached through the proxy API provided by the API server.

  • Metric types collected by cAdvisor
    cAdvisor reports resource usage for every container running on the current node. Metric names share the container_* prefix:
    container_cpu_*
    container_fs_*
    container_memory_*
    container_network_*
    container_spec_*
    container_last_seen
    container_scrape_error
    container_start_time_seconds
    container_tasks_state
    
  • API
    Item: cAdvisor metrics
    API path: /api/v1/nodes/{node}/proxy/metrics/cadvisor
    Prometheus config: replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
  • Testing the cAdvisor metrics endpoint
    kubectl get --raw "/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor"
    
    or alternatively:
    
    kubectl proxy --port=6080
    curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor
    		
    	# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
    	# TYPE cadvisor_version_info gauge
    	cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="18.06.1-ce",kernelVersion="3.10.0-862.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
    	# HELP container_cpu_cfs_periods_total Number of elapsed enforcement period intervals.
    	# TYPE container_cpu_cfs_periods_total counter
    	container_cpu_cfs_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849686
    	container_cpu_cfs_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849710
    	# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
    	# TYPE container_cpu_cfs_throttled_periods_total counter
    	container_cpu_cfs_throttled_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10576
    	container_cpu_cfs_throttled_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10266
    	# HELP container_cpu_cfs_throttled_seconds_total Total time duration the container has been throttled.
    	# TYPE container_cpu_cfs_throttled_seconds_total counter
    	container_cpu_cfs_throttled_seconds_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 16523.995575912
    	container_cpu_cfs_throttled_seconds_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10673.627579073
    
    	... ... ... ... # (many more lines omitted)
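Based on the replacement rule listed in the API table above, the Prometheus scrape job for cAdvisor might look like the sketch below. This assumes Prometheus runs inside the cluster with a service account allowed to access the nodes/proxy resource; the job name is illustrative:

```yaml
# Scrape cAdvisor on every node through the API server proxy.
- job_name: 'kubernetes-cadvisor'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  # Keep the node labels on the scraped series.
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  # Send every request to the API server instead of the node itself.
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  # Rewrite the metrics path to the per-node cAdvisor proxy endpoint.
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
```

Counters such as container_cpu_usage_seconds_total are normally consumed through rate(), e.g. sum(rate(container_cpu_usage_seconds_total[5m])) by (pod_name).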
    

2. Collecting metrics from the kubelet

  • Metric types collected by the kubelet
    To be added.
  • API
    Item: kubelet metrics
    API path: /api/v1/nodes/{node}/proxy/metrics
    Prometheus config: replacement: /api/v1/nodes/${1}/proxy/metrics
  • Testing the kubelet metrics endpoint
    kubectl proxy --port=6080
    curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics
    	# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
    	# TYPE apiserver_audit_event_total counter
    	apiserver_audit_event_total 0
    	# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
    	# TYPE apiserver_audit_requests_rejected_total counter
    	apiserver_audit_requests_rejected_total 0
    	... ... ... ... # (many more lines omitted)
    	# HELP apiserver_storage_data_key_generation_latencies_microseconds Latencies in microseconds of data encryption key(DEK) generation operations.
    	# TYPE apiserver_storage_data_key_generation_latencies_microseconds histogram
    	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="5"} 0
    	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="10"} 0
    	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="20"} 0
    	apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="40"} 0
    
    	... ... ... ... # (many more lines omitted)
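The kubelet job follows the same pattern as the cAdvisor one, differing only in the metrics path. A sketch, under the same in-cluster service-account assumptions:

```yaml
# Scrape the kubelet's own metrics through the API server proxy.
- job_name: 'kubernetes-kubelet'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__
    replacement: /api/v1/nodes/${1}/proxy/metrics
```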
    

3. node_exporter

The NodeExporter project from Prometheus exposes key host-level metrics. Deployed as a Kubernetes DaemonSet, one NodeExporter instance runs on every node, providing monitoring of host performance data.

  • Manifest file: prometheus-node-exporter-daemonset.yaml
    cat prometheus-node-exporter-daemonset.yaml 
    
    apiVersion: extensions/v1beta1
    kind: DaemonSet
    metadata:
      name: prometheus-node-exporter
      namespace: kube-system
      labels:
        app: prometheus-node-exporter
    spec:
      template:
        metadata:
          name: prometheus-node-exporter
          labels:
            app: prometheus-node-exporter
        spec:
          containers:
          - image: prom/node-exporter:v0.17.0
            imagePullPolicy: IfNotPresent
            name: prometheus-node-exporter
            ports:
            - name: prom-node-exp
              #^ must be an IANA_SVC_NAME (at most 15 characters, ..)
              containerPort: 9100
              hostPort: 9100
          tolerations:
          - key: "node-role.kubernetes.io/master"
            effect: "NoSchedule"
          hostNetwork: true
          hostPID: true
          hostIPC: true
          restartPolicy: Always
    ---
    apiVersion: v1
    kind: Service
    metadata:
      annotations:
        prometheus.io/scrape: 'true'
        prometheus.io/app-metrics: 'true'
        prometheus.io/app-metrics-path: '/metrics'
      name: prometheus-node-exporter
      namespace: kube-system
      labels:
        app: prometheus-node-exporter
    spec:
      clusterIP: None
      ports:
        - name: prometheus-node-exporter
          port: 9100
          protocol: TCP
      selector:
        app: prometheus-node-exporter
      type: ClusterIP
    
  • Deployment commands
    kubectl apply -f prometheus-node-exporter-daemonset.yaml 
        daemonset.extensions/prometheus-node-exporter created
        service/prometheus-node-exporter created
    
    kubectl get -f prometheus-node-exporter-daemonset.yaml 
    	NAME                                            DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
    	daemonset.extensions/prometheus-node-exporter   6         6         6       6            6           <none>          5m
    	
    	NAME                               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
    	service/prometheus-node-exporter   ClusterIP   None         <none>        9100/TCP   5m
    
  • Verification
    On any node:
    netstat -pltn |grep 9100
    	tcp6       0      0 :::9100                 :::*                    LISTEN      104168/node_exporte 
    
    curl {nodeIP}:9100/metrics
    	# HELP go_gc_duration_seconds A summary of the GC invocation durations.
    	# TYPE go_gc_duration_seconds summary
    	go_gc_duration_seconds{quantile="0"} 0.000117217
    	go_gc_duration_seconds{quantile="0.25"} 0.000159431
    	go_gc_duration_seconds{quantile="0.5"} 0.000200323
    	............. # (many more lines omitted)
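The prometheus.io/scrape: 'true' annotation on the Service above is what Prometheus service discovery keys on. A minimal endpoints-role scrape job honoring that annotation could look like this (job and target label names are illustrative):

```yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  # Only scrape endpoints whose Service carries prometheus.io/scrape: 'true'.
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  # Carry the namespace and service name over as labels.
  - source_labels: [__meta_kubernetes_namespace]
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    target_label: kubernetes_name
```

Because the Service is headless (clusterIP: None), the endpoints role resolves directly to each node-exporter pod on port 9100.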
    

4. kube-state-metrics

kube-state-metrics collects metrics about the resource objects inside a Kubernetes cluster, covering daemonset, deployment, job, namespace, node, pvc, pod_container, pod, replicaset, service and statefulset objects.

  • Deploying kube-state-metrics
    1. Download the manifests
      The deployment manifests for kube-state-metrics are on GitHub: kube-state-metrics; the latest version at the time of writing is 1.5.0.
      mkdir kube-state-metrics
      cd kube-state-metrics
      wget https://github.com/kubernetes/kube-state-metrics/archive/v1.5.0.zip
      unzip v1.5.0.zip
      cd kube-state-metrics-1.5.0/kubernetes/
      tree
      ├── kube-state-metrics-cluster-role-binding.yaml
      ├── kube-state-metrics-cluster-role.yaml
      ├── kube-state-metrics-deployment.yaml
      ├── kube-state-metrics-role-binding.yaml
      ├── kube-state-metrics-role.yaml
      ├── kube-state-metrics-service-account.yaml
      └── kube-state-metrics-service.yaml
      
    2. Adjust the manifests
      kube-state-metrics is deployed into the kube-system namespace by default; edit the manifests if you need a different namespace.
      The two docker images referenced in kube-state-metrics-deployment.yaml are hosted on registries that are hard to reach from mainland China; replace them with accessible mirrors:
      image: quay.io/coreos/kube-state-metrics:v1.5.0
      becomes:
      mirrorgooglecontainers/kube-state-metrics:v1.5.0
      
      k8s.gcr.io/addon-resizer:1.8.3
      becomes (latest available mirror version):
      mirrorgooglecontainers/addon-resizer:1.8.4
      
    3. Deploy
      kubectl apply -f ./
      	clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
      	clusterrole.rbac.authorization.k8s.io/kube-state-metrics unchanged
      	deployment.apps/kube-state-metrics created
      	rolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
      	role.rbac.authorization.k8s.io/kube-state-metrics-resizer unchanged
      	serviceaccount/kube-state-metrics unchanged
      	service/kube-state-metrics unchanged
      kubectl get -f ./
      	NAME                                                              AGE
      	clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s
      	
      	NAME                                                       AGE
      	clusterrole.rbac.authorization.k8s.io/kube-state-metrics   70s
      	
      	NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
      	deployment.apps/kube-state-metrics   1/1     1            1           50s
      	
      	NAME                                                       AGE
      	rolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s
      	
      	NAME                                                        AGE
      	role.rbac.authorization.k8s.io/kube-state-metrics-resizer   70s
      	
      	NAME                                SECRETS   AGE
      	serviceaccount/kube-state-metrics   1         70s
      	
      	NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
      	service/kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   70s
      
  • Testing the kube-state-metrics endpoint
    kubectl get svc kube-state-metrics -n kube-system 
    	NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
    	kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   5m33s
    
    curl 10.106.107.13:8080/metrics
    	# HELP kube_configmap_info Information about configmap.
    	# TYPE kube_configmap_info gauge
    	kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
    	kube_configmap_info{namespace="kube-system",configmap="prometheus-config"} 1
    	kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
    	kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
    	kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
    	kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
    	kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
    	kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
    	kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.13"} 1
    	
    	... ... # (many more lines omitted)
    
    curl -I 10.106.107.13:8081/healthz
    	HTTP/1.1 200 OK
    	Date: Wed, 20 Feb 2019 02:20:10 GMT
    	Content-Length: 264
    	Content-Type: text/html; charset=utf-8
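The image substitutions from step 2 above can be scripted with sed. The sketch below runs against a stand-in file, since the real kube-state-metrics-deployment.yaml contains far more than the two image lines:

```shell
cd "$(mktemp -d)"

# Stand-in for the relevant lines of kube-state-metrics-deployment.yaml
# (illustrative content only).
cat > kube-state-metrics-deployment.yaml <<'EOF'
        image: quay.io/coreos/kube-state-metrics:v1.5.0
        image: k8s.gcr.io/addon-resizer:1.8.3
EOF

# Swap both registries for the mirror, in place (GNU sed).
sed -i \
  -e 's|quay.io/coreos/kube-state-metrics:v1.5.0|mirrorgooglecontainers/kube-state-metrics:v1.5.0|' \
  -e 's|k8s.gcr.io/addon-resizer:1.8.3|mirrorgooglecontainers/addon-resizer:1.8.4|' \
  kube-state-metrics-deployment.yaml

grep 'image:' kube-state-metrics-deployment.yaml
```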
    

5. blackbox-exporter

blackbox-exporter is a black-box probing tool that can probe services over HTTP, TCP, ICMP and other protocols. GitHub: blackbox-exporter; the latest version at the time of writing is v0.13.0.

  • Manifest files

    cat blackbox-exporter-configmap.yaml
    
    apiVersion: v1
    kind: ConfigMap
    metadata:
      labels:
        app: blackbox-exporter
      name: blackbox-exporter
      namespace: kube-system
    data:
      blackbox.yml: |-
        modules:
          http_2xx:
            prober: http
            timeout: 10s
            http:
              valid_http_versions: ["HTTP/1.1", "HTTP/2"]
              valid_status_codes: []
              method: GET
              preferred_ip_protocol: "ip4"
          http_post_2xx: 
            prober: http
            timeout: 10s
            http:
              valid_http_versions: ["HTTP/1.1", "HTTP/2"]
              method: POST
              preferred_ip_protocol: "ip4"
          tcp_connect:
            prober: tcp
            timeout: 10s
          icmp:
            prober: icmp
            timeout: 10s
            icmp:
              preferred_ip_protocol: "ip4"
    
    cat blackbox-exporter-deployment.yaml
    
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: blackbox-exporter
      namespace: kube-system
    spec:
      selector:
        matchLabels:
          app: blackbox-exporter
      replicas: 1
      template:
        metadata:
          labels:
            app: blackbox-exporter
        spec:
          restartPolicy: Always
          containers:
          - name: blackbox-exporter
            image: prom/blackbox-exporter:v0.13.0
            imagePullPolicy: IfNotPresent
            ports:
            - name: blackbox-port
              containerPort: 9115
            readinessProbe:
              tcpSocket:
                port: 9115
              initialDelaySeconds: 5
              timeoutSeconds: 5
            resources:
              requests:
                memory: 50Mi
                cpu: 100m
              limits:
                memory: 60Mi
                cpu: 200m
            volumeMounts:
            - name: config
              mountPath: /etc/blackbox_exporter
            args:
            - --config.file=/etc/blackbox_exporter/blackbox.yml
            - --log.level=debug
            - --web.listen-address=:9115
          volumes:
          - name: config
            configMap:
              name: blackbox-exporter
          nodeSelector:
            node-role.kubernetes.io/master: ""
          tolerations:
          - key: "node-role.kubernetes.io/master"
            operator: "Equal"
            value: ""
            effect: "NoSchedule"
    ---
    apiVersion: v1
    kind: Service
    metadata:
      labels:
        app: blackbox-exporter
      name: blackbox-exporter
      namespace: kube-system
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      type: ClusterIP
      selector:
        app: blackbox-exporter
      ports:
      - name: blackbox
        port: 9115
        targetPort: 9115
        protocol: TCP
    
  • Deploy

    kubectl apply -f blackbox-exporter-configmap.yaml
    	configmap/blackbox-exporter created
    
    kubectl apply -f blackbox-exporter-deployment.yaml 
    	deployment.apps/blackbox-exporter created
    	service/blackbox-exporter created
    
    kubectl get -f blackbox-exporter-configmap.yaml 
    	NAME                DATA   AGE
    	blackbox-exporter   1      3m12s
    
    kubectl get -f blackbox-exporter-deployment.yaml 
    	NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
    	deployment.apps/blackbox-exporter   1/1     1            1           69s
    
    	NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    	service/blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   69s
    
  • Testing and verification

    # Verification (1): confirm the service responds normally
    curl 10.109.48.146:9115
    	<html>
    	    <head><title>Blackbox Exporter</title></head>
    	    <body>
    	    <h1>Blackbox Exporter</h1>
    	    <p><a href="/probe?target=prometheus.io&module=http_2xx">Probe prometheus.io for http_2xx</a></p>
    	    <p><a href="/probe?target=prometheus.io&module=http_2xx&debug=true">Debug probe prometheus.io for http_2xx</a></p>
    	    <p><a href="/metrics">Metrics</a></p>
    	    <p><a href="/config">Configuration</a></p>
    	    <h2>Recent Probes</h2>
    	    <table border='1'><tr><th>Module</th><th>Target</th><th>Result</th><th>Debug</th></table></body>
        </html>
    
    # Verification (2): test a TCP probe, using the grafana service as an example
    kubectl get svc -n kube-system  -l app=blackbox-exporter
    	NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
    	blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   27h
    kubectl describe svc monitoring-grafana -n kube-system  
    	Name:              monitoring-grafana
    	Namespace:         kube-system
    	Labels:            <none>
    	Annotations:       kubectl.kubernetes.io/last-applied-configuration:
    	                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/tcp-probe":"true","prometheus....
    	                   prometheus.io/scrape: true
    	                   prometheus.io/tcp-probe: true
    	                   prometheus.io/tcp-probe-port: 80
    	Selector:          k8s-app=grafana
    	Type:              ClusterIP
    	IP:                10.99.65.209
    	Port:              grafana  80/TCP
    	TargetPort:        3000/TCP
    	Endpoints:         192.168.1.6:3000
    	Session Affinity:  None
    	Events:            <none>	
    
    curl '10.109.48.146:9115/probe?module=tcp_connect&target=monitoring-grafana.kube-system:80'
    	# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
    	# TYPE probe_dns_lookup_time_seconds gauge
    	probe_dns_lookup_time_seconds 0.002059111
    	# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
    	# TYPE probe_duration_seconds gauge
    	probe_duration_seconds 0.002815779
    	# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
    	# TYPE probe_failed_due_to_regex gauge
    	probe_failed_due_to_regex 0
    	# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
    	# TYPE probe_ip_protocol gauge
    	probe_ip_protocol 4
    	# HELP probe_success Displays whether or not the probe was a success
    	# TYPE probe_success gauge
    	probe_success 1
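One way Prometheus can turn the prometheus.io/tcp-probe annotations shown above into blackbox probes is a service-role scrape job along these lines. This is a sketch: the job name is illustrative, and the address blackbox-exporter.kube-system:9115 assumes the Service deployed earlier in this section:

```yaml
- job_name: 'kubernetes-service-tcp-probe'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  kubernetes_sd_configs:
  - role: service
  relabel_configs:
  # Only probe Services annotated prometheus.io/tcp-probe: 'true'.
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_tcp_probe]
    action: keep
    regex: true
  # Build the probe target <service>.<namespace>:<port> from the annotations.
  - source_labels: [__meta_kubernetes_service_name, __meta_kubernetes_namespace, __meta_kubernetes_service_annotation_prometheus_io_tcp_probe_port]
    regex: (.+);(.+);(.+)
    target_label: __param_target
    replacement: ${1}.${2}:${3}
  - source_labels: [__param_target]
    target_label: instance
  # Scrape the blackbox-exporter itself; it receives the target as a parameter.
  - target_label: __address__
    replacement: blackbox-exporter.kube-system:9115
```

The resulting request is the same shape as the manual curl above: /probe?module=tcp_connect&target=monitoring-grafana.kube-system:80, and probe_success indicates whether the probe succeeded.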
    

At this point, all the exporters needed to monitor the Kubernetes cluster are in place. The next step is to deploy Prometheus and configure it to scrape the metrics from these exporters.
