Kubernetes in Practice - Cluster Deployment and Configuration (31): A Kubernetes Monitoring Solution with Prometheus for Full-Stack Monitoring (2) - Deploying the Metrics Client Services
I have been learning the Kubernetes container platform on an as-needed basis and never got around to organizing my notes, so every time I needed something I had to dig through and piece together scattered documents. Now that I have some time, I am writing up what I have learned, with two goals: 1. record my own K8S learning process so I can fill in gaps later; 2. even better if it also helps readers who are new to K8S.

For the complete table of contents of this series, see: Kubernetes实录-目录

Related posts:

The previous post covered the monitoring requirements and the architecture of the solution for my environment. This post covers deploying and configuring the various metrics client services that the in-cluster Prometheus subsystem scrapes for monitoring data.
1. Collecting metrics from cAdvisor
cAdvisor is the agent in the Kubernetes ecosystem that collects container monitoring data. It is built into the kubelet, so no separate deployment is needed.
Before Kubernetes 1.7.3, cAdvisor metrics were served as part of the kubelet metrics and could be fetched from port 4194 on each node.

Since Kubernetes 1.7.3, cAdvisor metrics have been split out of the kubelet metrics, so Prometheus scrapes them as two separate jobs. Many documents online still claim each node exposes port 4194 for cAdvisor metrics, but in newer kubelet versions the embedded cAdvisor no longer listens on 4194; the metrics can only be fetched through the proxy API provided by the apiserver.
- Metric types collected by cAdvisor

cAdvisor reports resource usage for every container running on the node. Metric names carry the prefix container_*: container_cpu_*, container_fs_*, container_memory_*, container_network_*, container_spec_*, plus container_last_seen, container_scrape_error, container_start_time_seconds, and container_tasks_state.
- API

| Item | API | Prometheus configuration | Notes |
| --- | --- | --- | --- |
| cAdvisor metrics | /api/v1/nodes/{node}/proxy/metrics/cadvisor | replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor | |

- Testing the cAdvisor metrics endpoint
```shell
kubectl get --raw "/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor"

# Or alternatively:
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics/cadvisor
# HELP cadvisor_version_info A metric with a constant '1' value labeled by kernel version, OS version, docker version, cadvisor version & cadvisor revision.
# TYPE cadvisor_version_info gauge
cadvisor_version_info{cadvisorRevision="",cadvisorVersion="",dockerVersion="18.06.1-ce",kernelVersion="3.10.0-862.el7.x86_64",osVersion="CentOS Linux 7 (Core)"} 1
# HELP container_cpu_cfs_periods_total Number of elapsed enforcement period intervals.
# TYPE container_cpu_cfs_periods_total counter
container_cpu_cfs_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849686
container_cpu_cfs_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 849710
# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
# TYPE container_cpu_cfs_throttled_periods_total counter
container_cpu_cfs_throttled_periods_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10576
container_cpu_cfs_throttled_periods_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10266
# HELP container_cpu_cfs_throttled_seconds_total Total time duration the container has been throttled.
# TYPE container_cpu_cfs_throttled_seconds_total counter
container_cpu_cfs_throttled_seconds_total{container_name="",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice",image="",name="",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 16523.995575912
container_cpu_cfs_throttled_seconds_total{container_name="prometheus",id="/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-poda5666056_3349_11e9_a0fa_14187763656c.slice/docker-58c340d3f0bbff761882a8a53127dc4e95b962514106748baa4a7dc01129962b.scope",image="sha256:690f4cf8dee25c239aa517a16dd73392ecd81485b29cad881d901a99b5b1a303",name="k8s_prometheus_prometheus-576b4fb6bb-4947p_kube-system_a5666056-3349-11e9-a0fa-14187763656c_0",namespace="kube-system",pod_name="prometheus-576b4fb6bb-4947p"} 10673.627579073
... # many lines omitted
```
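The output above is in the Prometheus text exposition format: `# HELP`/`# TYPE` comment lines followed by `metric{label="value",...} sample` lines. A minimal sketch of pulling values out of such a payload, using a hypothetical, heavily abridged sample (the label sets are trimmed for readability; any real scrape would carry the full `id`/`image`/`name` labels shown above):

```python
import re

# Hypothetical, abridged cAdvisor payload in the exposition format above.
SAMPLE = """\
# HELP container_cpu_cfs_throttled_periods_total Number of throttled period intervals.
# TYPE container_cpu_cfs_throttled_periods_total counter
container_cpu_cfs_throttled_periods_total{container_name="",pod_name="prometheus-576b4fb6bb-4947p",namespace="kube-system"} 10576
container_cpu_cfs_throttled_periods_total{container_name="prometheus",pod_name="prometheus-576b4fb6bb-4947p",namespace="kube-system"} 10266
"""

SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+([0-9.eE+-]+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')

def samples(text, metric):
    """Yield (labels, value) pairs for one metric family."""
    for line in text.splitlines():
        m = SAMPLE_RE.match(line)
        if m and m.group(1) == metric:
            yield dict(LABEL_RE.findall(m.group(2))), float(m.group(3))

# An empty container_name labels the pod-level cgroup aggregate.
throttled = {labels.get("container_name") or "<pod total>": value
             for labels, value in samples(SAMPLE, "container_cpu_cfs_throttled_periods_total")}
print(throttled)
```

In a real setup Prometheus does all of this parsing itself; a quick script like this is only useful for ad-hoc inspection of a `kubectl get --raw` dump.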
2. Collecting metrics from the kubelet
- Metric types collected by the kubelet

To be added.

- API

| Item | API | Prometheus configuration | Notes |
| --- | --- | --- | --- |
| kubelet metrics | /api/v1/nodes/{node}/proxy/metrics | replacement: /api/v1/nodes/${1}/proxy/metrics | |

- Testing the kubelet metrics endpoint
```shell
kubectl proxy --port=6080
curl http://localhost:6080/api/v1/nodes/ejucsnode-shqs-1/proxy/metrics
# HELP apiserver_audit_event_total Counter of audit events generated and sent to the audit backend.
# TYPE apiserver_audit_event_total counter
apiserver_audit_event_total 0
# HELP apiserver_audit_requests_rejected_total Counter of apiserver requests rejected due to an error in audit logging backend.
# TYPE apiserver_audit_requests_rejected_total counter
apiserver_audit_requests_rejected_total 0
... # many lines omitted
# HELP apiserver_storage_data_key_generation_latencies_microseconds Latencies in microseconds of data encryption key(DEK) generation operations.
# TYPE apiserver_storage_data_key_generation_latencies_microseconds histogram
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="5"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="10"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="20"} 0
apiserver_storage_data_key_generation_latencies_microseconds_bucket{le="40"} 0
... # many lines omitted
```
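The `_bucket{le="..."}` lines above are cumulative histogram buckets, which is what PromQL's `histogram_quantile()` interpolates over at query time. A rough sketch of that interpolation, with made-up cumulative counts in buckets shaped like the one above (the counts and the helper are illustrative, not taken from a real scrape):

```python
INF = float("inf")

def histogram_quantile(q, buckets):
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets,
    roughly the way PromQL's histogram_quantile() does."""
    total = buckets[-1][1]
    rank = q * total
    lower, prev_count = 0.0, 0
    for bound, count in buckets:
        if count >= rank:
            if bound == INF:
                return lower  # cannot interpolate into the +Inf bucket
            # linear interpolation inside the bucket that crosses the rank
            return lower + (bound - lower) * (rank - prev_count) / (count - prev_count)
        lower, prev_count = bound, count
    return lower

# 10 observations total; 3 fell at or below 20us, 9 at or below 40us.
buckets = [(5, 0), (10, 0), (20, 3), (40, 9), (INF, 10)]
q50 = histogram_quantile(0.5, buckets)
print(q50)  # the median falls inside the (20, 40] bucket
```

The wide bucket boundaries are why histogram quantiles are estimates: the answer is interpolated, not observed.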
3. node_exporter
The NodeExporter project from Prometheus exposes key host-level metrics. Deploying one node_exporter instance on each node via a Kubernetes DaemonSet provides monitoring of host performance data.
- Manifest file: prometheus-node-exporter-daemonset.yaml
```yaml
# cat prometheus-node-exporter-daemonset.yaml
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  template:
    metadata:
      name: prometheus-node-exporter
      labels:
        app: prometheus-node-exporter
    spec:
      containers:
      - image: prom/node-exporter:v0.17.0
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - name: prom-node-exp   # must be an IANA_SVC_NAME (at most 15 characters)
          containerPort: 9100
          hostPort: 9100
      tolerations:
      - key: "node-role.kubernetes.io/master"
        effect: "NoSchedule"
      hostNetwork: true
      hostPID: true
      hostIPC: true
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/app-metrics: 'true'
    prometheus.io/app-metrics-path: '/metrics'
  name: prometheus-node-exporter
  namespace: kube-system
  labels:
    app: prometheus-node-exporter
spec:
  clusterIP: None
  ports:
  - name: prometheus-node-exporter
    port: 9100
    protocol: TCP
  selector:
    app: prometheus-node-exporter
  type: ClusterIP
```
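The `prometheus.io/*` annotations on the Service are what a typical annotation-based Prometheus service-discovery relabel configuration turns into scrape targets. A small sketch of that convention in plain code (the helper, the `prometheus.io/port` fallback, and the default port are my own illustration, not part of node_exporter or Prometheus):

```python
def scrape_target(address, annotations, default_port=9100):
    """Return 'host:port/path' if the prometheus.io/scrape annotation opts the
    object in, else None. Mirrors the common prometheus.io/* convention."""
    if annotations.get("prometheus.io/scrape") != "true":
        return None  # not annotated for scraping
    port = annotations.get("prometheus.io/port", str(default_port))
    path = annotations.get("prometheus.io/app-metrics-path", "/metrics")
    return f"{address}:{port}{path}"

# Annotated like the Service above -> becomes a scrape target.
print(scrape_target("10.244.1.7", {
    "prometheus.io/scrape": "true",
    "prometheus.io/app-metrics-path": "/metrics",
}))
# Unannotated object -> skipped.
print(scrape_target("10.244.1.8", {}))
```

In a real deployment this logic lives in the Prometheus `relabel_configs` for the `kubernetes_sd_configs` job, not in application code.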
- Deployment commands

```shell
kubectl apply -f prometheus-node-exporter-daemonset.yaml
daemonset.extensions/prometheus-node-exporter created
service/prometheus-node-exporter created

kubectl get -f prometheus-node-exporter-daemonset.yaml
NAME                                       DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
daemonset.extensions/prometheus-node-exporter   6     6         6       6            6           <none>          5m

NAME                               TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)    AGE
service/prometheus-node-exporter   ClusterIP   None         <none>        9100/TCP   5m
```
- Verification

```shell
# On any node:
netstat -pltn | grep 9100
tcp6       0      0 :::9100      :::*      LISTEN      104168/node_exporte

curl {nodeIP}:9100/metrics
# HELP go_gc_duration_seconds A summary of the GC invocation durations.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0.000117217
go_gc_duration_seconds{quantile="0.25"} 0.000159431
go_gc_duration_seconds{quantile="0.5"} 0.000200323
... # many lines omitted
```
4. kube-state-metrics
kube-state-metrics collects metrics about the state of Kubernetes API objects, including daemonsets, deployments, jobs, namespaces, nodes, PVCs, pod containers, pods, replicasets, services, and statefulsets.
- Deploying kube-state-metrics
- Download the manifests

The kube-state-metrics deployment manifests can be downloaded from GitHub: kube-state-metrics. The latest version at the time of writing is 1.5.0.

```shell
mkdir kube-state-metrics
cd kube-state-metrics
wget https://github.com/kubernetes/kube-state-metrics/archive/v1.5.0.zip
unzip v1.5.0.zip
cd kube-state-metrics-1.5.0/kubernetes/
tree
├── kube-state-metrics-cluster-role-binding.yaml
├── kube-state-metrics-cluster-role.yaml
├── kube-state-metrics-deployment.yaml
├── kube-state-metrics-role-binding.yaml
├── kube-state-metrics-role.yaml
├── kube-state-metrics-service-account.yaml
└── kube-state-metrics-service.yaml
```
- Adjust the manifests

kube-state-metrics is deployed into the kube-system namespace by default; edit the manifests if you need a different namespace.

The two Docker images referenced in kube-state-metrics-deployment.yaml sit behind registries that are unreachable from mainland China without a proxy; replace them with accessible mirrors:

```shell
quay.io/coreos/kube-state-metrics:v1.5.0  ->  mirrorgooglecontainers/kube-state-metrics:v1.5.0
k8s.gcr.io/addon-resizer:1.8.3            ->  mirrorgooglecontainers/addon-resizer:1.8.4   # latest at the time of writing
```
- Deploy

```shell
kubectl apply -f ./
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
clusterrole.rbac.authorization.k8s.io/kube-state-metrics unchanged
deployment.apps/kube-state-metrics created
rolebinding.rbac.authorization.k8s.io/kube-state-metrics unchanged
role.rbac.authorization.k8s.io/kube-state-metrics-resizer unchanged
serviceaccount/kube-state-metrics unchanged
service/kube-state-metrics unchanged

kubectl get -f ./
NAME                                                              AGE
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s
NAME                                                   AGE
clusterrole.rbac.authorization.k8s.io/kube-state-metrics   70s
NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/kube-state-metrics   1/1     1            1           50s
NAME                                                       AGE
rolebinding.rbac.authorization.k8s.io/kube-state-metrics   70s
NAME                                                        AGE
role.rbac.authorization.k8s.io/kube-state-metrics-resizer   70s
NAME                                SECRETS   AGE
serviceaccount/kube-state-metrics   1         70s
NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
service/kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   70s
```
- Testing the kube-state-metrics endpoint

```shell
kubectl get svc kube-state-metrics -n kube-system
NAME                 TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)             AGE
kube-state-metrics   ClusterIP   10.106.107.13   <none>        8080/TCP,8081/TCP   5m33s

curl 10.106.107.13:8080/metrics
# HELP kube_configmap_info Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_info{namespace="kube-system",configmap="prometheus-config"} 1
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config-1.13"} 1
... # many lines omitted

curl -I 10.106.107.13:8081/healthz
HTTP/1.1 200 OK
Date: Wed, 20 Feb 2019 02:20:10 GMT
Content-Length: 264
Content-Type: text/html; charset=utf-8
```
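Because every `kube_*_info` sample carries a `namespace` label, a quick tally over a dump like the one above shows how objects are distributed across namespaces. A minimal sketch against an abridged copy of that output:

```python
import re
from collections import Counter

# Abridged kube_configmap_info lines from the curl output above.
OUTPUT = """\
kube_configmap_info{namespace="kube-system",configmap="kubernetes-dashboard-settings"} 1
kube_configmap_info{namespace="kube-system",configmap="prometheus-config"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
"""

# Count ConfigMaps per namespace by pulling out the namespace label.
per_ns = Counter(re.findall(r'kube_configmap_info\{namespace="([^"]+)"', OUTPUT))
print(dict(per_ns))
```

The same count is of course available in PromQL as `count(kube_configmap_info) by (namespace)` once Prometheus scrapes this endpoint.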
5. blackbox-exporter
blackbox-exporter is a black-box probing tool that can probe services over HTTP, TCP, ICMP, and other protocols. GitHub: blackbox-exporter. Latest version at the time of writing: v0.13.0.
- Deployment manifests
```yaml
# cat blackbox-exporter-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
data:
  blackbox.yml: |-
    modules:
      http_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          valid_status_codes: []
          method: GET
          preferred_ip_protocol: "ip4"
      http_post_2xx:
        prober: http
        timeout: 10s
        http:
          valid_http_versions: ["HTTP/1.1", "HTTP/2"]
          method: POST
          preferred_ip_protocol: "ip4"
      tcp_connect:
        prober: tcp
        timeout: 10s
      icmp:
        prober: icmp
        timeout: 10s
        icmp:
          preferred_ip_protocol: "ip4"
```
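Each module defined in this ConfigMap is selected at probe time through the exporter's `/probe?target=...&module=...` endpoint. A tiny sketch of building such a probe URL (the helper function is my own; the exporter address is whatever the Service below resolves to):

```python
from urllib.parse import urlencode

def probe_url(exporter, target, module="http_2xx"):
    """Build a blackbox-exporter probe URL for a given target and module.
    `exporter` is the exporter's host:port, e.g. its in-cluster Service DNS name."""
    return f"http://{exporter}/probe?" + urlencode({"target": target, "module": module})

# Probe a cluster-internal service via the tcp_connect module defined above.
print(probe_url("blackbox-exporter.kube-system:9115",
                "monitoring-grafana.kube-system:80", module="tcp_connect"))
```

In practice Prometheus constructs these URLs itself via relabeling (`__param_target`, `__param_module`); building one by hand is mainly useful for curl-based debugging like the verification step further down.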
```yaml
# cat blackbox-exporter-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: blackbox-exporter
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: blackbox-exporter
  replicas: 1
  template:
    metadata:
      labels:
        app: blackbox-exporter
    spec:
      restartPolicy: Always
      containers:
      - name: blackbox-exporter
        image: prom/blackbox-exporter:v0.13.0
        imagePullPolicy: IfNotPresent
        ports:
        - name: blackbox-port
          containerPort: 9115
        readinessProbe:
          tcpSocket:
            port: 9115
          initialDelaySeconds: 5
          timeoutSeconds: 5
        resources:
          requests:
            memory: 50Mi
            cpu: 100m
          limits:
            memory: 60Mi
            cpu: 200m
        volumeMounts:
        - name: config
          mountPath: /etc/blackbox_exporter
        args:
        - --config.file=/etc/blackbox_exporter/blackbox.yml
        - --log.level=debug
        - --web.listen-address=:9115
      volumes:
      - name: config
        configMap:
          name: blackbox-exporter
      nodeSelector:
        node-role.kubernetes.io/master: ""
      tolerations:
      - key: "node-role.kubernetes.io/master"
        operator: "Equal"
        value: ""
        effect: "NoSchedule"
---
apiVersion: v1
kind: Service
metadata:
  labels:
    app: blackbox-exporter
  name: blackbox-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: 'true'
spec:
  type: ClusterIP
  selector:
    app: blackbox-exporter
  ports:
  - name: blackbox
    port: 9115
    targetPort: 9115
    protocol: TCP
```
- Deploy

```shell
kubectl apply -f blackbox-exporter-configmap.yaml
configmap/blackbox-exporter created
kubectl apply -f blackbox-exporter-deployment.yaml
deployment.apps/blackbox-exporter created
service/blackbox-exporter created

kubectl get -f blackbox-exporter-configmap.yaml
NAME                DATA   AGE
blackbox-exporter   1      3m12s

kubectl get -f blackbox-exporter-deployment.yaml
NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/blackbox-exporter   1/1     1            1           69s
NAME                        TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   69s
```
- Verification

```shell
# Check (1): verify the service is responding
curl 10.109.48.146:9115
<html>
<head><title>Blackbox Exporter</title></head>
<body>
<h1>Blackbox Exporter</h1>
<p><a href="/probe?target=prometheus.io&module=http_2xx">Probe prometheus.io for http_2xx</a></p>
<p><a href="/probe?target=prometheus.io&module=http_2xx&debug=true">Debug probe prometheus.io for http_2xx</a></p>
<p><a href="/metrics">Metrics</a></p>
<p><a href="/config">Configuration</a></p>
<h2>Recent Probes</h2>
<table border='1'><tr><th>Module</th><th>Target</th><th>Result</th><th>Debug</th></table></body>
</html>

# Check (2): verify a TCP probe, using grafana as the example
kubectl get svc -n kube-system -l app=blackbox-exporter
NAME                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
blackbox-exporter   ClusterIP   10.109.48.146   <none>        9115/TCP   27h

kubectl describe svc monitoring-grafana -n kube-system
Name:              monitoring-grafana
Namespace:         kube-system
Labels:            <none>
Annotations:       kubectl.kubernetes.io/last-applied-configuration:
                     {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{"prometheus.io/scrape":"true","prometheus.io/tcp-probe":"true","prometheus....
                   prometheus.io/scrape: true
                   prometheus.io/tcp-probe: true
                   prometheus.io/tcp-probe-port: 80
Selector:          k8s-app=grafana
Type:              ClusterIP
IP:                10.99.65.209
Port:              grafana  80/TCP
TargetPort:        3000/TCP
Endpoints:         192.168.1.6:3000
Session Affinity:  None
Events:            <none>

curl '10.109.48.146:9115/probe?module=tcp_connect&target=monitoring-grafana.kube-system:80'
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.002059111
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.002815779
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
```
At this point all the exporters for monitoring the Kubernetes cluster are in place. The next step is to deploy Prometheus and configure it to scrape these exporters' metrics.