I. Prometheus Auto-Discovery

When the monitoring targets live on Kubernetes, there are a great many of them and they change frequently, which makes adding targets by hand tedious. The service discovery mechanism exists precisely to add monitoring targets to Prometheus automatically.
Prometheus target configuration is either static or dynamically discovered; the common kinds are:
static_configs: # static discovery: targets written directly into the config file or a ConfigMap
file_sd_configs: # file-based discovery: targets live in a dedicated file; add a target by editing that file
dns_sd_configs: # DNS service discovery
kubernetes_sd_configs: # Kubernetes service discovery
consul_sd_configs: # Consul service discovery
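
For orientation, here is a minimal sketch of the first two forms in prometheus.yml (the target address and file path are made-up placeholders):

scrape_configs:
  - job_name: 'static-demo'
    static_configs:                        # targets written directly into the config
      - targets: ['192.168.100.153:9100']
  - job_name: 'file-demo'
    file_sd_configs:                       # targets kept in separate files that Prometheus re-reads
      - files:
          - /etc/prometheus/targets/*.json # hypothetical path
        refresh_interval: 1m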

In a Kubernetes monitoring scenario, frequently changing resources such as Pods and Services are where Prometheus target auto-discovery shines most. Below we implement auto-discovery with Consul:

1. Deploy a Consul service in Kubernetes

Install it with Helm. Since Consul needs persistent storage, pull the chart first and edit values.yaml:

helm pull bitnami/consul --untar 

Note: this assumes persistent storage is already configured; I use the NFS StorageClass (nfs-client) I set up earlier.

Edit values.yaml:

cd consul
vi values.yaml               # search for storageClass and change it to:
storageClass: "nfs-client"   ## change every occurrence you find

Install with Helm:

helm install prometheus-consul .                  

[root@aminglinux01 consul]# helm install prometheus-consul .
NAME: prometheus-consul
LAST DEPLOYED: Sat Aug  3 03:42:41 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: consul
CHART VERSION: 11.3.11
APP VERSION: 1.19.1

  ** Please be patient while the chart is being deployed **

  Consul can be accessed within the cluster on port 8300 at prometheus-consul-headless.default.svc.cluster.local

In order to access to the Consul Web UI:

1. Get the Consul URL by running these commands:

    kubectl port-forward --namespace default svc/prometheus-consul-ui 80:80
    echo "Consul URL: http://127.0.0.1:80"

2. Access Consul using the obtained URL.

Please take into account that you need to wait until a cluster leader is elected before using the Consul Web UI.

In order to check the status of the cluster you can run the following command:

    kubectl exec -it prometheus-consul-0 -- consul members

Furthermore, to know which Consul node is the cluster leader run this other command:

    kubectl exec -it prometheus-consul-0 -- consul operator raft list-peers

WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
  - resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

⚠ SECURITY WARNING: Original containers have been substituted. This Helm chart was designed, tested, and validated on multiple platforms using a specific set of Bitnami and Tanzu Application Catalog containers. Substituting other containers is likely to cause degraded security and performance, broken chart features, and missing environment variables.

Substituted images detected:
  - registry.cn-hangzhou.aliyuncs.com/daliyused/consul:1.19.1-debian-12-r4
  - registry.cn-hangzhou.aliyuncs.com/daliyused/consul-exporter:0.12.0-debian-12-r10
[root@aminglinux01 consul]# 

Check the PVCs:

kubectl get pvc

[root@aminglinux01 consul]# kubectl get pvc
NAME                                 STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
data-prometheus-consul-0             Bound     pvc-092f63a4-59cc-4200-b511-fb7d5a288b42   8Gi        RWO            nfs-client     46s
data-prometheus-consul-1             Bound     pvc-b9f7b89a-a4e9-433b-b03b-32d574145dcf   8Gi        RWO            nfs-client     46s
data-prometheus-consul-2             Bound     pvc-d8a278ec-956d-4f39-bce2-19ff886e8209   8Gi        RWO            nfs-client     46s

Check the pods:

[root@aminglinux01 consul]# kubectl get pod | grep consul
prometheus-consul-0                    1/1     Running   0              42s
prometheus-consul-1                    1/1     Running   0              42s
prometheus-consul-2                    1/1     Running   0              42s
[root@aminglinux01 consul]# 

Check Consul's Services:

[root@aminglinux01 consul]# kubectl get svc | grep consul
prometheus-consul-headless   ClusterIP      None            <none>           8500/TCP,8400/TCP,8301/TCP,8302/UDP,8302/TCP,8301/UDP,8300/TCP,8600/TCP,8600/UDP   4m53s
prometheus-consul-ui         ClusterIP      10.15.7.56      <none>           80/TCP                                                                             4m53s
[root@aminglinux01 consul]# 
 

Register a target through the Consul HTTP API (10.15.7.56 is the prometheus-consul-ui Service IP from above; the target is a node_exporter at 192.168.100.153:9100):

curl -X PUT -d '{"id": "aminglinux03","name": "aminglinux03","address":"192.168.100.153","port": 9100,"tags": ["service"],"checks": [{"http":"http://192.168.100.153:9100/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register

 

Verify:

curl 10.15.7.56/v1/catalog/service/aminglinux03
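
If a registration needs to be removed later, Consul's agent API can deregister it by service ID. Note that agent-level registrations live on the agent that received them, so behind a load-balanced Service the request may need to reach the same pod:

curl -X PUT http://10.15.7.56/v1/agent/service/deregister/aminglinux03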

2. Add Consul to the Prometheus configuration

Edit prometheus_config.yaml again:

vi prometheus_config.yaml ### add the following under scrape_configs:
- job_name: 'consul'
  consul_sd_configs:
      - server: 'prometheus-consul-ui'            ## the Service IP changes when the Service is recreated, so use the Service name (the UI Service listens on port 80, so no port suffix is needed)
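
Optionally, relabeling can filter what Consul hands back, e.g. dropping Consul's own built-in "consul" service so that only the exporters we registered get scraped (a sketch):

- job_name: 'consul'
  consul_sd_configs:
      - server: 'prometheus-consul-ui'
  relabel_configs:
      - source_labels: [__meta_consul_service]
        regex: consul
        action: drop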

Re-apply the configuration:

kubectl delete -f prometheus_config.yaml
kubectl apply -f prometheus_config.yaml

[root@aminglinux01 prometheus]# vim prometheus_config.yaml 
[root@aminglinux01 prometheus]# kubectl delete -f prometheus_config.yaml
configmap "prometheus-server" deleted
[root@aminglinux01 prometheus]# kubectl apply -f prometheus_config.yaml
configmap/prometheus-server created
[root@aminglinux01 prometheus]# 

Restart the Prometheus server:

[root@aminglinux01 prometheus]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
pod "prometheus-server-bd476698f-vbkjd" deleted

 

3. Using Consul to monitor Nginx

We use an existing nginx pod as the example; first, set up a status page inside the pod:

kubectl exec -it nginx -- bash

The nginx image ships without vi, so install vim first (run inside the pod):

apt update
apt install -y vim

Edit the config file:

vim /etc/nginx/conf.d/default.conf ## add the following just above the final closing }
    location /nginx_status {
        stub_status;
    }
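
For orientation, the edited default.conf then looks roughly like this (abridged; only the /nginx_status location is new, the rest is the image's stock config):

server {
    listen       80;
    server_name  localhost;

    location / {
        root   /usr/share/nginx/html;
        index  index.html index.htm;
    }

    location /nginx_status {
        stub_status;
    }
}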

Reload (inside the pod):

nginx -s reload

root@nginx:/# nginx -s reload
2024/08/02 20:31:33 [notice] 273#273: signal process started
root@nginx:/# 

Test that it took effect (inside the pod):

curl localhost/nginx_status ## output like the following means it works
Active connections: 1
server accepts handled requests
2 2 2
Reading: 0 Writing: 1 Waiting: 0

root@nginx:/# curl localhost/nginx_status
Active connections: 1 
server accepts handled requests
 1 1 1 
Reading: 0 Writing: 1 Waiting: 0 
root@nginx:/# 

Note: once this works, you can exit the pod.

With a target in place, we still need an nginx exporter.

First pull the chart with Helm (its values need modifying, so we can't install it directly):

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm pull prometheus-community/prometheus-nginx-exporter --untar

[root@aminglinux01 ~]# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" already exists with the same configuration, skipping
[root@aminglinux01 ~]# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "aliyun" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "helm_sh" chart repository
Update Complete. ⎈Happy Helming!⎈
[root@aminglinux01 ~]# helm pull prometheus-community/prometheus-nginx-exporter --untar
Error: Get "https://github.com/prometheus-community/helm-charts/releases/download/prometheus-nginx-exporter-0.2.1/prometheus-nginx-exporter-0.2.1.tgz": unexpected EOF
[root@aminglinux01 ~]# 

Get the target pod's IP:

[root@aminglinux01 ~]# kubectl get po -o wide | grep nginx
nginx                                  1/1     Running   0              9m36s   10.18.68.136      aminglinux03   <none>           <none>
[root@aminglinux01 ~]# 

Edit values.yaml:

vi values.yaml
Comment out the line nginxServer: "http://{{ .Release.Name }}.{{ .Release.Namespace }}.svc.cluster.local:8080/stub_status", then add a new line below it:
nginxServer: "http://10.18.68.136/nginx_status"

Install nginx-exporter:

helm install prometheus-nginx-exporter .

[root@aminglinux01 prometheus-nginx-exporter]# helm install prometheus-nginx-exporter .
NAME: prometheus-nginx-exporter
LAST DEPLOYED: Sat Aug  3 04:50:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
  export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=prometheus-nginx-exporter,app.kubernetes.io/instance=prometheus-nginx-exporter" -o jsonpath="{.items[0].metadata.name}")
  export CONTAINER_PORT=$(kubectl get pod --namespace default $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
  echo "Visit http://127.0.0.1:8080 to use your application"
  kubectl --namespace default port-forward $POD_NAME 8080:$CONTAINER_PORT

Check the pod status:

[root@aminglinux01 prometheus-nginx-exporter]# kubectl get po | grep prometheus-
prometheus-alertmanager-0                   1/1     Running   0              173m
prometheus-consul-0                         1/1     Running   0              69m
prometheus-consul-1                         1/1     Running   0              69m
prometheus-consul-2                         1/1     Running   0              69m
prometheus-nginx-exporter-bbf5d8b8b-s8hvl   1/1     Running   0              2m1s
prometheus-server-bd476698f-48z94           1/1     Running   0              49m
[root@aminglinux01 prometheus-nginx-exporter]# 

Check the Service IPs:

[root@aminglinux01 prometheus-nginx-exporter]# kubectl get svc | grep prometheus-
prometheus-alertmanager      LoadBalancer   10.15.190.141   192.168.10.244   80:31633/TCP                                                                       173m
prometheus-consul-headless   ClusterIP      None            <none>           8500/TCP,8400/TCP,8301/TCP,8302/UDP,8302/TCP,8301/UDP,8300/TCP,8600/TCP,8600/UDP   70m
prometheus-consul-ui         ClusterIP      10.15.7.56      <none>           80/TCP                                                                             70m
prometheus-nginx-exporter    ClusterIP      10.15.7.72      <none>           9113/TCP                                                                           2m30s
prometheus-server            LoadBalancer   10.15.67.163    192.168.10.243   80:32470/TCP                                                                       173m
[root@aminglinux01 prometheus-nginx-exporter]# 

Access the metrics endpoint:

[root@aminglinux01 prometheus-nginx-exporter]# curl  10.15.7.72:9113/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.22.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 250176
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 250176
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 8154
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.418824e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 250176
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.417216e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.08896e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 1057
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 1.417216e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 3.506176e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 1057
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 54080
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 65280
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.202686e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 688128
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 688128
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 6.904848e+06
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 9
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 2
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 2
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 0
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which nginx_exporter was built, and the goos and goarch for the build.
# TYPE nginx_exporter_build_info gauge
nginx_exporter_build_info{branch="HEAD",goarch="amd64",goos="linux",goversion="go1.22.5",revision="9522f4e39ee1aed817d7d70a89514ccc0ae1594a",tags="unknown",version="1.3.0"} 1

Register this metrics address with Consul:

curl -X PUT -d '{"id": "nginx","name": "nginx","address":"10.15.7.72","port": 9113,"tags": ["service"],"checks": [{"http":"http://10.15.7.72:9113/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register

Verify:

curl 10.15.7.56/v1/catalog/service/nginx

[root@aminglinux01 prometheus-nginx-exporter]# curl -X PUT -d '{"id": "nginx","name": "nginx","address":"10.15.7.72","port": 9113,"tags": ["service"],"checks": [{"http":"http://10.15.7.72:9113/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register
[root@aminglinux01 prometheus-nginx-exporter]# curl 10.15.7.56/v1/catalog/service/nginx
[{"ID":"e0cf3632-3441-37ae-3b07-313aaa22f649","Node":"prometheus-consul-0","Address":"10.18.206.214","Datacenter":"dc1","TaggedAddresses":{"lan":"10.18.206.214","lan_ipv4":"10.18.206.214","wan":"10.18.206.214","wan_ipv4":"10.18.206.214"},"NodeMeta":{"consul-network-segment":"","consul-version":"1.19.1"},"ServiceKind":"","ServiceID":"nginx","ServiceName":"nginx","ServiceTags":["service"],"ServiceAddress":"10.15.7.72","ServiceTaggedAddresses":{"lan_ipv4":{"Address":"10.15.7.72","Port":9113},"wan_ipv4":{"Address":"10.15.7.72","Port":9113}},"ServiceWeights":{"Passing":1,"Warning":1},"ServiceMeta":{},"ServicePort":9113,"ServiceSocketPath":"","ServiceEnableTagOverride":false,"ServiceProxy":{"Mode":"","MeshGateway":{},"Expose":{}},"ServiceConnect":{},"ServiceLocality":null,"CreateIndex":586,"ModifyIndex":586}][root@aminglinux01 prometheus-nginx-exporter]# 

II. kube-state-metrics and metrics-server

1. kube-state-metrics

1) Introduction

kube-state-metrics is a Kubernetes component that turns the state of the cluster's resources into monitorable metrics, helping users understand and monitor cluster health and performance. A Kubernetes cluster holds many objects (Pods, Deployments, Services, and so on) along with their state (replica counts, phases, labels, ...).

kube-state-metrics listens to the Kubernetes API for changes, picks up object state in real time, and exposes it as metrics. These metrics can feed monitoring and alerting, giving operators a view of component health, performance, and other important state. They can be scraped by a Prometheus server and used to build dashboards, define alert rules, and analyze cluster performance; paired with Prometheus, kube-state-metrics gives you a much clearer picture of how the cluster is running.

2) Deployment

Deploy with Helm. As with Consul, the session below installs from a locally pulled chart directory (with the images swapped to a mirror registry, per the warning in the output):

helm install kube-state-metrics .

[root@aminglinux01 kube-state-metrics]# helm install kube-state-metrics .
NAME: kube-state-metrics
LAST DEPLOYED: Sat Aug  3 05:13:58 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: kube-state-metrics
CHART VERSION: 4.2.11
APP VERSION: 2.13.0

** Please be patient while the chart is being deployed **

Watch the kube-state-metrics Deployment status using the command:

    kubectl get deploy -w --namespace default kube-state-metrics

kube-state-metrics can be accessed via port "8080" on the following DNS name from within your cluster:

    kube-state-metrics.default.svc.cluster.local

To access kube-state-metrics from outside the cluster execute the following commands:

    echo "URL: http://127.0.0.1:9100/"
    kubectl port-forward --namespace default svc/kube-state-metrics 9100:8080

WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
  - resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

⚠ SECURITY WARNING: Original containers have been substituted. This Helm chart was designed, tested, and validated on multiple platforms using a specific set of Bitnami and Tanzu Application Catalog containers. Substituting other containers is likely to cause degraded security and performance, broken chart features, and missing environment variables.

Substituted images detected:
  - registry.cn-hangzhou.aliyuncs.com/daliyused/kube-state-metrics:2.13.0-debian-12-r2

3) Configure Prometheus

First get the Service IP:

[root@aminglinux01 kube-state-metrics]# kubectl get svc | grep kub
kube-state-metrics           ClusterIP      10.15.215.19    <none>           8080/TCP                                                                           26s

Test:

curl 10.15.215.19:8080/metrics

[root@aminglinux01 kube-state-metrics]# curl 10.15.215.19:8080/metrics
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
# HELP kube_configmap_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_configmap_labels gauge
# HELP kube_configmap_info [STABLE] Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="default",configmap="myharbor-redis-health"} 1
kube_configmap_info{namespace="default",configmap="myharbor-redis-configuration"} 1
kube_configmap_info{namespace="default",configmap="myharbor-jobservice-config"} 1
kube_configmap_info{namespace="default",configmap="myharbor-trivy-envvars"} 1
kube_configmap_info{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_info{namespace="metallb-system",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="myharbor-jobservice-envvars"} 1
kube_configmap_info{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
kube_configmap_info{namespace="default",configmap="prometheus-server"} 1
kube_configmap_info{namespace="default",configmap="myharbor-core-envvars"} 1
kube_configmap_info{namespace="prometheus",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="myharbor-core"} 1
kube_configmap_info{namespace="default",configmap="myharbor-portal"} 1
kube_configmap_info{namespace="default",configmap="myharbor-redis-scripts"} 1
kube_configmap_info{namespace="yeyunyi",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="myharbor-postgresql-init-scripts"} 1
kube_configmap_info{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="metallb-system",configmap="metallb-excludel2"} 1
kube_configmap_info{namespace="default",configmap="myharbor-nginx"} 1
kube_configmap_info{namespace="default",configmap="myharbor-postgresql-extended-configuration"} 1
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config"} 1
kube_configmap_info{namespace="default",configmap="myharbor-registry"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="prometheus-alertmanager"} 1
# HELP kube_configmap_created [STABLE] Unix creation timestamp
# TYPE kube_configmap_created gauge

Register it with Consul using curl:

curl -X PUT -d '{"id": "kube-state-metrics","name": "kube-state-metrics","address":"10.15.215.19","port": 8080,"tags": ["service"],"checks": [{"http":"http://10.15.215.19:8080/metrics/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register

2. metrics-server

Metrics Server collects, aggregates, and serves resource metrics for a Kubernetes cluster. It is one of the cluster's core add-ons, providing real-time resource utilization for the nodes and containers running in it. Metrics Server talks to the Kubernetes API and gathers metrics periodically, chiefly CPU and memory usage.
Download the YAML file:

wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.3/high-availability-1.21+.yaml

Edit the YAML file:

vi high-availability-1.21+.yaml
Change image: k8s.gcr.io/metrics-server/metrics-server:v0.6.3 to image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.3, and add one line above the image: line: - --kubelet-insecure-tls (mind the indentation: align it with the other args using spaces, never tabs).
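
After the edits, the relevant part of the container spec looks roughly like this (abridged; the other args come from the upstream v0.6.3 manifest and may differ slightly in your copy):

      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls        # the newly added line
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.3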

Apply the YAML file:

kubectl apply -f high-availability-1.21+.yaml

[root@aminglinux01 ~]# kubectl apply -f high-availability-1.21+.yaml 
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
poddisruptionbudget.policy/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

Check the pods:

kubectl get po -n kube-system

[root@aminglinux01 ~]# kubectl get pod -n kube-system | grep metric
metrics-server-76467d945-9jtxx            1/1     Running   0                31s
metrics-server-76467d945-vnjxl            1/1     Running   0                31s

Test:

[root@aminglinux01 ~]# kubectl top node
NAME           CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%   
aminglinux01   201m         5%     1984Mi          56%       
aminglinux02   261m         6%     2098Mi          59%       
aminglinux03   274m         6%     1885Mi          53%       
[root@aminglinux01 ~]# kubectl top po
NAME                                        CPU(cores)   MEMORY(bytes)   
kube-state-metrics-75778cdfff-2vdkl         1m           37Mi            
lucky-6cdcf8b9d4-t5r66                      1m           74Mi            
myharbor-core-b9d48ccdd-v9jdz               1m           68Mi            
myharbor-jobservice-6f5dbfcc4f-q852z        3m           29Mi            
myharbor-nginx-65b8c5764d-vz4vn             1m           12Mi            
myharbor-portal-ff7fd4949-lj6jw             1m           7Mi             
myharbor-postgresql-0                       14m          69Mi            
myharbor-redis-master-0                     28m          18Mi            
myharbor-registry-5b59458d9-4j79b           1m           30Mi            
myharbor-trivy-0                            1m           11Mi            
nginx                                       1m           33Mi            
ngnix                                       0m           24Mi            
node-exporter-9cn2c                         7m           20Mi            
node-exporter-h4ntw                         1m           17Mi            
node-exporter-wvp2h                         8m           21Mi            
pod-demo                                    1m           72Mi            
pod-demo1                                   0m           4Mi             
prometheus-alertmanager-0                   2m           24Mi            
prometheus-consul-0                         40m          55Mi            
prometheus-consul-1                         35m          47Mi            
prometheus-consul-2                         30m          48Mi            
prometheus-nginx-exporter-bbf5d8b8b-s8hvl   1m           15Mi            
prometheus-server-bd476698f-48z94           1m           106Mi           
redis-sts-0                                 2m           8Mi             
redis-sts-1                                 2m           8Mi             
[root@aminglinux01 ~]# 

III. Monitoring the Kubernetes Cluster with Prometheus

1. Monitoring cluster nodes

We already did this earlier with node_exporter, but here is another approach: Kubernetes' native service discovery. The job below discovers every node and scrapes its kubelet metrics through the apiserver proxy:

      - job_name: 'kubernetes-nodes'
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        kubernetes_sd_configs:
        - role: node
        relabel_configs:
        # carry every node label over onto the target
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
        # scrape via the apiserver proxy instead of each kubelet directly
        - target_label: __address__
          replacement: kubernetes.default.svc:443
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          target_label: __metrics_path__
          replacement: /api/v1/nodes/${1}/proxy/metrics
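
Scraping through the apiserver proxy requires the ServiceAccount that Prometheus runs as to be allowed nodes/proxy access; a ClusterRole sketch (your existing Prometheus RBAC may already cover this):

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]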

Deploy:

kubectl  delete cm prometheus-server ; kubectl apply -f prometheus_config.yaml

kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}

 

2. Monitoring the apiserver

      - job_name: 'kubernetes-apiservers'
        kubernetes_sd_configs:
        - role: endpoints
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        # keep only the apiserver endpoint: the "kubernetes" service in "default", port name "https"
        - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
          action: keep
          regex: default;kubernetes;https
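
Once the job is loaded and the target is up, a query like the following shows apiserver request rates broken down by response code:

sum(rate(apiserver_request_total[5m])) by (code)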

 

3. Monitoring the kubelet

      - job_name: 'kubernetes-kubelet'
        kubernetes_sd_configs:
        - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          # the kubelet's serving cert is usually self-signed, so skip verification
          insecure_skip_verify: true
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
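
Once this job is scraping, a quick sanity query counts running pods per node (the metric is named kubelet_running_pods on recent kubelet versions; older ones expose kubelet_running_pod_count):

sum(kubelet_running_pods) by (instance)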

Deploy:

[root@aminglinux01 ~]# kubectl delete cm prometheus-server ;kubectl apply -f prometheus_config.yaml 
configmap "prometheus-server" deleted
configmap/prometheus-server created

kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}

 

IV. Monitoring Common Kubernetes Resource Objects

1. Container monitoring

      - job_name: 'kubernetes-cadvisor'
        kubernetes_sd_configs:
        - role: node
        scheme: https
        tls_config:
          ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
        relabel_configs:
        - action: labelmap
          regex: __meta_kubernetes_node_label_(.+)
          replacement: $1
        # rewrite the scrape path to the kubelet's embedded cAdvisor endpoint
        - source_labels: [__meta_kubernetes_node_name]
          regex: (.+)
          replacement: /metrics/cadvisor # <nodeip>/metrics -> <nodeip>/metrics/cadvisor
          target_label: __metrics_path__
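
With cAdvisor data flowing, per-pod CPU usage can be queried like this (the image!="" filter drops pause-container and aggregate rows):

sum(rate(container_cpu_usage_seconds_total{image!=""}[5m])) by (namespace, pod)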

[root@aminglinux01 prometheus]# kubectl delete cm prometheus-server; kubectl apply -f prometheus_config.yaml
configmap "prometheus-server" deleted
configmap/prometheus-server created
[root@aminglinux01 prometheus]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
pod "prometheus-server-5df77fb7d7-rdj59" deleted 

2. Service monitoring

      - job_name: 'kubernetes-service-endpoints'
        kubernetes_sd_configs:
        - role: endpoints
        relabel_configs:
        # only scrape Services annotated prometheus.io/scrape: "true"
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
          action: keep
          regex: true
        # honor the prometheus.io/scheme, /path and /port annotations
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
          action: replace
          target_label: __scheme__
          regex: (https?)
        - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
          action: replace
          target_label: __metrics_path__
          regex: (.+)
        - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
          action: replace
          target_label: __address__
          regex: ([^:]+)(?::\d+)?;(\d+)
          replacement: $1:$2
        - action: labelmap
          regex: __meta_kubernetes_service_label_(.+)
        # attach namespace, service, and pod names as plain labels
        - source_labels: [__meta_kubernetes_namespace]
          action: replace
          target_label: kubernetes_namespace
        - source_labels: [__meta_kubernetes_service_name]
          action: replace
          target_label: kubernetes_name
        - source_labels: [__meta_kubernetes_pod_name]
          action: replace
          target_label: kubernetes_pod_name
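
This job picks up any Service carrying the prometheus.io annotations; a hypothetical example:

apiVersion: v1
kind: Service
metadata:
  name: my-app                       # hypothetical Service
  annotations:
    prometheus.io/scrape: "true"     # opt this Service in to scraping
    prometheus.io/path: "/metrics"   # optional, defaults to /metrics
    prometheus.io/port: "9113"
spec:
  selector:
    app: my-app
  ports:
  - port: 9113

Re-deploy the ConfigMap as before: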

[root@aminglinux01 prometheus]# kubectl delete cm prometheus-server; kubectl apply -f prometheus_config.yaml
configmap "prometheus-server" deleted
configmap/prometheus-server created
[root@aminglinux01 prometheus]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
pod "prometheus-server-5df77fb7d7-gdm7p" deleted
[root@aminglinux01 prometheus]# kubectl get pod -owide| grep prometheus
prometheus-alertmanager-0                   1/1     Running   0              18m     10.18.206.200     aminglinux02   <none>           <none>
prometheus-consul-0                         1/1     Running   0              2d23h   10.18.206.214     aminglinux02   <none>           <none>
prometheus-consul-1                         1/1     Running   0              2d23h   10.18.68.141      aminglinux03   <none>           <none>
prometheus-consul-2                         1/1     Running   0              2d23h   10.18.206.209     aminglinux02   <none>           <none>
prometheus-nginx-exporter-bbf5d8b8b-s8hvl   1/1     Running   0              2d22h   10.18.206.252     aminglinux02   <none>           <none>
prometheus-server-5df77fb7d7-jdd4t          0/1     Running   0              4s      10.18.68.186      aminglinux03   <none>           <none>
[root@aminglinux01 prometheus]# 

 
