Prometheus Auto-Discovery, Monitoring Nginx via Consul, kube-state-metrics and metrics-server, Monitoring a Kubernetes Cluster with Prometheus, and Monitoring Common Kubernetes Resource Objects
I. Prometheus Auto-Discovery Mechanism
If the monitored targets run on Kubernetes, there will be a very large number of them and they will change frequently, which makes adding targets by hand tedious. The service discovery mechanism exists to add monitored targets to Prometheus automatically.
Prometheus target configuration falls into static configuration and dynamic discovery. The commonly used types are:
static_configs: # static discovery: targets are written directly into the config file or a ConfigMap
file_sd_configs: # file-based discovery: targets live in a dedicated file; adding a target just means editing that file
dns_sd_configs: # DNS service discovery
kubernetes_sd_configs: # Kubernetes service discovery
consul_sd_configs: # Consul service discovery
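Before moving on, here is a minimal sketch of what the first two types look like in prometheus.yml (job names and the JSON file path are illustrative, not from this cluster; the target address reuses the node exporter seen later in this article):

scrape_configs:
- job_name: 'static-demo'                   # hypothetical job name
  static_configs:
  - targets: ['192.168.100.153:9100']       # hard-coded target
- job_name: 'file-demo'                     # hypothetical job name
  file_sd_configs:
  - files:
    - /etc/prometheus/targets.json          # assumed path; Prometheus re-reads it when it changes
    refresh_interval: 1m

The targets.json file would hold entries such as [{"targets": ["192.168.100.153:9100"], "labels": {"env": "demo"}}]; edits to it are picked up without restarting Prometheus.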
In Kubernetes monitoring scenarios, frequently changing resources such as Pods and Services best demonstrate the value of Prometheus target auto-discovery. Below we implement auto-discovery using Consul:
1. Start a Consul service in k8s
Install it with helm. Since Consul involves data persistence, pull the chart first so values.yaml can be modified:
helm pull bitnami/consul --untar
Note: this assumes persistent storage is already configured; here I use the NFS-backed StorageClass (nfs-client) I set up earlier.
Edit values.yaml:
cd consul
vi values.yaml # search for storageClass and change it to:
storageClass: "nfs-client" # change every occurrence
Install with helm:
helm install prometheus-consul .
[root@aminglinux01 consul]# helm install prometheus-consul .
NAME: prometheus-consul
LAST DEPLOYED: Sat Aug 3 03:42:41 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: consul
CHART VERSION: 11.3.11
APP VERSION: 1.19.1
** Please be patient while the chart is being deployed **
Consul can be accessed within the cluster on port 8300 at prometheus-consul-headless.default.svc.cluster.local
In order to access to the Consul Web UI:
1. Get the Consul URL by running these commands:
kubectl port-forward --namespace default svc/prometheus-consul-ui 80:80
echo "Consul URL: http://127.0.0.1:80"
2. Access ASP.NET Core using the obtained URL.
Please take into account that you need to wait until a cluster leader is elected before using the Consul Web UI.
In order to check the status of the cluster you can run the following command:
kubectl exec -it prometheus-consul-0 -- consul members
Furthermore, to know which Consul node is the cluster leader run this other command:
kubectl exec -it prometheus-consul-0 -- consul operator raft list-peers
WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
- resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
⚠ SECURITY WARNING: Original containers have been substituted. This Helm chart was designed, tested, and validated on multiple platforms using a specific set of Bitnami and Tanzu Application Catalog containers. Substituting other containers is likely to cause degraded security and performance, broken chart features, and missing environment variables.
Substituted images detected:
- registry.cn-hangzhou.aliyuncs.com/daliyused/consul:1.19.1-debian-12-r4
- registry.cn-hangzhou.aliyuncs.com/daliyused/consul-exporter:0.12.0-debian-12-r10
[root@aminglinux01 consul]#
Check the PVCs:
kubectl get pvc
[root@aminglinux01 consul]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
data-prometheus-consul-0 Bound pvc-092f63a4-59cc-4200-b511-fb7d5a288b42 8Gi RWO nfs-client 46s
data-prometheus-consul-1 Bound pvc-b9f7b89a-a4e9-433b-b03b-32d574145dcf 8Gi RWO nfs-client 46s
data-prometheus-consul-2 Bound pvc-d8a278ec-956d-4f39-bce2-19ff886e8209 8Gi RWO nfs-client 46s
Check the pods:
[root@aminglinux01 consul]# kubectl get pod | grep consul
prometheus-consul-0 1/1 Running 0 42s
prometheus-consul-1 1/1 Running 0 42s
prometheus-consul-2 1/1 Running 0 42s
[root@aminglinux01 consul]#
Check the Consul Services:
[root@aminglinux01 consul]# kubectl get svc | grep consul
prometheus-consul-headless ClusterIP None <none> 8500/TCP,8400/TCP,8301/TCP,8302/UDP,8302/TCP,8301/UDP,8300/TCP,8600/TCP,8600/UDP 4m53s
prometheus-consul-ui ClusterIP 10.15.7.56 <none> 80/TCP 4m53s
[root@aminglinux01 consul]#
Register a target through the Consul API:
curl -X PUT -d '{"id": "aminglinux03","name": "aminglinux03","address":"192.168.100.153","port": 9100,"tags": ["service"],"checks": [{"http":"http://192.168.100.153:9100/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register
Verify:
curl 10.15.7.56/v1/catalog/service/aminglinux03
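The payload above registers a service named aminglinux03 at 192.168.100.153:9100 (the node exporter) with an HTTP health check every 5s. If you ever need to remove a stale registration, the Consul agent API also offers a deregister endpoint; a sketch:

# remove the service registered above, addressed by its "id" field
curl -X PUT http://10.15.7.56/v1/agent/service/deregister/aminglinux03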
2. Add Consul to the Prometheus configuration
Edit prometheus_config.yaml again:
vi prometheus_config.yaml # add the following under scrape_configs:
- job_name: 'consul'
  consul_sd_configs:
  - server: 'prometheus-consul-ui' # the Service IP changes when the service is recreated, so use the Service name directly
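With this job, every service registered in Consul gets scraped. Optionally, relabeling can restrict scraping to targets carrying a given tag and expose the Consul service name as a label; a sketch (the consul_service label name is my choice):

- job_name: 'consul'
  consul_sd_configs:
  - server: 'prometheus-consul-ui'
  relabel_configs:
  - source_labels: [__meta_consul_tags]      # keep only targets tagged "service", as registered above
    regex: .*,service,.*
    action: keep
  - source_labels: [__meta_consul_service]   # carry the Consul service name into a Prometheus label
    target_label: consul_service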
Re-import the configuration:
kubectl delete -f prometheus_config.yaml
kubectl apply -f prometheus_config.yaml
[root@aminglinux01 prometheus]# vim prometheus_config.yaml
[root@aminglinux01 prometheus]# kubectl delete -f prometheus_config.yaml
configmap "prometheus-server" deleted
[root@aminglinux01 prometheus]# kubectl apply -f prometheus_config.yaml
configmap/prometheus-server created
[root@aminglinux01 prometheus]#
Restart the Prometheus server:
[root@aminglinux01 prometheus]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
pod "prometheus-server-bd476698f-vbkjd" deleted
3. Monitoring Nginx via Consul
We'll use an existing nginx pod as the example. First, configure a status page inside the pod:
kubectl exec -it nginx -- bash
The nginx image ships without vi, so install vim first (run inside the pod):
apt update
apt install -y vim
Edit the config file:
vim /etc/nginx/conf.d/default.conf # add the following just above the final closing }
location /nginx_status {
stub_status;
}
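As written, the status page is open to anyone who can reach the pod. If that matters in your environment, nginx can restrict it by source address; a sketch (the CIDR is an assumption matching the 10.18.x.x pod IPs seen in this cluster — substitute your own pod network):

location /nginx_status {
    stub_status;
    allow 127.0.0.1;     # local checks
    allow 10.18.0.0/16;  # assumed pod CIDR; adjust to your cluster
    deny all;
}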
Reload (inside the pod):
nginx -s reload
root@nginx:/# nginx -s reload
2024/08/02 20:31:33 [notice] 273#273: signal process started
root@nginx:/#
Test that it took effect (inside the pod):
curl localhost/nginx_status # output like the following indicates success
Active connections: 1
server accepts handled requests
2 2 2
Reading: 0 Writing: 1 Waiting: 0
root@nginx:/# curl localhost/nginx_status
Active connections: 1
server accepts handled requests
1 1 1
Reading: 0 Writing: 1 Waiting: 0
root@nginx:/#
Note: if everything looks good, you can exit the pod.
With a target in place, we still need an nginx exporter.
First pull the chart with helm (the config needs modifying, so it cannot be installed directly):
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm pull prometheus-community/prometheus-nginx-exporter --untar
[root@aminglinux01 ~]# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
"prometheus-community" already exists with the same configuration, skipping
[root@aminglinux01 ~]# helm repo update
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "aliyun" chart repository
...Successfully got an update from the "prometheus-community" chart repository
...Successfully got an update from the "bitnami" chart repository
...Successfully got an update from the "helm_sh" chart repository
Update Complete. ⎈Happy Helming!⎈
[root@aminglinux01 ~]# helm pull prometheus-community/prometheus-nginx-exporter --untar
Error: Get "https://github.com/prometheus-community/helm-charts/releases/download/prometheus-nginx-exporter-0.2.1/prometheus-nginx-exporter-0.2.1.tgz": unexpected EOF
[root@aminglinux01 ~]#
The pull above failed with a network error; retry it until it succeeds. Then get the target pod's IP:
[root@aminglinux01 ~]# kubectl get po -o wide | grep nginx
nginx 1/1 Running 0 9m36s 10.18.68.136 aminglinux03 <none> <none>
[root@aminglinux01 ~]#
Edit values.yaml:
vi values.yaml
Comment out the line nginxServer: "http://{{ .Release.Name }}.{{ .Release.Namespace }}.svc.cluster.local:8080/stub_status" and add a new line:
nginxServer: "http://10.18.68.136/nginx_status"
Install nginx-exporter:
helm install prometheus-nginx-exporter .
[root@aminglinux01 prometheus-nginx-exporter]# helm install prometheus-nginx-exporter .
NAME: prometheus-nginx-exporter
LAST DEPLOYED: Sat Aug 3 04:50:28 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
1. Get the application URL by running these commands:
export POD_NAME=$(kubectl get pods --namespace default -l "app.kubernetes.io/name=prometheus-nginx-exporter,app.kubernetes.io/instance=prometheus-nginx-exporter" -o jsonpath="{.items[0].metadata.name}")
export CONTAINER_PORT=$(kubectl get pod --namespace default $POD_NAME -o jsonpath="{.spec.containers[0].ports[0].containerPort}")
echo "Visit http://127.0.0.1:8080 to use your application"
kubectl --namespace default port-forward $POD_NAME 8080:$CONTAINER_PORT
Check pod status:
[root@aminglinux01 prometheus-nginx-exporter]# kubectl get po | grep prometheus-
prometheus-alertmanager-0 1/1 Running 0 173m
prometheus-consul-0 1/1 Running 0 69m
prometheus-consul-1 1/1 Running 0 69m
prometheus-consul-2 1/1 Running 0 69m
prometheus-nginx-exporter-bbf5d8b8b-s8hvl 1/1 Running 0 2m1s
prometheus-server-bd476698f-48z94 1/1 Running 0 49m
[root@aminglinux01 prometheus-nginx-exporter]#
Check the Service IPs:
[root@aminglinux01 prometheus-nginx-exporter]# kubectl get svc | grep prometheus-
prometheus-alertmanager LoadBalancer 10.15.190.141 192.168.10.244 80:31633/TCP 173m
prometheus-consul-headless ClusterIP None <none> 8500/TCP,8400/TCP,8301/TCP,8302/UDP,8302/TCP,8301/UDP,8300/TCP,8600/TCP,8600/UDP 70m
prometheus-consul-ui ClusterIP 10.15.7.56 <none> 80/TCP 70m
prometheus-nginx-exporter ClusterIP 10.15.7.72 <none> 9113/TCP 2m30s
prometheus-server LoadBalancer 10.15.67.163 192.168.10.243 80:32470/TCP 173m
[root@aminglinux01 prometheus-nginx-exporter]#
Access the metrics endpoint:
[root@aminglinux01 prometheus-nginx-exporter]# curl 10.15.7.72:9113/metrics
# HELP go_gc_duration_seconds A summary of the pause duration of garbage collection cycles.
# TYPE go_gc_duration_seconds summary
go_gc_duration_seconds{quantile="0"} 0
go_gc_duration_seconds{quantile="0.25"} 0
go_gc_duration_seconds{quantile="0.5"} 0
go_gc_duration_seconds{quantile="0.75"} 0
go_gc_duration_seconds{quantile="1"} 0
go_gc_duration_seconds_sum 0
go_gc_duration_seconds_count 0
# HELP go_goroutines Number of goroutines that currently exist.
# TYPE go_goroutines gauge
go_goroutines 12
# HELP go_info Information about the Go environment.
# TYPE go_info gauge
go_info{version="go1.22.5"} 1
# HELP go_memstats_alloc_bytes Number of bytes allocated and still in use.
# TYPE go_memstats_alloc_bytes gauge
go_memstats_alloc_bytes 250176
# HELP go_memstats_alloc_bytes_total Total number of bytes allocated, even if freed.
# TYPE go_memstats_alloc_bytes_total counter
go_memstats_alloc_bytes_total 250176
# HELP go_memstats_buck_hash_sys_bytes Number of bytes used by the profiling bucket hash table.
# TYPE go_memstats_buck_hash_sys_bytes gauge
go_memstats_buck_hash_sys_bytes 8154
# HELP go_memstats_frees_total Total number of frees.
# TYPE go_memstats_frees_total counter
go_memstats_frees_total 0
# HELP go_memstats_gc_sys_bytes Number of bytes used for garbage collection system metadata.
# TYPE go_memstats_gc_sys_bytes gauge
go_memstats_gc_sys_bytes 1.418824e+06
# HELP go_memstats_heap_alloc_bytes Number of heap bytes allocated and still in use.
# TYPE go_memstats_heap_alloc_bytes gauge
go_memstats_heap_alloc_bytes 250176
# HELP go_memstats_heap_idle_bytes Number of heap bytes waiting to be used.
# TYPE go_memstats_heap_idle_bytes gauge
go_memstats_heap_idle_bytes 1.417216e+06
# HELP go_memstats_heap_inuse_bytes Number of heap bytes that are in use.
# TYPE go_memstats_heap_inuse_bytes gauge
go_memstats_heap_inuse_bytes 2.08896e+06
# HELP go_memstats_heap_objects Number of allocated objects.
# TYPE go_memstats_heap_objects gauge
go_memstats_heap_objects 1057
# HELP go_memstats_heap_released_bytes Number of heap bytes released to OS.
# TYPE go_memstats_heap_released_bytes gauge
go_memstats_heap_released_bytes 1.417216e+06
# HELP go_memstats_heap_sys_bytes Number of heap bytes obtained from system.
# TYPE go_memstats_heap_sys_bytes gauge
go_memstats_heap_sys_bytes 3.506176e+06
# HELP go_memstats_last_gc_time_seconds Number of seconds since 1970 of last garbage collection.
# TYPE go_memstats_last_gc_time_seconds gauge
go_memstats_last_gc_time_seconds 0
# HELP go_memstats_lookups_total Total number of pointer lookups.
# TYPE go_memstats_lookups_total counter
go_memstats_lookups_total 0
# HELP go_memstats_mallocs_total Total number of mallocs.
# TYPE go_memstats_mallocs_total counter
go_memstats_mallocs_total 1057
# HELP go_memstats_mcache_inuse_bytes Number of bytes in use by mcache structures.
# TYPE go_memstats_mcache_inuse_bytes gauge
go_memstats_mcache_inuse_bytes 4800
# HELP go_memstats_mcache_sys_bytes Number of bytes used for mcache structures obtained from system.
# TYPE go_memstats_mcache_sys_bytes gauge
go_memstats_mcache_sys_bytes 15600
# HELP go_memstats_mspan_inuse_bytes Number of bytes in use by mspan structures.
# TYPE go_memstats_mspan_inuse_bytes gauge
go_memstats_mspan_inuse_bytes 54080
# HELP go_memstats_mspan_sys_bytes Number of bytes used for mspan structures obtained from system.
# TYPE go_memstats_mspan_sys_bytes gauge
go_memstats_mspan_sys_bytes 65280
# HELP go_memstats_next_gc_bytes Number of heap bytes when next garbage collection will take place.
# TYPE go_memstats_next_gc_bytes gauge
go_memstats_next_gc_bytes 4.194304e+06
# HELP go_memstats_other_sys_bytes Number of bytes used for other system allocations.
# TYPE go_memstats_other_sys_bytes gauge
go_memstats_other_sys_bytes 1.202686e+06
# HELP go_memstats_stack_inuse_bytes Number of bytes in use by the stack allocator.
# TYPE go_memstats_stack_inuse_bytes gauge
go_memstats_stack_inuse_bytes 688128
# HELP go_memstats_stack_sys_bytes Number of bytes obtained from system for stack allocator.
# TYPE go_memstats_stack_sys_bytes gauge
go_memstats_stack_sys_bytes 688128
# HELP go_memstats_sys_bytes Number of bytes obtained from system.
# TYPE go_memstats_sys_bytes gauge
go_memstats_sys_bytes 6.904848e+06
# HELP go_threads Number of OS threads created.
# TYPE go_threads gauge
go_threads 9
# HELP nginx_connections_accepted Accepted client connections
# TYPE nginx_connections_accepted counter
nginx_connections_accepted 2
# HELP nginx_connections_active Active client connections
# TYPE nginx_connections_active gauge
nginx_connections_active 1
# HELP nginx_connections_handled Handled client connections
# TYPE nginx_connections_handled counter
nginx_connections_handled 2
# HELP nginx_connections_reading Connections where NGINX is reading the request header
# TYPE nginx_connections_reading gauge
nginx_connections_reading 0
# HELP nginx_connections_waiting Idle client connections
# TYPE nginx_connections_waiting gauge
nginx_connections_waiting 0
# HELP nginx_connections_writing Connections where NGINX is writing the response back to the client
# TYPE nginx_connections_writing gauge
nginx_connections_writing 1
# HELP nginx_exporter_build_info A metric with a constant '1' value labeled by version, revision, branch, goversion from which nginx_exporter was built, and the goos and goarch for the build.
# TYPE nginx_exporter_build_info gauge
nginx_exporter_build_info{branch="HEAD",goarch="amd64",goos="linux",goversion="go1.22.5",revision="9522f4e39ee1aed817d7d70a89514ccc0ae1594a",tags="unknown",version="1.3.0"} 1
Register this metrics address with Consul:
curl -X PUT -d '{"id": "nginx","name": "nginx","address":"10.15.7.72","port": 9113,"tags": ["service"],"checks": [{"http":"http://10.15.7.72:9113/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register
Verify:
curl 10.15.7.56/v1/catalog/service/nginx
[root@aminglinux01 prometheus-nginx-exporter]# curl -X PUT -d '{"id": "nginx","name": "nginx","address":"10.15.7.72","port": 9113,"tags": ["service"],"checks": [{"http":"http://10.15.7.72:9113/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register
[root@aminglinux01 prometheus-nginx-exporter]# curl 10.15.7.56/v1/catalog/service/nginx
[{"ID":"e0cf3632-3441-37ae-3b07-313aaa22f649","Node":"prometheus-consul-0","Address":"10.18.206.214","Datacenter":"dc1","TaggedAddresses":{"lan":"10.18.206.214","lan_ipv4":"10.18.206.214","wan":"10.18.206.214","wan_ipv4":"10.18.206.214"},"NodeMeta":{"consul-network-segment":"","consul-version":"1.19.1"},"ServiceKind":"","ServiceID":"nginx","ServiceName":"nginx","ServiceTags":["service"],"ServiceAddress":"10.15.7.72","ServiceTaggedAddresses":{"lan_ipv4":{"Address":"10.15.7.72","Port":9113},"wan_ipv4":{"Address":"10.15.7.72","Port":9113}},"ServiceWeights":{"Passing":1,"Warning":1},"ServiceMeta":{},"ServicePort":9113,"ServiceSocketPath":"","ServiceEnableTagOverride":false,"ServiceProxy":{"Mode":"","MeshGateway":{},"Expose":{}},"ServiceConnect":{},"ServiceLocality":null,"CreateIndex":586,"ModifyIndex":586}][root@aminglinux01 prometheus-nginx-exporter]#
II. kube-state-metrics and metrics-server
1. kube-state-metrics
1) Introduction
kube-state-metrics is a Kubernetes component that turns the state of the resources in a cluster into monitorable metrics, helping users understand and monitor cluster health and performance.
A Kubernetes cluster contains many objects (Pods, Deployments, Services, and so on) along with their state (replica counts, phases, labels, and so on).
kube-state-metrics watches the Kubernetes API for changes, fetches object state in real time, and exposes it as metrics. These metrics can drive monitoring and alerting, giving operators insight into the health, performance, and other key state of each component in the cluster.
The metrics kube-state-metrics generates can be scraped by a Prometheus server and used to build dashboards, define alerting rules, and analyze cluster performance. Combining kube-state-metrics with Prometheus gives you a much clearer picture of how the cluster is running.
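A few illustrative PromQL queries over well-known kube-state-metrics series (metric names come from the kube-state-metrics documentation; the thresholds are arbitrary examples):

# pods not in the Running phase, per namespace
sum by (namespace) (kube_pod_status_phase{phase!="Running"})

# deployments whose available replicas lag the desired count
kube_deployment_status_replicas_available < kube_deployment_spec_replicas

# containers that restarted in the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0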
2) Deployment
Deploy with helm:
helm install kube-state-metrics bitnami/kube-state-metrics
[root@aminglinux01 kube-state-metrics]# helm install kube-state-metrics .
NAME: kube-state-metrics
LAST DEPLOYED: Sat Aug 3 05:13:58 2024
NAMESPACE: default
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: kube-state-metrics
CHART VERSION: 4.2.11
APP VERSION: 2.13.0
** Please be patient while the chart is being deployed **
Watch the kube-state-metrics Deployment status using the command:
kubectl get deploy -w --namespace default kube-state-metrics
kube-state-metrics can be accessed via port "8080" on the following DNS name from within your cluster:
kube-state-metrics.default.svc.cluster.local
To access kube-state-metrics from outside the cluster execute the following commands:
echo "URL: http://127.0.0.1:9100/"
kubectl port-forward --namespace default svc/kube-state-metrics 9100:8080
WARNING: There are "resources" sections in the chart not set. Using "resourcesPreset" is not recommended for production. For production installations, please set the following values according to your workload needs:
- resources
+info https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
⚠ SECURITY WARNING: Original containers have been substituted. This Helm chart was designed, tested, and validated on multiple platforms using a specific set of Bitnami and Tanzu Application Catalog containers. Substituting other containers is likely to cause degraded security and performance, broken chart features, and missing environment variables.
Substituted images detected:
- registry.cn-hangzhou.aliyuncs.com/daliyused/kube-state-metrics:2.13.0-debian-12-r2
3) Configure Prometheus
First get the Service IP:
[root@aminglinux01 kube-state-metrics]# kubectl get svc | grep kub
kube-state-metrics ClusterIP 10.15.215.19 <none> 8080/TCP 26s
Test:
curl 10.15.215.19:8080/metrics
[root@aminglinux01 kube-state-metrics]# curl 10.15.215.19:8080/metrics
# HELP kube_configmap_annotations Kubernetes annotations converted to Prometheus labels.
# TYPE kube_configmap_annotations gauge
# HELP kube_configmap_labels [STABLE] Kubernetes labels converted to Prometheus labels.
# TYPE kube_configmap_labels gauge
# HELP kube_configmap_info [STABLE] Information about configmap.
# TYPE kube_configmap_info gauge
kube_configmap_info{namespace="default",configmap="myharbor-redis-health"} 1
kube_configmap_info{namespace="default",configmap="myharbor-redis-configuration"} 1
kube_configmap_info{namespace="default",configmap="myharbor-jobservice-config"} 1
kube_configmap_info{namespace="default",configmap="myharbor-trivy-envvars"} 1
kube_configmap_info{namespace="kube-public",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="calico-config"} 1
kube_configmap_info{namespace="metallb-system",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="myharbor-jobservice-envvars"} 1
kube_configmap_info{namespace="kube-node-lease",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-proxy"} 1
kube_configmap_info{namespace="kube-system",configmap="coredns"} 1
kube_configmap_info{namespace="kube-system",configmap="kubeadm-config"} 1
kube_configmap_info{namespace="kube-public",configmap="cluster-info"} 1
kube_configmap_info{namespace="default",configmap="prometheus-server"} 1
kube_configmap_info{namespace="default",configmap="myharbor-core-envvars"} 1
kube_configmap_info{namespace="prometheus",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="myharbor-core"} 1
kube_configmap_info{namespace="default",configmap="myharbor-portal"} 1
kube_configmap_info{namespace="default",configmap="myharbor-redis-scripts"} 1
kube_configmap_info{namespace="yeyunyi",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="myharbor-postgresql-init-scripts"} 1
kube_configmap_info{namespace="default",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="metallb-system",configmap="metallb-excludel2"} 1
kube_configmap_info{namespace="default",configmap="myharbor-nginx"} 1
kube_configmap_info{namespace="default",configmap="myharbor-postgresql-extended-configuration"} 1
kube_configmap_info{namespace="kube-system",configmap="extension-apiserver-authentication"} 1
kube_configmap_info{namespace="kube-system",configmap="kubelet-config"} 1
kube_configmap_info{namespace="default",configmap="myharbor-registry"} 1
kube_configmap_info{namespace="kube-system",configmap="kube-root-ca.crt"} 1
kube_configmap_info{namespace="default",configmap="prometheus-alertmanager"} 1
# HELP kube_configmap_created [STABLE] Unix creation timestamp
# TYPE kube_configmap_created gauge
Register it with Consul using curl:
curl -X PUT -d '{"id": "kube-state-metrics","name": "kube-state-metrics","address":"10.15.215.19","port": 8080,"tags": ["service"],"checks": [{"http":"http://10.15.215.19:8080/metrics/","interval": "5s"}]}' http://10.15.7.56/v1/agent/service/register
2. metrics-server
Metrics Server is a component that collects, aggregates, and serves resource metrics from a Kubernetes cluster. It is one of the core Kubernetes add-ons, providing real-time resource-utilization data for the containers and nodes in the cluster; it is what backs kubectl top and the Horizontal Pod Autoscaler.
Metrics Server interacts with the Kubernetes API and collects metrics periodically, chiefly CPU and memory usage for nodes and pods.
Download the YAML file:
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.6.3/high-availability-1.21+.yaml
Edit the YAML file:
vi high-availability-1.21+.yaml
Change image: k8s.gcr.io/metrics-server/metrics-server:v0.6.3 to image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.3, and above the image: line add a line: - --kubelet-insecure-tls (watch the indentation: align it with spaces, never tabs).
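For orientation, the container args in the manifest look roughly like this after the edit (flags abridged from the upstream v0.6.3 manifest; verify against your downloaded file):

      containers:
      - args:
        - --cert-dir=/tmp
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls        # the added line: skip kubelet serving-cert verification
        image: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server:v0.6.3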
Apply the YAML file:
kubectl apply -f high-availability-1.21+.yaml
[root@aminglinux01 ~]# kubectl apply -f high-availability-1.21+.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
poddisruptionbudget.policy/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Check the pods:
kubectl get po -n kube-system
[root@aminglinux01 ~]# kubectl get pod -n kube-system | grep metric
metrics-server-76467d945-9jtxx 1/1 Running 0 31s
metrics-server-76467d945-vnjxl 1/1 Running 0 31s
Test:
[root@aminglinux01 ~]# kubectl top node
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
aminglinux01 201m 5% 1984Mi 56%
aminglinux02 261m 6% 2098Mi 59%
aminglinux03 274m 6% 1885Mi 53%
[root@aminglinux01 ~]# kubectl top po
NAME CPU(cores) MEMORY(bytes)
kube-state-metrics-75778cdfff-2vdkl 1m 37Mi
lucky-6cdcf8b9d4-t5r66 1m 74Mi
myharbor-core-b9d48ccdd-v9jdz 1m 68Mi
myharbor-jobservice-6f5dbfcc4f-q852z 3m 29Mi
myharbor-nginx-65b8c5764d-vz4vn 1m 12Mi
myharbor-portal-ff7fd4949-lj6jw 1m 7Mi
myharbor-postgresql-0 14m 69Mi
myharbor-redis-master-0 28m 18Mi
myharbor-registry-5b59458d9-4j79b 1m 30Mi
myharbor-trivy-0 1m 11Mi
nginx 1m 33Mi
ngnix 0m 24Mi
node-exporter-9cn2c 7m 20Mi
node-exporter-h4ntw 1m 17Mi
node-exporter-wvp2h 8m 21Mi
pod-demo 1m 72Mi
pod-demo1 0m 4Mi
prometheus-alertmanager-0 2m 24Mi
prometheus-consul-0 40m 55Mi
prometheus-consul-1 35m 47Mi
prometheus-consul-2 30m 48Mi
prometheus-nginx-exporter-bbf5d8b8b-s8hvl 1m 15Mi
prometheus-server-bd476698f-48z94 1m 106Mi
redis-sts-0 2m 8Mi
redis-sts-1 2m 8Mi
[root@aminglinux01 ~]#
III. Monitoring the Kubernetes Cluster with Prometheus
1. Monitoring cluster nodes
We already covered this with node_exporter, but here is another approach, based on Kubernetes service discovery. Add the following job under scrape_configs in prometheus_config.yaml:
- job_name: 'kubernetes-nodes'
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  kubernetes_sd_configs:
  - role: node                      # discover every node via the Kubernetes API
  relabel_configs:
  - action: labelmap                # copy node labels onto the target
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__       # scrape through the apiserver instead of the node directly
    replacement: kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__  # .../api/v1/nodes/<node>/proxy/metrics
    replacement: /api/v1/nodes/${1}/proxy/metrics
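After relabeling, a node such as aminglinux01 is scraped at https://kubernetes.default.svc:443/api/v1/nodes/aminglinux01/proxy/metrics. You can sanity-check that proxy path from the control node; a sketch:

# fetch one node's kubelet metrics through the apiserver proxy
kubectl get --raw /api/v1/nodes/aminglinux01/proxy/metrics | head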
Deploy:
kubectl delete cm prometheus-server ; kubectl apply -f prometheus_config.yaml
kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
2. Monitoring the apiserver
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep                    # keep only the "kubernetes" Service in "default" on its https port
    regex: default;kubernetes;https
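The apiserver exposes request-level series; two illustrative queries over its standard metrics:

# request rate by verb over the last 5 minutes
sum by (verb) (rate(apiserver_request_total[5m]))

# share of requests that returned a 5xx code
sum(rate(apiserver_request_total{code=~"5.."}[5m])) / sum(rate(apiserver_request_total[5m]))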
3. Monitoring the kubelet
- job_name: 'kubernetes-kubelet'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true      # kubelet serving certs are often self-signed
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
Deploy:
[root@aminglinux01 ~]# kubectl delete cm prometheus-server ;kubectl apply -f prometheus_config.yaml
configmap "prometheus-server" deleted
configmap/prometheus-server created
[root@aminglinux01 ~]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
IV. Monitoring Common Kubernetes Resource Objects
1. Container monitoring
- job_name: 'kubernetes-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
    replacement: $1
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    replacement: /metrics/cadvisor    # <nodeip>/metrics -> <nodeip>/metrics/cadvisor
    target_label: __metrics_path__
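Once the cadvisor targets are up, per-container usage can be queried; two illustrative examples over standard cadvisor series:

# CPU usage per pod, in cores, averaged over 5 minutes
sum by (namespace, pod) (rate(container_cpu_usage_seconds_total{container!=""}[5m]))

# working-set memory per pod
sum by (namespace, pod) (container_memory_working_set_bytes{container!=""})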
[root@aminglinux01 prometheus]# kubectl delete cm prometheus-server; kubectl apply -f prometheus_config.yaml
configmap "prometheus-server" deleted
configmap/prometheus-server created
[root@aminglinux01 prometheus]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
pod "prometheus-server-5df77fb7d7-rdj59" deleted
2. Service monitoring
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep                    # scrape only Services annotated prometheus.io/scrape: "true"
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace                 # honor prometheus.io/scheme (http or https)
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace                 # honor prometheus.io/path
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace                 # honor prometheus.io/port
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
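This job only scrapes Services that opt in through annotations matching the relabel rules above. A sketch of what a participating Service would carry (the Service itself is hypothetical; the port reuses the nginx-exporter's 9113 as an example):

apiVersion: v1
kind: Service
metadata:
  name: my-app                       # hypothetical Service
  annotations:
    prometheus.io/scrape: "true"     # required by the keep rule
    prometheus.io/port: "9113"       # overrides the scrape port
    prometheus.io/path: "/metrics"   # overrides the metrics path
spec:
  selector:
    app: my-app
  ports:
  - port: 9113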
[root@aminglinux01 prometheus]# kubectl delete cm prometheus-server; kubectl apply -f prometheus_config.yaml
configmap "prometheus-server" deleted
configmap/prometheus-server created
[root@aminglinux01 prometheus]# kubectl get po |grep prometheus-server |awk '{print $1}' |xargs -i kubectl delete po {}
pod "prometheus-server-5df77fb7d7-gdm7p" deleted
[root@aminglinux01 prometheus]# kubectl get pod -owide| grep prometheus
prometheus-alertmanager-0 1/1 Running 0 18m 10.18.206.200 aminglinux02 <none> <none>
prometheus-consul-0 1/1 Running 0 2d23h 10.18.206.214 aminglinux02 <none> <none>
prometheus-consul-1 1/1 Running 0 2d23h 10.18.68.141 aminglinux03 <none> <none>
prometheus-consul-2 1/1 Running 0 2d23h 10.18.206.209 aminglinux02 <none> <none>
prometheus-nginx-exporter-bbf5d8b8b-s8hvl 1/1 Running 0 2d22h 10.18.206.252 aminglinux02 <none> <none>
prometheus-server-5df77fb7d7-jdd4t 0/1 Running 0 4s 10.18.68.186 aminglinux03 <none> <none>
[root@aminglinux01 prometheus]#