Introduction

prometheus-operator
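Prometheus is an excellent monitoring tool, or more precisely a complete monitoring solution: it covers data collection, storage, processing, visualization and alerting. As the monitoring system most widely recommended for Kubernetes, Prometheus is used here to monitor the state of the Kubernetes cluster and of the applications running on it.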

Prometheus architecture

[Figure: Prometheus architecture]

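So what does Prometheus Operator do?
The Operator pattern was introduced by CoreOS. An Operator is an application-specific controller that extends the Kubernetes API to create, configure and manage complex stateful applications such as databases, caches and monitoring systems.
Put simply, Prometheus Operator is a tool for deploying and managing Prometheus on Kubernetes; its goal is to simplify and automate the maintenance of the Prometheus components.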
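At its core an Operator runs a reconcile loop: it compares the desired state declared in a custom resource with what is actually running and acts on the difference. The toy Python model below only illustrates that idea. `PrometheusSpec` and `reconcile` are hypothetical names, and the real Prometheus Operator (written in Go) manages StatefulSets, configuration Secrets and more rather than individual pods.

```python
from dataclasses import dataclass


@dataclass
class PrometheusSpec:
    """Hypothetical, heavily simplified stand-in for a Prometheus custom resource."""
    name: str
    replicas: int


def reconcile(spec, running_pods):
    """Return the (action, pod) pairs needed to make running_pods match spec.

    Desired pod names follow the StatefulSet convention prometheus-<name>-<ordinal>,
    which matches the pod names seen later in this article (prometheus-k8s-0, -1).
    """
    desired = [f"prometheus-{spec.name}-{i}" for i in range(spec.replicas)]
    # Anything desired but missing must be created.
    actions = [("create", pod) for pod in desired if pod not in running_pods]
    # Anything running but no longer desired must be removed.
    actions += [("delete", pod) for pod in sorted(running_pods) if pod not in desired]
    return actions
```

For example, `reconcile(PrometheusSpec("k8s", 2), {"prometheus-k8s-0"})` yields `[("create", "prometheus-k8s-1")]`. A real controller performs such actions and re-runs the loop whenever the resource or the cluster changes, so the system converges on the declared state.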

Prometheus Operator architecture

[Figure: Prometheus Operator architecture]

Preparation

1. Clone the kube-prometheus project

[root@k8s-master001 opt]# git clone https://github.com/prometheus-operator/kube-prometheus.git

2. Go into the kube-prometheus/manifests directory. It contains a large number of YAML files, so we sort them into subdirectories by component:

[root@k8s-master001 manifests]# ls -al
total 20
drwxr-xr-x. 10 root root  140 Sep 14 21:25 .
drwxr-xr-x. 12 root root 4096 Sep 14 21:11 ..
drwxr-xr-x.  2 root root 4096 Sep 14 21:23 adapter
drwxr-xr-x.  2 root root  189 Sep 14 21:22 alertmanager
drwxr-xr-x.  2 root root  241 Sep 14 21:22 exporter
drwxr-xr-x.  2 root root  254 Sep 14 21:23 grafana
drwxr-xr-x.  2 root root  272 Sep 14 21:22 metrics
drwxr-xr-x.  2 root root 4096 Sep 14 21:25 prometheus
drwxr-xr-x.  2 root root 4096 Sep 14 21:23 serviceMonitor
drwxr-xr-x.  2 root root 4096 Sep 14 21:11 setup
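Apart from setup/, these subdirectories appear to be the author's own grouping rather than the upstream layout, which at the time shipped the remaining manifests as flat files in manifests/. The following is a minimal sketch of one way to reproduce the grouping. The prefix-to-directory table is an assumption inferred from the file names visible in this article (for example prometheus/prometheus-prometheus.yaml and serviceMonitor/prometheus-serviceMonitorKubelet.yaml), and `component_dir`/`sort_manifests` are hypothetical helpers.

```python
import shutil
from pathlib import Path

# Prefix -> subdirectory, most specific prefixes first. ASSUMPTION: inferred from
# the file names shown in this article; check it against your kube-prometheus version.
PREFIXES = [
    ("prometheus-adapter-", "adapter"),
    ("prometheus-serviceMonitor", "serviceMonitor"),
    ("alertmanager-", "alertmanager"),
    ("node-exporter-", "exporter"),
    ("kube-state-metrics-", "metrics"),
    ("grafana-", "grafana"),
    ("prometheus-", "prometheus"),  # catch-all for the remaining prometheus-* files
]


def component_dir(filename):
    """Return the target subdirectory for a manifest file name, or None if unknown."""
    for prefix, directory in PREFIXES:
        if filename.startswith(prefix):
            return directory
    return None


def sort_manifests(manifests_dir, dry_run=True):
    """Plan (and unless dry_run, perform) moves of top-level *.yaml files into subdirectories."""
    root = Path(manifests_dir)
    moves = []
    for path in sorted(root.glob("*.yaml")):  # setup/ files are not top-level, so untouched
        directory = component_dir(path.name)
        if directory is None:
            continue
        moves.append((path.name, directory))
        if not dry_run:
            (root / directory).mkdir(exist_ok=True)
            shutil.move(str(path), str(root / directory / path.name))
    return moves
```

Run `sort_manifests("kube-prometheus/manifests")` first to review the planned moves, then pass `dry_run=False` to actually move the files. Longer prefixes are listed first so that, for example, prometheus-adapter-* files are not swallowed by the generic prometheus- rule.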

3. Adjust the nodeSelector in the YAML files
First, check the current labels on the nodes:

[root@k8s-master001 manifests]# kubectl get node --show-labels=true
NAME            STATUS   ROLES    AGE     VERSION   LABELS
k8s-master001   Ready    master   4d16h   v1.19.0   app.storage=rook-ceph,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master001,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-master002   Ready    master   4d16h   v1.19.0   app.storage=rook-ceph,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master002,kubernetes.io/os=linux,node-role.kubernetes.io/master=
k8s-master003   Ready    master   4d16h   v1.19.0   app.storage=rook-ceph,beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s-master003,kubernetes.io/os=linux,node-role.kubernetes.io/master=,role=ingress-controller
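A nodeSelector only lets a pod be scheduled onto nodes whose labels contain every key/value pair it lists, which is why a label carried by all three nodes above is a safe choice. A minimal sketch, assuming you paste in the plain-text output of `kubectl get node --show-labels`, that shows which nodes a given selector would match (`parse_labels` and `nodes_matching` are hypothetical helpers):

```python
def parse_labels(label_field):
    """Turn 'a=1,b=,c=x' into {'a': '1', 'b': '', 'c': 'x'}."""
    labels = {}
    for item in label_field.split(","):
        if not item:
            continue
        key, _, value = item.partition("=")  # empty values (e.g. master=) stay ''
        labels[key] = value
    return labels


def nodes_matching(show_labels_output, selector):
    """Return node names whose labels contain every key/value pair in selector."""
    matches = []
    for line in show_labels_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        name, label_field = fields[0], fields[-1]  # LABELS is the last column
        labels = parse_labels(label_field)
        if all(labels.get(key) == value for key, value in selector.items()):
            matches.append(name)
    return matches
```

With the output above, `nodes_matching(output, {"kubernetes.io/os": "linux"})` should return all three masters, while `{"role": "ingress-controller"}` would match only k8s-master003.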

Then set nodeSelector in the YAML files under manifests to kubernetes.io/os=linux, a label that every node above carries.
For example, vim setup/prometheus-operator-deployment.yaml:

      nodeSelector:
        kubernetes.io/os: linux

Adjust the other files yourself; the following command filters them so you can check whether they need changing:

[root@k8s-master001 manifests]# grep -A1 nodeSelector prometheus/*
prometheus/prometheus-prometheus.yaml:  nodeSelector:
prometheus/prometheus-prometheus.yaml:  nodeSelector:
prometheus/prometheus-prometheus.yaml-    kubernetes.io/os: linux
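Running grep directory by directory works, but the check can also be done in one pass over the whole manifests tree. A minimal sketch using only the Python standard library (no YAML parser), assuming the manifests use the block style shown above with the selector entries indented under `nodeSelector:`; `node_selector_labels` and `files_to_fix` are hypothetical helpers:

```python
from pathlib import Path


def node_selector_labels(yaml_text):
    """Return the 'key: value' lines nested under each 'nodeSelector:' key."""
    lines = yaml_text.splitlines()
    found = []
    for i, line in enumerate(lines):
        if line.strip() != "nodeSelector:":
            continue
        indent = len(line) - len(line.lstrip())
        # Collect the more-indented lines that follow, until indentation drops back.
        for child in lines[i + 1:]:
            if not child.strip():
                continue
            if len(child) - len(child.lstrip()) <= indent:
                break
            found.append(child.strip())
    return found


def files_to_fix(manifests_dir, wanted="kubernetes.io/os: linux"):
    """List YAML files whose nodeSelector contains anything other than `wanted`."""
    flagged = []
    for path in sorted(Path(manifests_dir).rglob("*.yaml")):
        labels = node_selector_labels(path.read_text(encoding="utf-8"))
        if any(label != wanted for label in labels):
            flagged.append(str(path))
    return flagged
```

Calling `files_to_fix("kube-prometheus/manifests")` lists only the files that still need editing. Being a line-based heuristic, it ignores the inline form `nodeSelector: {...}`.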

Deploying kube-prometheus

1. Install the operator

[root@k8s-master001 manifests]# kubectl apply -f setup/
namespace/monitoring created
customresourcedefinition.apiextensions.k8s.io/alertmanagers.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/podmonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/probes.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheuses.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/prometheusrules.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/servicemonitors.monitoring.coreos.com created
customresourcedefinition.apiextensions.k8s.io/thanosrulers.monitoring.coreos.com created
clusterrole.rbac.authorization.k8s.io/prometheus-operator created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-operator created
deployment.apps/prometheus-operator created
service/prometheus-operator created
serviceaccount/prometheus-operator created
[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-74d54b5cfc-xgqg7   2/2     Running   0          2m40s
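The later directories create ServiceMonitor, Prometheus and Alertmanager objects, which fail to apply until the CRDs from setup/ are registered. The kube-prometheus README handles this by looping until `kubectl get servicemonitors --all-namespaces` succeeds. A minimal Python sketch of the same wait, assuming kubectl is on PATH; `crds_ready` and `wait_until` are hypothetical helpers, and `run`/`sleep`/`clock` are injectable only to make them testable:

```python
import subprocess
import time


def crds_ready(run=subprocess.run):
    """True once `kubectl get servicemonitors --all-namespaces` succeeds."""
    try:
        result = run(["kubectl", "get", "servicemonitors", "--all-namespaces"],
                     capture_output=True, text=True)
    except OSError:  # e.g. kubectl is not on PATH
        return False
    return result.returncode == 0


def wait_until(check, timeout=120.0, interval=2.0, sleep=time.sleep, clock=time.monotonic):
    """Poll check() every `interval` seconds until it returns True or `timeout` expires."""
    deadline = clock() + timeout
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Usage: after `kubectl apply -f setup/`, call `wait_until(crds_ready, timeout=300)` and only apply the remaining directories once it returns True.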

2. Install the adapter

[root@k8s-master001 manifests]# kubectl apply -f adapter/
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
clusterrole.rbac.authorization.k8s.io/prometheus-adapter created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-adapter created
clusterrolebinding.rbac.authorization.k8s.io/resource-metrics:system:auth-delegator created
clusterrole.rbac.authorization.k8s.io/resource-metrics-server-resources created
configmap/adapter-config created
deployment.apps/prometheus-adapter created
rolebinding.rbac.authorization.k8s.io/resource-metrics-auth-reader created
service/prometheus-adapter created
serviceaccount/prometheus-adapter created
servicemonitor.monitoring.coreos.com/prometheus-adapter created
[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-adapter-557648f58c-9x446    1/1     Running   0          41s
prometheus-operator-74d54b5cfc-xgqg7   2/2     Running   0          4m33s

3. Install alertmanager

[root@k8s-master001 manifests]# kubectl apply -f alertmanager/
alertmanager.monitoring.coreos.com/main created
secret/alertmanager-main created
service/alertmanager-main created
serviceaccount/alertmanager-main created
servicemonitor.monitoring.coreos.com/alertmanager created
[root@k8s-master001 ~]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          53m
alertmanager-main-1                    2/2     Running   0          3m3s
alertmanager-main-2                    2/2     Running   0          53m

4. Install the exporter

[root@k8s-master001 manifests]# kubectl apply -f exporter/
clusterrole.rbac.authorization.k8s.io/node-exporter created
clusterrolebinding.rbac.authorization.k8s.io/node-exporter created
daemonset.apps/node-exporter created
service/node-exporter created
serviceaccount/node-exporter created
servicemonitor.monitoring.coreos.com/node-exporter created
[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
node-exporter-2rvtt                    2/2     Running   0          108s
node-exporter-9kwb6                    2/2     Running   0          108s
node-exporter-9zlbb                    2/2     Running   0          108s

5. Install metrics (kube-state-metrics)

[root@k8s-master001 manifests]# kubectl apply -f metrics
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
deployment.apps/kube-state-metrics created
service/kube-state-metrics created
serviceaccount/kube-state-metrics created
servicemonitor.monitoring.coreos.com/kube-state-metrics created
[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
kube-state-metrics-85cb9cfd7c-v9c4f    3/3     Running   0          2m8s

6. Install prometheus

[root@k8s-master001 manifests]# kubectl apply -f prometheus/
clusterrole.rbac.authorization.k8s.io/prometheus-k8s created
clusterrolebinding.rbac.authorization.k8s.io/prometheus-k8s created
servicemonitor.monitoring.coreos.com/prometheus-operator created
prometheus.monitoring.coreos.com/k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s-config created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
rolebinding.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s-config created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
role.rbac.authorization.k8s.io/prometheus-k8s created
prometheusrule.monitoring.coreos.com/prometheus-k8s-rules created
service/prometheus-k8s created
serviceaccount/prometheus-k8s created
[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-k8s-0                       3/3     Running   1          94s
prometheus-k8s-1                       3/3     Running   1          94s

7. Install grafana

[root@k8s-master001 manifests]# kubectl apply -f grafana/
secret/grafana-datasources created
configmap/grafana-dashboard-apiserver created
configmap/grafana-dashboard-cluster-total created
configmap/grafana-dashboard-controller-manager created
configmap/grafana-dashboard-k8s-resources-cluster created
configmap/grafana-dashboard-k8s-resources-namespace created
configmap/grafana-dashboard-k8s-resources-node created
configmap/grafana-dashboard-k8s-resources-pod created
configmap/grafana-dashboard-k8s-resources-workload created
configmap/grafana-dashboard-k8s-resources-workloads-namespace created
configmap/grafana-dashboard-kubelet created
configmap/grafana-dashboard-namespace-by-pod created
configmap/grafana-dashboard-namespace-by-workload created
configmap/grafana-dashboard-node-cluster-rsrc-use created
configmap/grafana-dashboard-node-rsrc-use created
configmap/grafana-dashboard-nodes created
configmap/grafana-dashboard-persistentvolumesusage created
configmap/grafana-dashboard-pod-total created
configmap/grafana-dashboard-prometheus-remote-write created
configmap/grafana-dashboard-prometheus created
configmap/grafana-dashboard-proxy created
configmap/grafana-dashboard-scheduler created
configmap/grafana-dashboard-statefulset created
configmap/grafana-dashboard-workload-total created
configmap/grafana-dashboards created
deployment.apps/grafana created
service/grafana created
serviceaccount/grafana created
servicemonitor.monitoring.coreos.com/grafana created
[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
grafana-b558fb99f-87spq                1/1     Running   0          3m14s

8. Install the serviceMonitors

[root@k8s-master001 manifests]# kubectl apply -f serviceMonitor/
servicemonitor.monitoring.coreos.com/prometheus created
servicemonitor.monitoring.coreos.com/kube-apiserver created
servicemonitor.monitoring.coreos.com/coredns created
servicemonitor.monitoring.coreos.com/kube-controller-manager created
servicemonitor.monitoring.coreos.com/kube-scheduler created
servicemonitor.monitoring.coreos.com/kubelet created

9. View all running services

[root@k8s-master001 manifests]# kubectl get po -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
alertmanager-main-0                    2/2     Running   0          90m
alertmanager-main-1                    2/2     Running   0          40m
alertmanager-main-2                    2/2     Running   0          90m
grafana-b558fb99f-87spq                1/1     Running   0          4m56s
kube-state-metrics-85cb9cfd7c-v9c4f    3/3     Running   0          10m
node-exporter-2rvtt                    2/2     Running   0          35m
node-exporter-9kwb6                    2/2     Running   0          35m
node-exporter-9zlbb                    2/2     Running   0          35m
prometheus-adapter-557648f58c-9x446    1/1     Running   0          91m
prometheus-k8s-0                       3/3     Running   1          7m49s
prometheus-k8s-1                       3/3     Running   1          7m49s
prometheus-operator-74d54b5cfc-xgqg7   2/2     Running   0          95m

NAME                            TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/alertmanager-main       ClusterIP   10.98.96.94     <none>        9093/TCP                     91m
service/alertmanager-operated   ClusterIP   None            <none>        9093/TCP,9094/TCP,9094/UDP   91m
service/grafana                 ClusterIP   10.108.204.33   <none>        3000/TCP                     6m30s
service/kube-state-metrics      ClusterIP   None            <none>        8443/TCP,9443/TCP            12m
service/node-exporter           ClusterIP   None            <none>        9100/TCP                     36m
service/prometheus-adapter      ClusterIP   10.98.16.117    <none>        443/TCP                      93m
service/prometheus-k8s          ClusterIP   10.109.119.37   <none>        9090/TCP                     9m22s
service/prometheus-operated     ClusterIP   None            <none>        9090/TCP                     9m24s
service/prometheus-operator     ClusterIP   None            <none>        8443/TCP                     97m
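With a dozen pods, a glance at the table is usually enough, but the same check can be automated: a pod is healthy here when STATUS is Running and the two numbers in READY (ready containers / total containers) are equal. A minimal sketch, assuming the plain-text table printed by `kubectl get po`; `unready_pods` is a hypothetical helper:

```python
def unready_pods(get_po_output):
    """Return (name, ready, status) for pods not Running with all containers ready.

    Expects the plain-text table printed by `kubectl get po`; parsing stops at the
    first blank line so a following table (e.g. services) is ignored.
    """
    problems = []
    lines = get_po_output.strip().splitlines()
    for line in lines[1:]:  # skip the NAME/READY/STATUS header row
        if not line.strip():
            break
        fields = line.split()
        if len(fields) < 3:
            continue
        name, ready, status = fields[0], fields[1], fields[2]
        up, _, total = ready.partition("/")  # "2/3" -> ("2", "3")
        if status != "Running" or up != total:
            problems.append((name, ready, status))
    return problems
```

For example, feed it `subprocess.run(["kubectl", "get", "po", "-n", "monitoring"], capture_output=True, text=True).stdout`; an empty list means every pod in the namespace is Running and fully ready.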

10. Expose the grafana and prometheus services via NodePort to access their web UIs

---
apiVersion: v1
kind: Service
metadata:
  name: grafana-svc
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - port: 3000
    targetPort: 3000
  selector:
    app: grafana
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
  selector:
    prometheus: k8s

Check the result:

[root@k8s-master001 manifests]# kubectl get svc -n monitoring
NAME                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
grafana-svc             NodePort    10.99.31.100    <none>        3000:30438/TCP               9s
prometheus-svc          NodePort    10.102.245.8    <none>        9090:32227/TCP               3s
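In the PORT(S) column, `3000:30438/TCP` means service port 3000 is exposed on node port 30438; the node port is randomly assigned from the NodePort range, so it differs per cluster. A minimal sketch that turns the `kubectl get svc` table into browser URLs, assuming the default column layout shown above; `node_port_urls` is a hypothetical helper and only the first port of each Service is used:

```python
def node_port_urls(get_svc_output, node_ip):
    """Map each NodePort service name to http://<node_ip>:<nodePort> (first port only)."""
    urls = {}
    for line in get_svc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        # Columns: NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
        if len(fields) < 5 or fields[1] != "NodePort":
            continue
        name, ports = fields[0], fields[4]
        mapping = ports.split(",")[0].split("/")[0]  # "3000:30438/TCP" -> "3000:30438"
        if ":" not in mapping:
            continue
        urls[name] = f"http://{node_ip}:{mapping.split(':')[1]}"
    return urls
```

Applied to the output above with node IP 10.26.25.20, it returns `{"grafana-svc": "http://10.26.25.20:30438", "prometheus-svc": "http://10.26.25.20:32227"}`, which are the URLs used below.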

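You can now open NodeIP:30438 and NodeIP:32227 in a browser, where NodeIP is the IP of any k8s node. Alternatively, you can expose the services through the ingress introduced in an earlier article.
For example: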
prometheus: http://10.26.25.20:32227

[Figure: Prometheus web UI]


grafana: http://10.26.25.20:30438 (default username/password admin/admin; change the admin password after logging in)

[Figure: Grafana web UI]


kube-prometheus is now fully deployed, and you can view monitoring data through Prometheus.

A few gotchas

Gotcha 1

1. The Prometheus targets page shows that kube-controller-manager and kube-scheduler are not being monitored:

[Figure: Prometheus targets page, kube-controller-manager and kube-scheduler missing]


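Fix
This happens because a ServiceMonitor selects Services by label, and the corresponding ServiceMonitors only look for them in the kube-system namespace. This cluster has no Service in kube-system labeled k8s-app: kube-controller-manager or k8s-app: kube-scheduler, so those two ServiceMonitors select nothing: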

[root@k8s-master001 manifests]# grep -A2 -B2 selector serviceMonitor/prometheus-serviceMonitorKube*
serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-    matchNames:
serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-    - kube-system
serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml:  selector:
serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-    matchLabels:
serviceMonitor/prometheus-serviceMonitorKubeControllerManager.yaml-      k8s-app: kube-controller-manager
--
serviceMonitor/prometheus-serviceMonitorKubelet.yaml-    matchNames:
serviceMonitor/prometheus-serviceMonitorKubelet.yaml-    - kube-system
serviceMonitor/prometheus-serviceMonitorKubelet.yaml:  selector:
serviceMonitor/prometheus-serviceMonitorKubelet.yaml-    matchLabels:
serviceMonitor/prometheus-serviceMonitorKubelet.yaml-      k8s-app: kubelet
--
serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-    matchNames:
serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-    - kube-system
serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml:  selector:
serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-    matchLabels:
serviceMonitor/prometheus-serviceMonitorKubeScheduler.yaml-      k8s-app: kube-scheduler
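The selection rule behind this gotcha can be modeled in a few lines: a Service is picked only if its namespace is in `namespaceSelector.matchNames` and its labels contain every `selector.matchLabels` pair. The sketch below is a simplified model under those assumptions; it ignores `matchExpressions`, `namespaceSelector.any` and the endpoint port-name matching that the real Prometheus Operator also applies. `Service` and `selected_services` are hypothetical names, and the Services in the usage note are illustrative data.

```python
from dataclasses import dataclass, field


@dataclass
class Service:
    """Hypothetical minimal view of a Kubernetes Service: name, namespace, labels."""
    name: str
    namespace: str
    labels: dict = field(default_factory=dict)


def selected_services(services, match_names, match_labels):
    """Names of Services a ServiceMonitor-style selector would pick.

    A Service qualifies only if its namespace is listed in match_names AND its
    labels contain every key/value pair in match_labels.
    """
    return [
        svc.name
        for svc in services
        if svc.namespace in match_names
        and all(svc.labels.get(k) == v for k, v in match_labels.items())
    ]
```

With only `Service("kube-dns", "kube-system", {"k8s-app": "kube-dns"})` present, the kube-controller-manager selector (`["kube-system"]`, `{"k8s-app": "kube-controller-manager"}`) returns an empty list, so Prometheus gets no target. Once a Service carrying that label exists in kube-system, it is selected, which is what the next steps create.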

2. Create Services for kube-controller-manager and kube-scheduler
In Kubernetes v1.19 both components serve metrics over HTTPS by default: kube-controller-manager on port 10257 and kube-scheduler on port 10259.
kube-controller-manager-scheduler.yml:

apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    k8s-app: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10257
    targetPort: 10257
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    k8s-app: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  type: ClusterIP
  clusterIP: None
  ports:
  - name: https-metrics
    port: 10259
    targetPort: 10259
    protocol: TCP

Apply it:

[root@k8s-master001 manifests]# kubectl apply -f kube-controller-manager-scheduler.yml
[root@k8s-master001 manifests]# kubectl get svc -n kube-system
NAME                      TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)                        AGE
kube-controller-manager   ClusterIP   None         <none>        10257/TCP                      37m
kube-scheduler            ClusterIP   None         <none>        10259/TCP                      37m

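3. Create Endpoints for kube-controller-manager and kube-scheduler
Note: replace the addresses with the real IPs of your control-plane nodes. Also note that Kubernetes only leaves a hand-written Endpoints object alone when its Service has no selector; because the Services above do set a selector, the endpoints controller may overwrite this object with the IPs of the matching pods (on kubeadm clusters, where these static pods use host networking, that normally yields the same node IPs).
kube-ep.yml: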

apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-controller-manager
  name: kube-controller-manager
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.26.25.20
  - ip: 10.26.25.21
  - ip: 10.26.25.22
  ports:
  - name: https-metrics
    port: 10257
    protocol: TCP
---
apiVersion: v1
kind: Endpoints
metadata:
  labels:
    k8s-app: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
subsets:
- addresses:
  - ip: 10.26.25.20
  - ip: 10.26.25.21
  - ip: 10.26.25.22
  ports:
  - name: https-metrics
    port: 10259
    protocol: TCP

[root@k8s-master001 manifests]# kubectl apply -f kube-ep.yml
endpoints/kube-controller-manager created
endpoints/kube-scheduler created
[root@k8s-master001 manifests]# kubectl get ep -n kube-system
NAME                      ENDPOINTS                                                        AGE
kube-controller-manager   10.26.25.20:10257,10.26.25.21:10257,10.26.25.22:10257            16m
kube-scheduler            10.26.25.20:10259,10.26.25.21:10259,10.26.25.22:10259            16m
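Because the node IPs must be edited by hand in several places, the two YAML files above can also be generated from a single list of control-plane IPs. A minimal sketch, assuming Python 3 and that you want exactly the Service and Endpoints shapes used in this article; `control_plane_manifests` and `as_list` are hypothetical helpers. The output is JSON wrapped in a `v1` `List`, which `kubectl apply -f` accepts just like YAML.

```python
import json


def control_plane_manifests(component, port, node_ips, namespace="kube-system"):
    """Build the headless Service and matching Endpoints used above for one component."""
    labels = {"k8s-app": component}  # the label the kube-prometheus ServiceMonitor selects on
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": component, "namespace": namespace, "labels": labels},
        "spec": {
            "selector": {"component": component},
            "type": "ClusterIP",
            "clusterIP": "None",
            "ports": [{"name": "https-metrics", "port": port,
                       "targetPort": port, "protocol": "TCP"}],
        },
    }
    endpoints = {
        "apiVersion": "v1",
        "kind": "Endpoints",
        "metadata": {"name": component, "namespace": namespace, "labels": dict(labels)},
        "subsets": [{
            "addresses": [{"ip": ip} for ip in node_ips],
            "ports": [{"name": "https-metrics", "port": port, "protocol": "TCP"}],
        }],
    }
    return [service, endpoints]


def as_list(objects):
    """Wrap objects in a v1 List so one JSON document can be passed to kubectl apply -f."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": objects}, indent=2)
```

For this cluster, `as_list(control_plane_manifests("kube-controller-manager", 10257, ips) + control_plane_manifests("kube-scheduler", 10259, ips))` with `ips = ["10.26.25.20", "10.26.25.21", "10.26.25.22"]` produces the same four objects as the two files above. Write it to a file and run `kubectl apply -f` on that file.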

Check the Prometheus targets page again: kube-controller-manager and kube-scheduler are now being monitored.

[Figure: Prometheus targets page, kube-controller-manager and kube-scheduler now listed]

Gotcha 2

1. By default, kube-controller-manager and kube-scheduler bind to 127.0.0.1. To monitor these two services, change their configuration so that they bind to 0.0.0.0.
2. The configuration files are in /etc/kubernetes/manifests on each master node:
in kube-controller-manager.yaml, set --bind-address=0.0.0.0
in kube-scheduler.yaml, set --bind-address=0.0.0.0
3. Restart kubelet: systemctl restart kubelet
4. Check that the change took effect; an HTTP 200 response means success:

[root@k8s-master002 manifests]# curl -I -k https://10.26.25.20:10257/healthz
HTTP/1.1 200 OK
Cache-Control: no-cache, private
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 15 Sep 2020 06:19:32 GMT
Content-Length: 2

[root@k8s-master002 manifests]# curl -I -k https://10.26.25.20:10259/healthz
HTTP/1.1 200 OK
Cache-Control: no-cache, private
Content-Type: text/plain; charset=utf-8
X-Content-Type-Options: nosniff
Date: Tue, 15 Sep 2020 06:19:36 GMT
Content-Length: 2
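Since steps 2 and 4 have to be repeated on every master, they can be scripted. A minimal sketch, assuming Python 3 on the master nodes: `set_bind_address` rewrites the `--bind-address` flag in a static pod manifest's text, and `healthz_ok` mimics `curl -I -k` by sending a HEAD request without certificate verification. Both are hypothetical helpers, not part of any Kubernetes tooling. The kubelet usually notices edits under /etc/kubernetes/manifests and recreates the static pods on its own, so the restart in step 3 mainly acts as a safeguard.

```python
import re
import ssl
import urllib.error
import urllib.request


def set_bind_address(manifest_text, address="0.0.0.0"):
    """Rewrite every '--bind-address=<value>' flag in a static pod manifest's text."""
    return re.sub(r"(--bind-address=)\S+", lambda m: m.group(1) + address, manifest_text)


def healthz_ok(url, timeout=5.0):
    """HEAD the URL without TLS verification (like curl -I -k); True only on HTTP 200."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False       # must be disabled before CERT_NONE is allowed
    ctx.verify_mode = ssl.CERT_NONE  # the components use self-signed certificates
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout, context=ctx) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):  # refused, timed out, or non-2xx status
        return False
```

Usage: `p = Path("/etc/kubernetes/manifests/kube-scheduler.yaml"); p.write_text(set_bind_address(p.read_text()))` (with `from pathlib import Path`), then poll `healthz_ok("https://10.26.25.20:10259/healthz")` until it returns True. Keep any backup copy outside /etc/kubernetes/manifests, because the kubelet treats every manifest file in that directory as a static pod.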

Final words

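kube-prometheus has many configuration options; this article only covers the most basic setup. For anything further, please refer to the official documentation.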


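Note: the images in this article were taken from the internet. If any of them infringe your rights, please contact me and I will remove them promptly.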
