I. Introduction

Monitoring plays a critically important role in keeping business systems running smoothly. This article introduces the architecture, features, and deployment workflow of a Prometheus + Grafana monitoring system. Prometheus is an open-source monitoring and alerting system that has been adopted by more and more companies; it joined the CNCF in 2016. Its official system architecture is shown below:

The Prometheus ecosystem offers a rich set of collection plugins, commonly called exporters. Prometheus actively pulls the metrics exposed by each exporter and stores them in its built-in TSDB; the visualization tool Grafana queries the metrics stored in Prometheus via PromQL and renders them. In parallel, Prometheus evaluates the collected metrics against configured alerting rules; when a rule fires, the alert is pushed to the Alertmanager component, which then notifies the people responsible via email, DingTalk, SMS, and so on.
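As a small illustration of this pull-and-query flow (the metric below is the standard node-exporter network counter; the 5m window is an arbitrary choice), a Grafana panel could chart per-second inbound traffic per node with a PromQL query such as:

```
rate(node_network_receive_bytes_total[5m])
```

An alerting rule is just a similar expression compared against a threshold, e.g. `rate(node_network_receive_bytes_total[5m]) > 1e8`.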

II. Monitoring architecture for the current scenario

A complete monitoring and alerting system usually covers metric collection, data storage, visualization, and alert notification. The kube-prometheus project on GitHub (https://github.com/coreos/kube-prometheus) integrates all of the above. What follows is a practical application of kube-prometheus to our scenario; the resulting monitoring architecture is shown below:
![image.png](https://img-blog.csdnimg.cn/ffa748bc151b4e7bb319612066886df9.png)

Roles of the components involved:

  • prometheus — scrapes and stores metrics and evaluates alerting rules
  • grafana — dashboards and, in this setup, alerting
  • prometheus-adapter — exposes Prometheus metrics through the Kubernetes resource/custom metrics APIs (e.g. for the HPA)
  • node-exporter — host-level metrics (CPU, memory, disk, network)
  • mysqld-exporter — MySQL metrics
  • kube-state-metrics — metrics about the state of Kubernetes objects (Deployments, Pods, and so on)
  • blackbox — black-box probing of endpoints over HTTP(S), TCP, and ICMP
  • kafka-exporter — Kafka metrics (topics, consumer lag, and so on)
  • redis-exporter — Redis metrics
  • php-fpm-exporter — PHP-FPM metrics
  • prometheus-operator — manages Prometheus and its related resources through CRDs
    As for TiDB monitoring, TiDB officially ships monitoring bundled with the TiDB deployment itself, covering database performance, binlog, server performance, TiDB liveness, and more. Because it is deployed from binaries, TiDB monitoring has no corresponding ServiceMonitor.
    By contrast, every collector in the diagram above that has a ServiceMonitor is orchestrated and deployed on Kubernetes.

1. Collection

Service discovery via ServiceMonitor
If Prometheus had to scrape the metrics of exporters living in different namespaces directly, those metrics would have to be exposed outside the cluster, which both complicates configuration and adds security risk. This is the problem ServiceMonitor solves. ServiceMonitor is a Kubernetes custom resource that describes the target list of a Prometheus server: through a label selector it picks out the endpoints of matching Services and lets Prometheus scrape through those Services, giving dynamic, cross-namespace service discovery. One ServiceMonitor can cover one class of Services: deploy each exporter on Kubernetes with its own Service, and the ServiceMonitor discovers the associated Services through its labelSelector, so Prometheus can pull metrics from them. The ServiceMonitor declaration for the middleware exporters is:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: middlewares-exporter
  namespace: monitoring
spec:
  endpoints:
  - interval: 15s
    port: http
  selector:
    matchLabels:
      app: middlewares-exporter

Any Service labeled app: middlewares-exporter is discovered by this ServiceMonitor.
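To make the selector concrete, a matching Service could look like the sketch below. The exporter name, namespace, and port number are assumptions for illustration; what matters is the `app: middlewares-exporter` label (matched by the ServiceMonitor's `selector`) and a port named `http` (matched by the endpoint's `port: http`):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: redis-exporter            # hypothetical exporter Service
  namespace: middlewares          # assumed namespace
  labels:
    app: middlewares-exporter     # matched by the ServiceMonitor's selector
spec:
  ports:
  - name: http                    # must match the ServiceMonitor endpoint port name
    port: 9121                    # redis-exporter's default metrics port
  selector:
    app: redis-exporter
```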
File-based service discovery
For exporters that do not use a ServiceMonitor, service discovery is configured through an additional Prometheus scrape config file; the steps are covered in step 5 of Chapter IV.

2. Visualization

This system uses Grafana as its visualization module. The community provides a wealth of dashboard templates: configure the data source in Grafana (here, the Prometheus database) and import an official template file (see https://grafana.com/grafana/dashboards) to display the corresponding metrics. A template is a JSON file that defines the panel layout, the PromQL queries behind each panel, and the panel's alerting rules.
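The data source can be added through the Grafana UI, or provisioned declaratively. As a sketch (the file path and in-cluster URL are assumptions based on this deployment's prometheus-k8s Service):

```yaml
# e.g. /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy                                   # Grafana's backend proxies the queries
  url: http://prometheus-k8s.monitoring.svc:9090  # in-cluster Service DNS (assumed)
  isDefault: true
```

kube-prometheus in fact ships such a provisioning config in the grafana-datasources Secret that appears later in the Secret listing.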

3. Alerting

Option 1: Alertmanager is a standalone alerting module. Prometheus evaluates its alerting rules and sends firing alerts to Alertmanager, which processes them and delivers notifications to recipients by email, DingTalk, and so on. This option supports custom alert templates and more customized scenarios, including silencing and inhibition, but it also makes alert-rule configuration more complex and harder to maintain.
Option 2: Grafana, the visualization module, has built-in alerting. Alert rules can be configured directly on a dashboard panel, and notifications sent by email, DingTalk, etc. Note that Grafana's built-in alerting only supports graph panels.
Grafana's built-in alerting already meets the needs of our current scenario and is easy to configure and manage, so this system uses it.
The following chapters deploy the monitoring system on the Kubernetes cluster with kube-prometheus.

III. prometheus-operator and its components

Prometheus is the de facto standard for Kubernetes monitoring, with powerful features and a healthy ecosystem. However, it is not distributed, does not support data import/export, and does not support modifying scrape targets or alerting rules through an API, so using it directly usually means writing scripts and glue code to simplify operations. Prometheus Operator provides simple definitions for monitoring Kubernetes Services and Deployments and for managing Prometheus instances, simplifying the deployment, management, and operation of Prometheus and Alertmanager clusters on Kubernetes.

MetricServer: an aggregator of Kubernetes cluster resource usage, feeding in-cluster consumers such as kubectl, the HPA, and the scheduler.

Prometheus Operator: manages Prometheus and Alertmanager instances on Kubernetes through custom resources such as Prometheus and ServiceMonitor.

NodeExporter: exposes key metrics about each node's state.

KubeStateMetrics: collects data on Kubernetes resource objects in the cluster, on which alerting rules can be defined.

Prometheus: collects data from the apiserver, scheduler, controller-manager, and kubelet over HTTP using the pull model.

IV. Deployment

1. Prepare the installation files

Download the resource files and sort them into directories
Classify the official resource declaration files as needed. Components that are not required here, such as alertmanager, have been trimmed out in this article.

git clone https://github.com/coreos/kube-prometheus.git
cd kube-prometheus/
git branch -r
git checkout origin/release-0.12
cd manifests/
mkdir adapter grafana blackbox alertmanager kube-state-metrics node-exporter serviceMonitor operator prometheus 
mv grafana-* grafana
mv blackbox* blackbox/
mv alertmanager-* alertmanager/
mv kubeStateMetrics-* kube-state-metrics/
mv nodeExporter-* node-exporter/
mv *Adapter* adapter/
mv *serviceM* serviceMonitor/
mv *Operator* operator/
mv prometheus-* prometheus/ 

2. Prometheus persistence

Append the following under spec at the end of prometheus/prometheus-prometheus.yaml:
  version: 2.41.0
  storage:
    volumeClaimTemplate:
      spec:
        storageClassName: nfs-storage
        resources:
          requests:
            storage: 50Gi

3. Grafana persistence

3.1 Create the PVC
[root@master1 prometheus]# kubectl apply -f  grafana/grafana-pvc.yaml 
persistentvolumeclaim/grafana-pvc created
[root@master1 prometheus]# cat grafana/grafana-pvc.yaml 
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc
  namespace: monitoring
  labels:
    app: grafana-pvc
spec:
  accessModes: # access mode
  - ReadWriteOnce
  volumeMode: Filesystem # volume type
  resources:
    requests:
      storage: 10Gi
  storageClassName: nfs-storage # name of the StorageClass to use
3.2 Edit grafana/grafana-deployment.yaml to mount the PVC
      volumes:
      # find and comment out these two lines:
      #- emptyDir: {}
      #  name: grafana-storage
      # add the following three lines:
      - name: grafana-storage
        persistentVolumeClaim:
          claimName: grafana-pvc

4. Prepare images that require external network access
4.1 Pull, tag, and save the images
# registry.k8s.io is not reachable from these nodes, so pull a mirror and re-tag it
docker pull bitnami/kube-state-metrics:2.7.0
docker tag bitnami/kube-state-metrics:2.7.0 registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0
docker save -o kube-state-metrics-2.7.0.tar registry.k8s.io/kube-state-metrics/kube-state-metrics:v2.7.0
docker pull v5cn/prometheus-adapter:v0.10.0
docker tag v5cn/prometheus-adapter:v0.10.0 registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.10.0
docker save -o adapter-0.10.0.tar registry.k8s.io/prometheus-adapter/prometheus-adapter:v0.10.0
4.2 Copy the images to the other nodes
[root@master1 prometheus]# scp adapter-0.10.0.tar master2.k8s.test:/tmp
[root@master1 prometheus]# scp adapter-0.10.0.tar master3.k8s.test:/tmp
[root@master1 prometheus]# scp adapter-0.10.0.tar node1.k8s.test:/tmp
[root@master1 prometheus]# scp kube-state-metrics-2.7.0.tar master2.k8s.test:/tmp
[root@master1 prometheus]# scp kube-state-metrics-2.7.0.tar master3.k8s.test:/tmp
[root@master1 prometheus]# scp kube-state-metrics-2.7.0.tar node1.k8s.test:/tmp
4.3 Log in to the other nodes and load the images
[root@master2 ~]# docker load -i /tmp/kube-state-metrics-2.7.0.tar
[root@master2 ~]# docker load -i /tmp/adapter-0.10.0.tar
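Copying and loading tarballs by hand does not scale with node count. A tiny helper can generate the same commands for every image/node pair; it only echoes them (a dry run), so you can review before piping to `sh`. The node and file names are the ones used above; passwordless `ssh` access to the nodes is assumed:

```shell
# Print one "scp && ssh docker load" command per image/node pair (dry run).
gen_cmds() {
  images="kube-state-metrics-2.7.0.tar adapter-0.10.0.tar"
  nodes="master2.k8s.test master3.k8s.test node1.k8s.test"
  for img in $images; do
    for n in $nodes; do
      echo "scp $img $n:/tmp && ssh $n docker load -i /tmp/$img"
    done
  done
}
gen_cmds            # review the output, then actually run it with: gen_cmds | sh
```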

5. Configure Prometheus dynamic service discovery

For a Prometheus that is already deployed, we often need to add configuration. Here, configuration changes are delivered by creating a Secret from a config file (prometheus-additional.yaml) and mounting it into the Prometheus pod. For exporters that do not use a ServiceMonitor for discovery, add their scrape configuration to prometheus-additional.yaml and regenerate the Secret.

5.1 Contents of prometheus-additional.yaml:
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
5.2 Configure prometheus/prometheus-prometheus.yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    ...
spec:
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  additionalScrapeConfigs:                    # enable the extra scrape configs
    name: additional-configs                  # name of the Secret object
    key: prometheus-additional.yaml           # key inside the Secret

5.3 Create the Secret object and verify:
  • If the file changes later, delete the Secret and re-run the create command below; by default the change takes effect after about 30s
[root@master1 prometheus]# kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
secret/additional-configs created
[root@master1 prometheus]# kubectl get secret -n monitoring
NAME                             TYPE                                  DATA   AGE
additional-configs               Opaque                                1      42s
default-token-6nnl7              kubernetes.io/service-account-token   3      148m
grafana-config                   Opaque                                1      137m
grafana-datasources              Opaque                                1      137m
grafana-token-rhsf9              kubernetes.io/service-account-token   3      137m
kube-state-metrics-token-x7ns4   kubernetes.io/service-account-token   3      136m
node-exporter-token-6dhnm        kubernetes.io/service-account-token   3      137m
prometheus-adapter-token-k8zrn   kubernetes.io/service-account-token   3      135m
prometheus-k8s-token-nxmg2       kubernetes.io/service-account-token   3      136m
[root@master1 prometheus]# kubectl get secret -n monitoring additional-configs
NAME                 TYPE     DATA   AGE
additional-configs   Opaque   1      52s
[root@master1 prometheus]# kubectl get secret -n monitoring additional-configs -oyaml
apiVersion: v1
data:
  prometheus-additional.yaml: LSBqb2JfbmFtZTogImh0dHBfMnh4LWFwaSIKICBzY3JhcGVfaW50ZXJ2YWw6IDEwcwogIHNjcmFwZV90aW1lb3V0OiA1cwogIG1ldHJpY3NfcGF0aDogL3Byb2JlCiAgcGFyYW1zOgogICAgbW9kdWxlOiBbaHR0cF8yeHhdCiAgc3RhdGljX2NvbmZpZ3M6CiAgLSB0YXJnZXRzOgogICAgLSBodHRwczovL3d3dy5iYWlkdS5jb20KICByZWxhYmVsX2NvbmZpZ3M6CiAgLSBzb3VyY2VfbGFiZWxzOiBbX19hZGRyZXNzX19dCiAgICB0YXJnZXRfbGFiZWw6IF9fcGFyYW1fdGFyZ2V0CiAgLSBzb3VyY2VfbGFiZWxzOiBbX19wYXJhbV90YXJnZXRdCiAgICB0YXJnZXRfbGFiZWw6IGluc3RhbmNlCiAgLSB0YXJnZXRfbGFiZWw6IF9fYWRkcmVzc19fCiAgICByZXBsYWNlbWVudDogYmxhY2tib3gtZXhwb3J0ZXI6OTExNQo=
kind: Secret
metadata:
  creationTimestamp: "2023-05-25T05:06:43Z"
  name: additional-configs
  namespace: monitoring
  resourceVersion: "2953034"
  uid: 42593297-e962-4352-82f4-ed07388b190c
type: Opaque

[root@master1 prometheus]# echo "LSBqb2JfbmFtZTogImh0dHBfMnh4LWFwaSIKICBzY3JhcGVfaW50ZXJ2YWw6IDEwcwogIHNjcmFwZV90aW1lb3V0OiA1cwogIG1ldHJpY3NfcGF0aDogL3Byb2JlCiAgcGFyYW1zOgogICAgbW9kdWxlOiBbaHR0cF8yeHhdCiAgc3RhdGljX2NvbmZpZ3M6CiAgLSB0YXJnZXRzOgogICAgLSBodHRwczovL3d3dy5iYWlkdS5jb20KICByZWxhYmVsX2NvbmZpZ3M6CiAgLSBzb3VyY2VfbGFiZWxzOiBbX19hZGRyZXNzX19dCiAgICB0YXJnZXRfbGFiZWw6IF9fcGFyYW1fdGFyZ2V0CiAgLSBzb3VyY2VfbGFiZWxzOiBbX19wYXJhbV90YXJnZXRdCiAgICB0YXJnZXRfbGFiZWw6IGluc3RhbmNlCiAgLSB0YXJnZXRfbGFiZWw6IF9fYWRkcmVzc19fCiAgICByZXBsYWNlbWVudDogYmxhY2tib3gtZXhwb3J0ZXI6OTExNQo=" |base64 -d

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

# verify inside the Prometheus container
[root@master1 prometheus]# kubectl exec -n monitoring prometheus-k8s-0 -it sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/prometheus $  cat  /etc/prometheus/config_out/prometheus.env.yaml |grep kubernetes-service -A 3
- job_name: kubernetes-service-endpoints
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
/prometheus $

6. Install and deploy

[root@master1 prometheus]# kubectl create -f setup/ -f  operator/ -f adapter/ -f grafana/ -f serviceMonitor/ -f blackbox/ -f kube-state-metrics/ -f node-exporter/ -f prometheus/

7. Verify

[root@master1 prometheus]# kubectl get pods -n monitoring
NAME                                   READY   STATUS    RESTARTS   AGE
blackbox-exporter-58d99cfb6d-bxvsb     3/3     Running   0          56m
grafana-69b474cbc8-8zqh9               1/1     Running   0          56m
kube-state-metrics-c9f8b947b-fgzqx     3/3     Running   0          56m
node-exporter-6sb9v                    2/2     Running   0          56m
node-exporter-9v7bq                    2/2     Running   0          56m
node-exporter-cv9xt                    2/2     Running   0          56m
node-exporter-nm86z                    2/2     Running   0          56m
prometheus-adapter-5bf8d6f7c6-nspkp    1/1     Running   0          52m
prometheus-adapter-5bf8d6f7c6-xn9fw    1/1     Running   0          52m
prometheus-k8s-0                       2/2     Running   0          65s
prometheus-k8s-1                       2/2     Running   0          65s
prometheus-operator-6958d799cd-nqscz   2/2     Running   0          56m

# verify the persistent PVCs
[root@master1 prometheus]# ls /data/nfs_provisioner/monitoring-prometheus-k8s-db-prometheus-k8s-0-pvc-4464e6f6-8966-4b25-994d-8cd498961404/prometheus-db/
chunks_head  lock  queries.active  wal
[root@master1 prometheus]# ls /data/nfs_provisioner/monitoring-prometheus-k8s-db-prometheus-k8s-1-pvc-717f082f-4259-4718-81db-89d11f9c2bd9/prometheus-db/
chunks_head  lock  queries.active  wal
[root@master1 prometheus]# ls /data/nfs_provisioner/monitoring-grafana-pvc-pvc-388b94be-22fe-4139-b582-8f238860cd3f/
alerting  csv  file-collections  grafana.db  plugins  png

8. Configure access from outside the cluster

8.1 Delete the network policies
[root@master1 prometheus]# kubectl -n monitoring delete networkpolicies.networking.k8s.io --all
8.2 External access via Ingress
8.2.1 Configure the grafana Ingress
# check the Service name: grafana
[root@master1 prometheus]# kubectl get svc -n monitoring|grep grafana
grafana               NodePort    10.96.131.142   <none>        3000:30030/TCP                  130m
# write the Ingress manifest
[root@master1 prometheus]# cat ../ingress/grafana-ing.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana-ing
  namespace: monitoring
spec:
  rules:
  - host: grafana.example.com
    http:
      paths:
      - backend:
          service:
            name: grafana
            port:
              number: 3000
        path: /
        pathType: Prefix
# create the Ingress
[root@master1 prometheus]# kubectl create -f  grafana/grafana-ing.yaml
ingress.networking.k8s.io/grafana-ing created

8.2.2 Configure the prometheus Ingress
# check the Service name: prometheus-k8s
[root@master1 prometheus]# kubectl get svc -n monitoring|grep prometheus-k8s
prometheus-k8s        NodePort    10.96.192.126   <none>        9090:30090/TCP,8080:32245/TCP   74m
# write the Ingress manifest
[root@master1 prometheus]# cat  prometheus/prometheus-ing.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-k8s-ing
  namespace: monitoring
spec:
  rules:
  - host: prometheus-k8s.example.com
    http:
      paths:
      - backend:
          service:
            name: prometheus-k8s
            port:
              number: 9090
        path: /
        pathType: Prefix
# create the Ingress
[root@master1 prometheus]# kubectl create -f   prometheus/prometheus-ing.yaml
ingress.networking.k8s.io/prometheus-k8s-ing created

8.2.3 Verify
[root@master1 prometheus]# kubectl get ing -n monitoring
NAME                 CLASS   HOSTS                        ADDRESS         PORTS   AGE
grafana-ing          nginx   grafana.example.com          10.96.117.202   80      2m9s
prometheus-k8s-ing   nginx   prometheus-k8s.example.com   10.96.117.202   80      2m17s

8.3 External access via NodePort
8.3.1 Configure the grafana NodePort
# Edit the manifest: add
	type: NodePort
	nodePort: 30030
[root@master1 prometheus]# cat    grafana/grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 9.3.2
  name: grafana
  namespace: monitoring
spec:
  ports:
  - name: http
    port: 3000
    nodePort: 30030  # added nodePort
    targetPort: http
  type: NodePort  # set the Service type to NodePort
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus

# apply the change
[root@master1 prometheus]# kubectl apply -f   grafana/grafana-service.yaml
service/grafana configured

# verify
[root@master1 prometheus]# kubectl get svc -n monitoring
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)              AGE
grafana               NodePort    10.96.131.142   <none>        3000:30030/TCP       119m


8.3.2 Configure the prometheus NodePort
# Edit the manifest: add
	type: NodePort
	nodePort: 30090
[root@master1 prometheus]# cat prometheus/prometheus-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.41.0
  name: prometheus-k8s
  namespace: monitoring
spec:
  ports:
  - name: web
    port: 9090
    nodePort: 30090 # added nodePort
    targetPort: web
  - name: reloader-web
    port: 8080
    targetPort: reloader-web
  type: NodePort # set the Service type to NodePort
  selector:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP

# apply the change
[root@master1 prometheus]# kubectl apply -f  prometheus/prometheus-service.yaml
Warning: resource services/prometheus-k8s is missing the kubectl.kubernetes.io/last-applied-configuration annotation which is required by kubectl apply. kubectl apply should only be used on resources created declaratively by either kubectl create --save-config or kubectl apply. The missing annotation will be patched automatically.
service/prometheus-k8s configured

# verify
[root@master1 prometheus]# kubectl get svc -n monitoring|grep prometheus
NAME                  TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)              AGE
prometheus-k8s        NodePort    10.96.192.126   <none>        9090:30090/TCP,8080:32245/TCP   69m

V. Verification

1. Add hosts entries on the local machine

Edit the C:\Windows\System32\drivers\etc\hosts file to map the two Ingress hostnames to the cluster.
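For example (the IP below is a placeholder; use the address of a node running the ingress controller):

```
192.168.0.10 grafana.example.com
192.168.0.10 prometheus-k8s.example.com
```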


2. Verify in the browser

2.1 Verify grafana
  • Grafana's default username/password is admin/admin

Open http://grafana.example.com in the browser and log in.

2.2 Verify prometheus

Open http://prometheus-k8s.example.com and check Status → Targets to confirm the scrape targets are up.
