一、概述

kube-prometheus 是一整套监控解决方案,它使用 Prometheus 采集集群指标,Grafana 做展示,包含如下组件:

  • The Prometheus Operator
  • Highly available Prometheus
  • Highly available Alertmanager
  • Prometheus node-exporter
  • Prometheus Adapter for Kubernetes Metrics APIs (k8s-prometheus-adapter)
  • kube-state-metrics
  • Grafana

二、部署 Kube-Prometheus

1、下载 Kube-Prometheus 代码

  • 方法一
git clone https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus
git branch -r	# 查看当前分支有哪些
git checkout release-0.9	# 切换到自己 Kubernetes 兼容的版本
  • 方法二
git clone -b release-0.9 https://github.com/prometheus-operator/kube-prometheus.git

注:在release-0.11版本之后新增了NetworkPolicy
默认是允许自己访问,如果了解NetworkPolicy可以修改一下默认的规则,可以用查看 ls *networkPolicy*,如果不修改的话则会影响到修改NodePort类型也无法访问
如果不会Networkpolicy可以直接删除就行

2、修改 Kube-Prometheus 镜像源

国外镜像源某些镜像无法拉取,我们这里修改prometheus-operator,prometheus,alertmanager,kube-state-metrics,node-exporter,prometheus-adapter的镜像源为国内镜像源。我这里使用的是中科大的镜像源。

cd ./kube-prometheus/manifests/
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' setup/prometheus-operator-deployment.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-prometheus.yaml 
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' alertmanager-alertmanager.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' kube-state-metrics-deployment.yaml
sed -i 's/k8s.gcr.io/lank8s.cn/g' kube-state-metrics-deployment.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' node-exporter-daemonset.yaml
sed -i 's/quay.io/quay.mirrors.ustc.edu.cn/g' prometheus-adapter-deployment.yaml
sed -i 's/k8s.gcr.io/lank8s.cn/g' prometheus-adapter-deployment.yaml
grep "image: " * -r		# 确认一下是否还有国外镜像

3、修改类型为 NodePort

为了可以从外部访问prometheus,alertmanager,grafana,我们这里修改promethes,alertmanager,grafana的service类型为NodePort类型。

  • 修改prometheus的service
cat prometheus-service.yaml 
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.29.1
    prometheus: k8s
  name: prometheus-k8s
  namespace: monitoring
spec:
  type: NodePort # 新增
  ports:
  - name: web
    port: 9090
    nodePort: 30090 # 新增
    targetPort: web
  selector:
    app: prometheus
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    prometheus: k8s
  sessionAffinity: ClientIP
  • 修改alertmanager的service
apiVersion: v1
kind: Service
metadata:
  labels:
    alertmanager: main
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 0.22.2
  name: alertmanager-main
  namespace: monitoring
spec:
  type: NodePort # 新增
  ports:
  - name: web
    port: 9093
    nodePort: 30093  # 新增
    targetPort: web
  selector:
    alertmanager: main
    app: alertmanager
    app.kubernetes.io/component: alert-router
    app.kubernetes.io/name: alertmanager
    app.kubernetes.io/part-of: kube-prometheus
  sessionAffinity: ClientIP
  • 修改grafana的service
cat grafana-service.yaml
apiVersion: v1
kind: Service
metadata:
  labels:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 8.1.1
  name: grafana
  namespace: monitoring
spec:
  type: NodePort # 新增
  ports:
  - name: http
    port: 3000
    nodePort: 32000 # 新增
    targetPort: http
  selector:
    app.kubernetes.io/component: grafana
    app.kubernetes.io/name: grafana
    app.kubernetes.io/part-of: kube-prometheus

4、修改Prometheus保留时间

vim prometheus-prometheus.yaml
新增: spec.retention: 15d

5、安装 kube-Prometheus 并确认状态

kubectl apply -f setup/
# 下载prometheus-operator镜像需要花费几分钟,这里等待几分钟,直到prometheus-operator变成running状态
kubectl apply -f .
watch kubectl get po -n monitoring 	# 等待所有镜像变成Running状态

6、通过Ingres访问Prometheus

cat > prometheus-ingress.yaml << 'EOF'
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  namespace: monitoring
  name: prometheus-ingress
spec:
  ingressClassName: nginx
  rules:
  - host: grafana.xxx.net  # 访问 Grafana 域名
    http:
      paths:
      - path: /
        backend:
          serviceName: grafana
          servicePort: 3000
  - host: prometheus.xxx.net  # 访问 Prometheus 域名
    http:
      paths:
      - path: /
        backend:
          serviceName: prometheus-k8s 
          servicePort: 9090
  - host: alertmanager.xxx.net  # 访问 alertmanager 域名
    http:
      paths:
      - path: /
        backend:
          serviceName: alertmanager-main
          servicePort: 9093
EOF

三、访问

访问 Prometheus:

浏览器打开: http://prometheus.xxx.net

在这里插入图片描述

确保state的状态都是UP

访问 Alertmanager:

浏览器打开:alertmanager.xxx.net

在这里插入图片描述

访问 Grafana:

浏览器打开:grafana.xxx.net

首次登录,用户名和密码,都是admin , 登录之后,会提示修改密码,可以选择跳过skip

在这里插入图片描述

报警详细配置可参考这里

四、Alertmanager 邮件报警

cat > alertmanager-secret.yaml <<'EOF'
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
      smtp_smarthost: 'smtp.qq.com:465'
      smtp_from: 'xxx@qq.com'     # 发件人邮箱
      smtp_auth_username: 'xxx@qq.com'    # 发件人邮箱
      smtp_auth_password: 'lkjtrbnmqfesbdhj'  # QQ授权码
      smtp_hello: 'qq.com'
      smtp_require_tls: false
    route:
      group_by: ['alertname']
      group_interval: 5m
      group_wait: 30s
      receiver: default-receiver
      repeat_interval: 12h
      routes:
      - receiver: 'lms-saas'
        match_re:
          namespace: ^(lms-saas|lms-standard).*$    # 根据namespace进行区分报警
    receivers:
    - name: 'default-receiver'
      email_configs:
      - to: 'xxx@chinaedu.net'    # 收件人邮箱,设置多个收件人
        send_resolved: true
      - to: 'xxx@chinaedu.net'
        send_resolved: true
    - name: 'lms-saas'    # 跟上面定义的名字必须一样
      email_configs:
      - to: 'xxx@chinaedu.net'    # 收件人邮箱,也可以设置多个
        send_resolved: true
      - to: 'xxx@chinaedu.net'
        send_resolved: true
type: Opaque
EOF
kubectl delete -f alertmanager-secret.yaml 
kubectl apply -f alertmanager-secret.yaml

验证一下邮件报警是否生效

在这里插入图片描述

验证一下是否可以收到报警邮件

在这里插入图片描述

五、Alertmanager 钉钉报警

1、注册钉钉账号->新建报警群->机器人管理

在这里插入图片描述

在这里插入图片描述

在这里插入图片描述

2、自定义(通过webhook接入自定义服务)

在这里插入图片描述

  • 复制生成的Webhook

在这里插入图片描述

3、编写钉钉yaml

cat > dingtalk-webhook.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      run: dingtalk
  template:
    metadata:
      labels:
        run: dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v1.4.0
        imagePullPolicy: IfNotPresent
        # 设置钉钉群聊自定义机器人后,使用实际 access_token 替换下面 xxxxxx部分
        args:
          - --ding.profile=webhook1=https://oapi.dingtalk.com/robot/send?access_token=你的token
        ports:
        - containerPort: 8060
          protocol: TCP 
---
apiVersion: v1
kind: Service
metadata:
  labels:
    run: dingtalk
  name: webhook-dingtalk
  namespace: monitoring
spec:
  ports:
  - port: 8060
    protocol: TCP
    targetPort: 8060
  selector:
    run: dingtalk
  sessionAffinity: None
EOF

kubectl apply -f dingtalk-webhook.yaml	# 应用配置后,对应的pod和service就起来了,我们可以看到侦听的端口为8060.

4、alertmanager配置告警通知

cat > alertmanager-secret.yaml <<'EOF'
apiVersion: v1
data: {}
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
stringData:
  alertmanager.yaml: |-
    global:
      resolve_timeout: 5m
    route:
      group_by: ['alertname']
      group_interval: 5m
      group_wait: 30s
      receiver: "webhook"
      repeat_interval: 12h
    receivers:
    #配置钉钉告警的webhook
    - name: 'webhook'
      webhook_configs:
      - url: 'http://webhook-dingtalk.monitoring.svc.cluster.local:8060/dingtalk/webhook1/send'
        send_resolved: true
type: Opaque
EOF

5、验证一下钉钉报警是否生效

在这里插入图片描述

6、验证一下钉钉可以收到报警

kubectl run busybox --image=busybox # 启动一个错误的容器

在这里插入图片描述

六、扩展知识

修正kube-prometheus中grafana组件自带dashboard的默认时区

kube-prometheus监控 controller-manager && scheduler 组件

Logo

为开发者提供学习成长、分享交流、生态实践、资源工具等服务,帮助开发者快速成长。

更多推荐