Monitoring K8S with Prometheus


黑色鲸鱼 · Published 2022-09-20 21:22:29

I. Overview

This article installs Prometheus and Grafana with helm,

then configures alertmanager and alerting rules to send email alerts.

The helm repositories and chart packages used are listed below:

# helm repositories
grafana: https://grafana.github.io/helm-charts
prometheus-community: https://prometheus-community.github.io/helm-charts

# chart packages
grafana/grafana
prometheus-community/prometheus

II. Preparation

Install helm
Project page: https://github.com/helm/helm

Installation:

[root@master01]# wget https://get.helm.sh/helm-v3.8.1-linux-amd64.tar.gz   # download (choose your own version)
[root@master01]# tar zxvf helm-v3.8.1-linux-amd64.tar.gz   # extract
[root@master01]# mv linux-amd64/helm /usr/local/bin/   # install
[root@master01]# helm version  # verify

Remove the warning that Helm prints about the permissions of the kubernetes config file

chmod g-rw ~/.kube/config
chmod o-r ~/.kube/config
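
The effect of the two chmod commands can be checked on a throwaway file before touching the real kubeconfig. This is a minimal sketch: the helper name tighten is made up for illustration, the starting mode 644 is an assumption about a typical kubeconfig, and stat -c is the GNU coreutils form.

```shell
# Apply the same two permission changes used above to any file.
tighten() {
  chmod g-rw "$1"   # drop group read/write
  chmod o-r "$1"    # drop world read
}

tmp="$(mktemp)"
chmod 644 "$tmp"                 # assumed starting mode: rw-r--r--
tighten "$tmp"
mode="$(stat -c '%a' "$tmp")"    # GNU stat: octal permission bits
echo "$mode"                     # prints 600
rm -f "$tmp"
```

For a file that starts at 644, the result is the same as a single chmod 600.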

Download the chart packages

# Add the grafana and prometheus-community repositories (retry a few times if there is no response)
[root@master01]# helm repo add grafana https://grafana.github.io/helm-charts
[root@master01]# helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
[root@master01]# helm repo update   # update the repositories
[root@master01]# helm search repo grafana  # search for charts
[root@master01]# mkdir -p ~/workspace/prometheus  # create the working directory
[root@master01]# cd ~/workspace/prometheus
# Pull all the chart packages (keep them in the working directory)
helm pull grafana/grafana
helm pull prometheus-community/prometheus
helm pull prometheus-community/prometheus-mysql-exporter
helm pull prometheus-community/prometheus-redis-exporter
helm pull prometheus-community/prometheus-kafka-exporter
helm pull prometheus-community/prometheus-rabbitmq-exporter
[root@node01 ~]# cd /root/workspace/prometheus/
tar zxvf [archive]            # extract each archive in turn
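
Extracting each archive by hand can be replaced with a small loop. The sketch below assumes the pulled charts are the .tgz files that helm pull writes into the working directory; the function name extract_charts is hypothetical.

```shell
# Extract every chart archive (*.tgz) found in a directory, in place.
extract_charts() {
  dir="$1"
  for archive in "$dir"/*.tgz; do
    [ -e "$archive" ] || continue    # no archives: the glob stays literal
    tar zxf "$archive" -C "$dir"
  done
}

# Usage, from any directory:
# extract_charts ~/workspace/prometheus
```

Alternatively, helm pull accepts the --untar flag, which unpacks the chart right after downloading it.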

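Image synchronization
The prometheus chart bundles the kube-state-metrics subchart, which uses a gcr image. It is the only gcr image among all of these charts and pulling it may fail, so sync the image to a reachable registry in advance.

Edit the configuration file
The image has already been synced to a personal Alibaba Cloud registry.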

[root@master01 prometheus]# cd prometheus
[root@master01 prometheus]# vim charts/kube-state-metrics/values.yaml
# Default values for kube-state-metrics.
prometheusScrape: true
image:
  repository: registry.cn-zhangjiakou.aliyuncs.com/gcr-sync/kube-state-metrics
  tag: v2.3.0
  pullPolicy: IfNotPresent
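
For readers who want to mirror the image into their own registry, the usual pull, tag and push sequence can be sketched as follows. Both image names are assumptions for illustration: take the real upstream name from charts/kube-state-metrics/values.yaml before editing it, and registry.example.com is a placeholder. The function only prints the commands so they can be reviewed first.

```shell
# Print the docker commands that copy one image to another registry.
mirror_cmds() {
  src="$1"
  dst="$2"
  printf 'docker pull %s\n' "$src"
  printf 'docker tag %s %s\n' "$src" "$dst"
  printf 'docker push %s\n' "$dst"
}

# Assumed upstream image and a hypothetical destination; replace both.
SRC="k8s.gcr.io/kube-state-metrics/kube-state-metrics:v2.3.0"
DST="registry.example.com/gcr-sync/kube-state-metrics:v2.3.0"
mirror_cmds "$SRC" "$DST"
```

Once the printed commands look right, run them on a host that can reach both registries, for example by piping the output to sh.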

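Install Prometheus
Enter the working directory and adjust the image, persistent storage, replica count and other settings as needed.
For the first deployment, edit values.yaml directly instead of passing --set flags, so that later upgrades do not have to repeat the settings.

[root@master01 prometheus]# cd ~/workspace/prometheus/prometheus
[root@master01 prometheus]# vim values.yaml
Configure persistent storage
If persistence is not needed, set enabled to false
If you use file storage, change accessModes to ReadWriteMany
For creating the storageClass, see the earlier article
/persistentVolume   # search for the persistence settings: in vim press Esc and type this (press n to jump to the next match)
  persistentVolume:
    ## If true, alertmanager will create/use a Persistent Volume Claim
    ## If false, use emptyDir
    enabled: false
# The search finds four matches in total; the settings that matter are under alertmanager, the Prometheus server and pushgateway.
# Following the official documentation, this article enables persistence only for the Prometheus server and disables it for the others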
  alertmanager:
  ## If false, alertmanager will not be installed
    enabled: true
    
      service:
      ## If false, no Service will be created for the Prometheus server
    enabled: true

pushgateway:
  ## If false, pushgateway will not be installed
  enabled: true

  ## Use an alternate scheduler, e.g. "stork".
  ## ref: https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/
  ##
  # schedulerName:

  persistentVolume:
    ## If true, Prometheus server will create/use a Persistent Volume Claim
    ## If false, use emptyDir
    ##
    enabled: false

Multiple replicas
Set replicaCount to 3 and enable the StatefulSet
  ## Use a StatefulSet if replicaCount needs to be greater than 1 (see below)
  ##
  replicaCount: 3

  ## Annotations to be added to deployment
  ##
  deploymentAnnotations: {}

  statefulSet:
    ## If true, use a statefulset instead of a deployment for pod management.
    ## This allows to scale replicas to more than 1 pod
    ##
    enabled: true

Enable NodePort
For Alertmanager, change the service type from ClusterIP to NodePort and set the nodePort number (around line 370).
  service:
    annotations: {}
    labels: {}
    clusterIP: ""

    ## Enabling peer mesh service end points for enabling the HA alert manager
    ## Ref: https://github.com/prometheus/alertmanager/blob/master/README.md
    # enableMeshPeer : true
    
    ## List of IP addresses at which the alertmanager service is available
    ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
    ##
    externalIPs: []

    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    servicePort: 80
    nodePort: 30090
    sessionAffinity: None
    type: NodePort

For the Prometheus server, change the service type from ClusterIP to NodePort and add a nodePort field (around line 1120).
  service:
    ## If false, no Service will be created for the Prometheus server
    ##
    enabled: true

    annotations: {}
    labels: {}
    clusterIP: ""
    
    ## List of IP addresses at which the Prometheus server service is available
    ## Ref: https://kubernetes.io/docs/user-guide/services/#external-ips
    ##
    externalIPs: []

    loadBalancerIP: ""
    loadBalancerSourceRanges: []
    servicePort: 80
    nodePort: 30091
    sessionAffinity: None
    type: NodePort
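
The settings described in this section can also be collected in one place for review. The sketch below writes them to a separate override file with a heredoc; the file name prometheus-override.yaml is made up, and the key paths are assumed to match the values.yaml of the chart version pulled earlier, so check them against your copy. This is an alternative to editing values.yaml in place, not what the article itself does.

```shell
# Write this section's settings to an override file.
# Key paths are assumptions based on the excerpts above.
cat > prometheus-override.yaml <<'EOF'
alertmanager:
  persistentVolume:
    enabled: false
  service:
    type: NodePort
    nodePort: 30090
server:
  replicaCount: 3
  statefulSet:
    enabled: true
  persistentVolume:
    enabled: true
  service:
    type: NodePort
    nodePort: 30091
pushgateway:
  persistentVolume:
    enabled: false
EOF

# From the chart directory, the file would be passed at install time:
# helm install prometheus -n prometheus -f prometheus-override.yaml .
```

The same file has to be passed again with -f on every helm upgrade, otherwise the chart defaults apply.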

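III. Deploying Prometheus and Grafana

[root@master01 prometheus]# kubectl create ns prometheus     # create the namespace
[root@master01 prometheus]# helm install prometheus -n prometheus .  # deploy with helm; make sure you are in the working directory ~/workspace/prometheus/prometheus
After the deployment, list the services; they are needed later when configuring the data source in Grafana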
[root@master01 prometheus]# kubectl get svc -n prometheus
NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)          AGE
prometheus-alertmanager            NodePort    10.1.134.69   <none>        80:30529/TCP   103s
prometheus-alertmanager-headless   ClusterIP   None          <none>        80/TCP         103s
prometheus-kube-state-metrics      ClusterIP   10.1.44.240   <none>        8080/TCP       103s
prometheus-node-exporter           ClusterIP   10.1.102.38   <none>        9100/TCP       103s
prometheus-pushgateway             ClusterIP   10.1.66.116   <none>        9091/TCP       103s
prometheus-server                  NodePort    10.1.40.73    <none>        80:30091/TCP   103s

Access the alertmanager dashboard at <node-IP>:30529 (the NodePort shown in the listing above)
Access the Prometheus server dashboard at <node-IP>:30091
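
Before opening a browser, the two NodePorts can be probed from the command line. Prometheus and Alertmanager both serve a /-/healthy endpoint. In this sketch the helper name health_url and the node address 192.0.2.10 are placeholders; substitute the IP of any cluster node.

```shell
# Build the health-check URL for a service exposed on a NodePort.
health_url() {
  printf 'http://%s:%s/-/healthy\n' "$1" "$2"
}

NODE_IP="192.0.2.10"             # hypothetical node address; replace it
health_url "$NODE_IP" 30091      # Prometheus server
health_url "$NODE_IP" 30529      # alertmanager

# Against a live cluster:
# curl -fsS "$(health_url "$NODE_IP" 30091)"
```

With -f, curl exits non-zero on an HTTP error, so the check can be used in scripts.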

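Install Grafana
Grafana is also installed in the prometheus namespace

Create the Secret
Create a secret in the prometheus namespace with the credentials admin / grafana

[root@master01 ]# cd ~/workspace/prometheus/grafana
[root@master01 grafana]# echo -n "admin" | base64
[root@master01 grafana]# echo -n "grafana" | base64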

[root@master01 grafana]# cat > secret.yaml  <<EOF
apiVersion: v1
kind: Secret
metadata:
  name: grafana
  namespace: prometheus
type: Opaque
data:
  admin-user: YWRtaW4=
  admin-password: Z3JhZmFuYQ==
EOF

[root@master01 grafana]# kubectl apply -f secret.yaml
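
The base64 values in secret.yaml can be reproduced and decoded again to confirm that no trailing newline slipped in. A small sketch, with the helper name b64 made up for illustration and base64 -d being the GNU coreutils form:

```shell
# Encode a string without a trailing newline, as echo -n does above.
b64() { printf '%s' "$1" | base64; }

user_b64="$(b64 admin)"
pass_b64="$(b64 grafana)"
echo "admin-user: $user_b64"          # admin-user: YWRtaW4=
echo "admin-password: $pass_b64"      # admin-password: Z3JhZmFuYQ==

# Decode to double-check the round trip.
printf '%s' "$pass_b64" | base64 -d; echo

# kubectl can also build the same Secret and do the encoding itself:
# kubectl create secret generic grafana -n prometheus \
#   --from-literal=admin-user=admin --from-literal=admin-password=grafana
```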

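Chart parameter settings
Enter the working directory and adjust the image, persistent storage, replica count and other settings as needed.
For the first deployment, edit values.yaml directly instead of passing --set flags, so that later upgrades do not have to repeat the settings.

[root@master01 grafana]# vim values.yaml

Set the password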

# Administrator credentials when not using an existing secret (see below)
adminUser: admin
# adminPassword: strongpassword

# Use an existing secret for the admin user.
admin:
  ## Name of the secret. Can be templated.
  existingSecret: "grafana"     # the secret created earlier
  userKey: admin-user
  passwordKey: admin-password

Configure persistent storage
If persistence is not needed, set enabled to false
If you use file storage, change accessModes to ReadWriteMany

## Enable persistence using Persistent Volume Claims
## ref: http://kubernetes.io/docs/user-guide/persistent-volumes/
##
persistence:
  type: pvc
  enabled: false
  # storageClassName: default
  accessModes:
    - ReadWriteOnce
  size: 2Gi
  # annotations: {}
  finalizers:
    - kubernetes.io/pvc-protection

Configure NodePort
Change the service type from ClusterIP to NodePort and add a nodePort field
## Expose the grafana service to be accessed from outside the cluster (LoadBalancer service).
## or access it from within the cluster (ClusterIP service). Set the service type and the port to serve it.
## ref: http://kubernetes.io/docs/user-guide/services/
##
service:
  enabled: true
  type: NodePort
  nodePort: 30092
  port: 80
  targetPort: 3000
    # targetPort: 4181 To be used with a proxy extraContainer
  ## Service annotations. Can be templated.
  annotations: {}
  labels: {}
  portName: service

IV. Configuring Grafana

[root@master01 grafana]# helm install grafana -n prometheus .

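Configure the dashboard
Log in to Grafana
Access the Grafana dashboard at <node-IP>:30092

Credentials (from the custom secret created earlier): admin / grafana

Configure Data sources
First, get the address of the prometheus service

[root@master01 grafana]# kubectl get svc -n prometheus   # list the services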
NAME                               TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)        AGE
prometheus-server                  NodePort    10.1.40.73     <none>        80:30091/TCP   15m

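Open the Data sources configuration page

Add Prometheus and enter the address of the prometheus-server service (its CLUSTER-IP and port 80 from the listing above) as the URL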
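
A ClusterIP changes if the service is ever recreated, so the service's DNS name is a more stable value for the data source URL. The sketch below composes that name; it assumes cluster DNS is running with the default cluster domain cluster.local, and the helper name svc_url is hypothetical.

```shell
# Compose the in-cluster URL of a Service:
#   http://<name>.<namespace>.svc.<cluster-domain>:<port>
svc_url() {
  name="$1"
  ns="$2"
  port="$3"
  domain="${4:-cluster.local}"   # assumed default cluster domain
  printf 'http://%s.%s.svc.%s:%s\n' "$name" "$ns" "$domain" "$port"
}

svc_url prometheus-server prometheus 80
# prints http://prometheus-server.prometheus.svc.cluster.local:80
```

Because Grafana runs in the same namespace here, the short form http://prometheus-server should resolve as well.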

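Import a dashboard template
After the data source is configured, import a template
Template to import: 1 Node Exporter for Prometheus Dashboard CN v20191102 (ID 12377)
More templates are available on the official site: https://grafana.com/grafana/dashboards

Select Prometheus as the data source, then click Import

Final result: [screenshot of the final dashboard]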
