Navigation

Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - Overview
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - Preparation
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - exporter
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - consul
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - prometheus-operator
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - prometheus
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - alertmanager
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - minio
Leveling Up: Deploying Highly Available prometheus on k8s from Scratch - thanos receive, thanos query

Preface

After a series of preparation steps, we can finally deploy the foundation of the whole architecture: prometheus. There are three main parts: the prometheus instance itself, the scrape configuration, and the alerting configuration. This chapter uses simple examples to walk through the relevant file structure and usage; for further customization, please refer to the parameter documentation.

Support

Some readers have run into all kinds of strange problems while following these articles, and answering private messages one by one is not very efficient. You can try using AI to work through your own issues: you solve the problem and learn something new along the way. In my experience GPT is still the most useful; the 3.5 model is enough for most programming questions, but there are some barriers to using it (a VPN plus an overseas payment method). If that is not an option for you, you can try this instead; the 100 free conversations it comes with are enough to get your problems sorted out.

Related yaml files

Permissions

prometheus-serviceAccount.yaml

apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-ifcloud

prometheus-clusterRole.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-ifcloud-cluster-role
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get

prometheus-clusterRoleBinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-ifcloud-cluster-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-ifcloud-cluster-role
subjects:
- kind: ServiceAccount
  name: prometheus-ifcloud
  namespace: prom-ha

prometheus-role.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-ifcloud-role
rules:
- apiGroups:
  - ""
  resources:
  - configmaps
  verbs:
  - get

prometheus-roleBinding.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-ifcloud-role-binding
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: prometheus-ifcloud-role
subjects:
- kind: ServiceAccount
  name: prometheus-ifcloud
  namespace: prom-ha
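
Before moving on, it can be worth confirming that the ServiceAccount really ends up with the permissions defined above. A minimal sketch using kubectl auth can-i, assuming the resources in this section have already been applied to the prom-ha namespace:

# cluster-scoped permissions granted by the ClusterRole
kubectl auth can-i get nodes --subresource=metrics --as=system:serviceaccount:prom-ha:prometheus-ifcloud
kubectl auth can-i get /metrics --as=system:serviceaccount:prom-ha:prometheus-ifcloud

# namespaced permission granted by the Role
kubectl auth can-i get configmaps -n prom-ha --as=system:serviceaccount:prom-ha:prometheus-ifcloud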

Configuration

additional-scrape-configs.yaml

# scrape configuration
kind: Secret
apiVersion: v1
metadata:
  name: additional-scrape-configs
stringData:
  prometheus-additional.yaml: >-
    - job_name: 'ifcloud-alert'
      consul_sd_configs:
        - server: consul:8500 #consul host; the Service created earlier is used to reach the consul deployed before
      relabel_configs:
        - source_labels: [__meta_consul_tags] #keep only targets whose tags contain 'ifcloud'
          regex: .*ifcloud.*
          action: keep
        - regex: __meta_consul_service_metadata_(.+) #rename each label named __meta_consul_service_metadata_xxx to xxx
          action: labelmap
type: Opaque
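
Instead of maintaining the stringData block by hand, the same Secret can also be generated from a local file. A sketch, assuming the scrape snippet above has been saved locally as prometheus-additional.yaml:

# render the Secret manifest from the local scrape-config file
kubectl create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml \
  -n prom-ha --dry-run=client -o yaml > additional-scrape-configs.yaml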

prometheus-rules.yaml

# alerting configuration
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule #a CRD provided by prometheus-operator
metadata:
  labels: #prometheus selects the matching PrometheusRule resources by these labels
    prometheus: ifcloud
    role: ifcloud-rules
  name: ifcloud-rules
spec:
  groups:
  - name: node-exporter.rules
    rules:
    - alert: NodeCpuIdleGreaterThanZero #alert name (must be a valid metric name)
      annotations:
        description: idle node_cpu_seconds_total > 0
        summary: idle node_cpu_seconds_total of CPU0 is greater than 0
      for: 1m #fire the alert once the condition (expr) has held for 1 minute
      labels:
        severity: critical
      expr: node_cpu_seconds_total{cpu="0",instance="i-6ULChRiM8A",job="ifcloud-alert",mode="idle"} > 0 #the alerting expression; this one is deliberately written so that it always fires
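
If promtool is available locally, the rule group can also be checked offline before it is wrapped in the PrometheusRule CRD. A sketch, assuming the content of spec.groups has been copied into a plain Prometheus rule file (the file name node-exporter-rules.yaml is only an example):

# validate the syntax and expressions of the rule group
promtool check rules node-exporter-rules.yaml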

Services

prometheus-service.yaml

apiVersion: v1
kind: Service
metadata:
  labels:
    prometheus: ifcloud
  name: prometheus-ifcloud
spec:
  ports:
  - name: web
    port: 9090
    targetPort: web
  selector:
    app: prometheus
    prometheus: ifcloud
  sessionAffinity: ClientIP

prometheus-service-web.yaml

kind: Service
apiVersion: v1
metadata:
  name: prometheus-ifcloud-web
  labels:
    app: prometheus-ifcloud-web
spec:
  ports:
    - name: http-web
      protocol: TCP
      port: 9090
      targetPort: 9090
      nodePort: 30003
  selector:
    prometheus: ifcloud
    app: prometheus
  type: NodePort
  sessionAffinity: None
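
Once the Prometheus pods created by the CRD below are running, a quick way to confirm the two Services actually select them is to check their endpoints and probe the NodePort. A minimal sketch, using the same node IP as in the verification section:

# both services should list the pod IPs of the three prometheus replicas
kubectl get endpoints prometheus-ifcloud prometheus-ifcloud-web -n prom-ha

# prometheus exposes simple health endpoints through the NodePort
curl http://192.168.25.80:30003/-/healthy
curl http://192.168.25.80:30003/-/ready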

prometheus.yaml

apiVersion: monitoring.coreos.com/v1
kind: Prometheus #a CRD provided by prometheus-operator
metadata:
  labels:
    prometheus: ifcloud
  name: ifcloud
spec:
  additionalScrapeConfigs:
    key: prometheus-additional.yaml #the key (file name) inside the scrape-config Secret
    name: additional-scrape-configs #the name of the scrape-config Secret
  alerting:
    alertmanagers:
    - name: alertmanager-ifcloud #the full name of the alertmanager Service
      namespace: prom-ha #the namespace alertmanager lives in
      port: web
  remoteWrite:
  - name: thanos-receiver-ifcloud
    url: http://thanos-receiver-ifcloud:19291/api/v1/receive #the host of thanos receive
  image: quay.io/prometheus/prometheus:v2.22.1
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 3
  resources:
    requests:
      memory: 400Mi
  ruleSelector: #match the PrometheusRule resource created by prometheus-rules.yaml via its labels
    matchLabels:
      prometheus: ifcloud
      role: ifcloud-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-ifcloud
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.22.1
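
After this manifest is applied, prometheus-operator reconciles it into a StatefulSet with three replicas. A sketch of how to watch that happen; the label selector is an assumption based on the operator version used in this series, which labels the pods with prometheus=ifcloud:

# the Prometheus custom resource itself
kubectl get prometheus -n prom-ha

# the StatefulSet and pods generated by prometheus-operator
kubectl get statefulset -n prom-ha
kubectl get pods -n prom-ha -l prometheus=ifcloud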

Deployment

# place the files above under the directory /yaml/prometheus
# run the following command to validate the yaml files
kubectl apply -f /yaml/prometheus -n prom-ha --dry-run=client

# once validation passes, run the following command to create the k8s resources
kubectl apply -f /yaml/prometheus -n prom-ha

Note: if the namespace you are using is not prom-ha, you need to modify the following fields in this directory: subjects[].namespace in prometheus-clusterRoleBinding.yaml and prometheus-roleBinding.yaml, and spec.alerting.alertmanagers[].namespace in prometheus.yaml.
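
If you do switch namespaces, a quick sketch to locate every hard-coded reference before editing, assuming the files live under /yaml/prometheus as above:

# list every file and line that still mentions the prom-ha namespace
grep -rn "prom-ha" /yaml/prometheus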

Verification

After the deployment finishes, visit http://192.168.25.80:30003/targets and http://192.168.25.80:30003/rules; if the scrape targets and alerting rules show up on those pages, the deployment succeeded.

On the http://192.168.25.80:30003/graph page, try using PromQL (node_cpu_seconds_total) to query the data exposed by node_exporter.
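
The same checks can also be scripted against the Prometheus HTTP API instead of clicking through the UI; a minimal sketch using the NodePort address above:

# active scrape targets (should include the ifcloud-alert job)
curl 'http://192.168.25.80:30003/api/v1/targets'

# loaded alerting rules (should include the node-exporter.rules group)
curl 'http://192.168.25.80:30003/api/v1/rules'

# the same PromQL query as on the graph page
curl 'http://192.168.25.80:30003/api/v1/query?query=node_cpu_seconds_total'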
