This post installs a Kafka cluster (with persistence) using Helm. Kafka has no official Helm chart, so we use the Bitnami one.

Helm chart link: kafka 29.2.0 · bitnami/bitnami

On the chart page, Install shows the installation command and Default Values shows the chart's default values.

1. Add the Helm repository. myrepo is the local name for the repository; choose any name you like.

helm repo add myrepo  https://charts.bitnami.com/bitnami

2. List the configured Helm repositories

helm repo list

3. Update the Helm repository

helm repo update myrepo

4. List the available versions of the Kafka chart

helm search repo myrepo/kafka -l

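5. From Default Values on the chart page, download the default values for the chart version you want, edit them locally, and upload the file to the server (alternatively, download the chart package and edit its default values).

These are the defaults I changed:

(1) Client listener: this is where you decide whether clients must authenticate. I run everything on an internal network, so for simplicity I set PLAINTEXT; choose what suits your environment.

(2) Controller listener: the authentication protocol for the controller, also set to PLAINTEXT for simplicity.

(3) Inter-broker listener: the listener dedicated to communication between brokers in the Kafka cluster, also set to PLAINTEXT.

(4) External listener: also set to PLAINTEXT.

(5) Kafka data persistence: enable it and set the size. A StorageClass must already exist in the K8S cluster.

(6) Log persistence: enable it and specify the StorageClass and size.

(7) Enable the JMX exporter so that Kafka can be monitored.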
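As a rough sketch of items (1) to (7), an overrides-only values file might look like the following. The key names reflect the layout of the bitnami/kafka 29.x values file as I understand it and should be checked against Default Values for your chart version; the StorageClass name and the sizes are placeholders.

```yaml
# Sketch only: verify every key against the chart's Default Values before use.
listeners:
  client:
    protocol: PLAINTEXT          # (1) client listener, no authentication
  controller:
    protocol: PLAINTEXT          # (2) controller listener
  interbroker:
    protocol: PLAINTEXT          # (3) broker-to-broker listener
  external:
    protocol: PLAINTEXT          # (4) external listener
controller:
  persistence:
    enabled: true                # (5) persist Kafka data
    storageClass: "my-storage-class"   # hypothetical name; must already exist
    size: 20Gi                   # placeholder size
  logPersistence:
    enabled: true                # (6) persist Kafka logs
    storageClass: "my-storage-class"   # hypothetical name
    size: 10Gi                   # placeholder size
metrics:
  jmx:
    enabled: true                # (7) JMX exporter for monitoring
```

Helm merges a file like this over the chart defaults, so it can be passed with -f in place of a fully edited copy of the default values.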

(8) Once the values file is ready, upload it to a server that has helm and a kubeconfig for the cluster, then install Kafka:

helm install kafka-cluster myrepo/kafka --version 29.2.0 -f helm-29.2.0-kafka-3.7.0.yaml --kubeconfig=/var/lib/jenkins/.kube/kubeconfig -n kafka-cluster

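(9) Deploy a web UI for easier inspection

GitHub repository: GitHub - provectus/kafka-ui: Open-Source Web UI for Apache Kafka Management

Create a stateless workload (Deployment) named kafka-ui in the kafka-cluster namespace of the K8S cluster.

Image: provectuslabs/kafka-ui:latest

Example YAML: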

apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-ui
  labels:
    app: kafka-ui
  namespace: kafka-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-ui
  template:
    metadata:
      labels:
        app: kafka-ui
    spec:
      containers:
      - name: kafka-ui
        image: provectuslabs/kafka-ui:latest
        env:
        - name: KAFKA_CLUSTERS_0_NAME
          value: 'Kafka Cluster'
        # Pod DNS names follow <release>-controller-<n>.<release>-controller-headless.<namespace>.svc.cluster.local.
        # The values below match the release name (kafka-cluster) and namespace (kafka-cluster) of the helm install command.
        - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
          value: 'kafka-cluster-controller-0.kafka-cluster-controller-headless.kafka-cluster.svc.cluster.local:9092,kafka-cluster-controller-1.kafka-cluster-controller-headless.kafka-cluster.svc.cluster.local:9092,kafka-cluster-controller-2.kafka-cluster-controller-headless.kafka-cluster.svc.cluster.local:9092'
        - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
          value: 'PLAINTEXT'
        # Protect the UI with a login form; change the user name and password to your own.
        - name: AUTH_TYPE
          value: 'LOGIN_FORM'
        - name: SPRING_SECURITY_USER_NAME
          value: 'devops'
        - name: SPRING_SECURITY_USER_PASSWORD
          value: 'mfniqJkDk'
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-ui
  namespace: kafka-cluster
spec:
  selector:
    app: kafka-ui
  type: NodePort
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
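The example above keeps the UI credentials in plain text in the manifest. One possible alternative, sketched here with a hypothetical Secret name (kafka-ui-auth) and a placeholder password, is to store them in a Secret in the same namespace as the Deployment and reference them with secretKeyRef:

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: kafka-ui-auth            # hypothetical name
  namespace: kafka-cluster       # must be the namespace of the kafka-ui Deployment
type: Opaque
stringData:
  username: devops
  password: change-me            # placeholder; use your own password
---
# Fragment of the kafka-ui container spec (not a standalone object):
# these entries would replace the two literal user name/password env entries.
env:
- name: SPRING_SECURITY_USER_NAME
  valueFrom:
    secretKeyRef:
      name: kafka-ui-auth
      key: username
- name: SPRING_SECURITY_USER_PASSWORD
  valueFrom:
    secretKeyRef:
      name: kafka-ui-auth
      key: password
```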

(10) Point a domain name at kafka-ui so that Kafka can be inspected through the web interface.
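How the domain reaches the Service depends on your environment. If the cluster runs an ingress controller, one option is an Ingress along these lines; the host name and the ingress class are assumptions, and the namespace must be the one that contains the kafka-ui Service:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-ui
  namespace: kafka-cluster         # namespace of the kafka-ui Service
spec:
  ingressClassName: nginx          # assumption: an NGINX ingress controller is installed
  rules:
  - host: kafka-ui.example.com     # hypothetical domain; use your own
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: kafka-ui         # the Service defined above
            port:
              number: 8080
```

The DNS record for the domain then points at the ingress controller's address rather than at a NodePort.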

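(11) With Kafka installed, the next step is monitoring and alerting. (Installing Prometheus, Alertmanager, and Grafana on K8S with Helm is well documented elsewhere, so it is not covered here.)

In the namespace where the monitoring stack is deployed, find the Secret that holds the Prometheus scrape configuration and edit it to add a job for kafka-exporter. (The Helm install above enabled jmx_exporter; I installed kafka-exporter separately afterwards. Install whichever you need; the two exporters can run side by side.)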

- job_name: kafka_cluster_exporter
  metrics_path: /metrics
  static_configs:
  - targets:
    - 172.22.6.6:9308  # replace with the address and port of your own kafka-exporter service

Prometheus now collects metrics through kafka-exporter.
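The post does not show how kafka-exporter itself was deployed. A minimal sketch, assuming the danielqsj/kafka-exporter image and a Kafka client Service named kafka-cluster in the kafka-cluster namespace (both assumptions; adjust to your release), could look like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: kafka-cluster
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
      - name: kafka-exporter
        image: danielqsj/kafka-exporter:latest   # pin a specific tag in practice
        args:
        # Assumed bootstrap address; replace with your own broker address.
        - --kafka.server=kafka-cluster.kafka-cluster.svc.cluster.local:9092
        ports:
        - containerPort: 9308                    # default metrics port of the exporter
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-exporter
  namespace: kafka-cluster
spec:
  selector:
    app: kafka-exporter
  ports:
  - name: metrics
    port: 9308
    targetPort: 9308
```

With a Service like this, the scrape target can be the Service DNS name (for example kafka-exporter.kafka-cluster.svc:9308) instead of a pod or cluster IP, which changes whenever the workload is recreated.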

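(12) Configure Kafka alert rules and send them through DingTalk. To cut down on noise, I attach a custom label and route only the Kafka alerts to DingTalk, so the chat is not flooded by the large volume of K8S cluster alerts.

(1) First, add a DingTalk robot to the DingTalk group and note its token and secret.

An Alertmanager receiver cannot post to a DingTalk URL directly, so the adapter container prometheus-webhook-dingtalk has to be deployed.

Note also that when the receiver is DingTalk (via webhook_configs), the alert template is not specified in the Alertmanager configuration file but in the prometheus-webhook-dingtalk adapter.

Write the prometheus-webhook-dingtalk configuration file and template

vim prometheus-webhook-dingtalk-config.yaml (remember to substitute your own DingTalk URL token and secret).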

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-webhook-dingtalk-config
  namespace: monitoring
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/default.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=1f315a3d3b68ae9a5df0f6cde411902c493a10bc3d6ed6bbba8cd8b4bcd1c848
        secret: SEC4d160d1d987b58a19e9a825b83715b253d0b6d0c255b5abb28c265798c535b7e
        message:
          text: '{{ template "default.tmpl" . }}'

  default.tmpl: |
    {{ define "default.tmpl" }}

    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}

    ============ = **<font color='#FF0000'>Alert</font>** = =============  

    **Alert name:**    {{ $alert.Labels.alertname }}   
    **Severity:**    {{ $alert.Labels.severity }}   
    **Status:**    {{ $alert.Status }}   
    **Instance:**    {{ $alert.Labels.instance }} {{ $alert.Labels.device }}   
    **Summary:**    {{ $alert.Annotations.summary }}   
    **Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}   
    **Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}  
    ============ = end = =============  
    {{- end }}
    {{- end }}

    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}

    ============ = <font color='#00FF00'>Resolved</font> = =============   

    **Instance:**    {{ $alert.Labels.instance }}   
    **Alert name:**    {{ $alert.Labels.alertname }}  
    **Severity:**    {{ $alert.Labels.severity }}   
    **Status:**    {{ $alert.Status }}  
    **Summary:**    {{ $alert.Annotations.summary }}  
    **Details:**    {{ $alert.Annotations.message }}{{ $alert.Annotations.description}}  
    **Started at:**    {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}  
    **Resolved at:**    {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}  

    ============ = **end** = =============
    {{- end }}
    {{- end }}
    {{- end }}

(2) Deploy the prometheus-webhook-dingtalk service. If it was already installed together with Prometheus via Helm, simply mount the configuration file above into it.

vim dingtalk-webhook-deploy.yaml

apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitoring
  labels:
    app: dingtalk
spec:
  selector:
    app: dingtalk
  ports:
  - name: dingtalk
    port: 8060
    protocol: TCP
    targetPort: 8060
  
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      name: dingtalk
      labels:
        app: dingtalk
    spec:
      containers:
      - name: dingtalk
        image: timonwong/prometheus-webhook-dingtalk:v2.1.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 8060
        volumeMounts:
        - name: config
          mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
      - name: config
        configMap:
          name: prometheus-webhook-dingtalk-config

kubectl -n monitoring apply -f prometheus-webhook-dingtalk-config.yaml
kubectl -n monitoring apply -f dingtalk-webhook-deploy.yaml

(3) Create the custom alert rules as a PrometheusRule resource from YAML.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  # Server-generated fields (creationTimestamp, generation, managedFields,
  # resourceVersion, uid) are omitted so that the manifest can be created cleanly.
  labels:
    # These labels typically need to match the ruleSelector of your Prometheus resource.
    app: ack-prometheus-operator
    release: ack-prometheus-operator
  name: ack-prometheus-operator-kafka.rules
  namespace: monitoring
spec:
  groups:
    - name: kafka-cluster-exporter
      rules:
        - alert: KafkaClusterExporterDown
          annotations:
            description: Kafka Cluster Exporter has been down for 1 minute.
            summary: Kafka Cluster Exporter is down
          expr: 'up{job="kafka_cluster_exporter"} == 0'
          for: 1m
          labels:
            product: kafka-cluster
            severity: critical
            status: severe
    - name: kafka-consumer-lag-alerts
      rules:
        - alert: KafkaConsumerLag
          annotations:
            description: >-
              {{ $labels.consumergroup }}##{{ $labels.topic }}: consumer lag above 500 for 3 minutes (current: {{ $value }})
            summary: Kafka consumer lag
          expr: >-
            sum(kafka_consumergroup_lag{topic!="sop_free_study_fix-student_wechat_detail"})
            by (consumergroup, topic) > 500
          for: 3m
          labels:
            product: kafka-cluster
            severity: warning
            status: severe
        - alert: jshop cluster kafka  down
          annotations:
            description: kafka-cluster-broker down
            summary: jshop-cluster has fewer than 3 brokers
          expr: 'kafka_brokers{job="kafka_cluster_exporter"} < 3'
          for: 1m
          labels:
            product: kafka-cluster
            severity: warning
            status: severe

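kubectl -n monitoring create -f prometheus-kafka.yaml

Every alert in this YAML carries the custom label product with the value kafka-cluster, which Alertmanager uses later for filtering.

(4) Configure the Alertmanager routing rules.

Note: change the webhook_configs URL below to the address of your own prometheus-webhook-dingtalk service.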

global:
  resolve_timeout: 5m
receivers:
  - name: 'null'
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://172.22.7.34:8060/dingtalk/webhook1/send'
        send_resolved: true
route:
  group_by:
  - alertname
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 1h
  routes:
  - receiver: "dingtalk"
    match:
      product: 'kafka-cluster'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'kafka', 'instance']
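Two optional variations on the configuration above, offered as a sketch rather than as the setup used in this post: newer Alertmanager releases (0.22 and later) deprecate match in favour of matchers, and the webhook URL can use the DNS name of the dingtalk Service defined earlier instead of an IP address. Only the affected parts are shown:

```yaml
receivers:
  - name: 'null'
  - name: 'dingtalk'
    webhook_configs:
      # Service "dingtalk" in namespace "monitoring", port 8060, target "webhook1"
      - url: 'http://dingtalk.monitoring.svc:8060/dingtalk/webhook1/send'
        send_resolved: true
route:
  group_by:
  - alertname
  receiver: "null"              # everything else is dropped
  routes:
  - receiver: "dingtalk"
    matchers:
    - product = "kafka-cluster"  # only alerts carrying the custom label reach DingTalk
```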

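This post is long and time was limited, so please point out any mistakes. If you run into problems during deployment, leave a comment and we can work through them together.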
