Installing a Kafka cluster (with persistent data) on K8S and setting up a web UI
We install the Kafka cluster (with persistence) via Helm. Kafka has no official Helm chart, so we use the Bitnami one.
Helm chart link: kafka 29.2.0 · bitnami/bitnami
On that page, the Install section shows the install command and the Default values section shows the default values.
1. Add the Helm repository; myrepo is the local repository name and can be anything you like
helm repo add myrepo https://charts.bitnami.com/bitnami
2. List the Helm repositories
helm repo list
3. Update the Helm repository
helm repo update myrepo
4. List the historical versions of the kafka chart
helm search repo bitnami/kafka -l
5. Download the default values for your version from the Default Values section of the chart page, edit them locally, and upload the file to the server (or download the chart package and edit its values file)
Here are the defaults I changed:
(1) Client authentication is configured here. Since everything runs on our internal network, I kept it simple with PLAINTEXT; decide for yourself.
(2) Controller listener authentication; also PLAINTEXT for simplicity
(3) The listener dedicated to broker-to-broker traffic within the Kafka cluster; also PLAINTEXT for simplicity
(4) External listener; also PLAINTEXT for simplicity
(5) Kafka data persistence; a StorageClass must exist in the K8S cluster beforehand, and a size must be set
(6) Enable log persistence and specify the StorageClass and size
(7) Enable the JMX exporter to monitor Kafka
(8) Once the values file is ready, upload it to a server with kubeconfig and helm access and run the install command
helm install kafka-cluster myrepo/kafka --version 29.2.0 -f helm-29.2.0-kafka-3.7.0.yaml --kubeconfig=/var/lib/jenkins/.kube/kubeconfig -n kafka-cluster
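The edits in (1)-(7) can be sketched as a values fragment like the one below. The key layout follows the bitnami/kafka 29.2.0 chart, but the StorageClass name and sizes here are assumptions — check everything against the chart's Default Values (e.g. `helm show values bitnami/kafka --version 29.2.0`):

```yaml
listeners:
  client:
    protocol: PLAINTEXT          # (1) no client auth, internal network only
  controller:
    protocol: PLAINTEXT          # (2) controller listener
  interbroker:
    protocol: PLAINTEXT          # (3) broker-to-broker listener
  external:
    protocol: PLAINTEXT          # (4) external listener

controller:
  persistence:                   # (5) data persistence
    enabled: true
    storageClass: "nfs-storage"  # assumed StorageClass name
    size: 100Gi                  # assumed size
  logPersistence:                # (6) log persistence
    enabled: true
    storageClass: "nfs-storage"
    size: 20Gi

metrics:
  jmx:
    enabled: true                # (7) JMX exporter for monitoring
```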
(9) Deploy a web UI for easy inspection
GitHub: provectus/kafka-ui - Open-Source Web UI for Apache Kafka Management
Create a stateless workload kafka-ui in the cluster (the install command above used the kafka-cluster namespace while the yaml below uses kafka; adjust to your environment)
Image: provectuslabs/kafka-ui:latest
Example yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-ui
  labels:
    app: kafka-ui
  namespace: kafka
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-ui
  template:
    metadata:
      labels:
        app: kafka-ui
    spec:
      containers:
        - name: kafka-ui
          image: provectuslabs/kafka-ui:latest
          env:
            - name: KAFKA_CLUSTERS_0_NAME
              value: 'Kafka Cluster'
            - name: KAFKA_CLUSTERS_0_BOOTSTRAPSERVERS
              value: 'kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-1.kafka-controller-headless.kafka.svc.cluster.local:9092,kafka-controller-2.kafka-controller-headless.kafka.svc.cluster.local:9092'
            - name: KAFKA_CLUSTERS_0_PROPERTIES_SECURITY_PROTOCOL
              value: 'PLAINTEXT'
            - name: AUTH_TYPE
              value: 'LOGIN_FORM'
            - name: SPRING_SECURITY_USER_NAME
              value: 'devops'
            - name: SPRING_SECURITY_USER_PASSWORD
              value: 'mfniqJkDk'
          resources:
            requests:
              memory: "256Mi"
              cpu: "100m"
          ports:
            - containerPort: 8080
---
apiVersion: v1
kind: Service
metadata:
  name: kafka-ui
  namespace: kafka
spec:
  selector:
    app: kafka-ui
  type: NodePort
  ports:
    - protocol: TCP
      port: 8080
      targetPort: 8080
(10) Point a DNS record at kafka-ui, and you can then inspect the Kafka cluster through the web UI.
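If an ingress controller is available, the domain can be wired to the kafka-ui Service with an Ingress along these lines (the host name and ingress class here are assumptions, not from the original setup):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: kafka-ui
  namespace: kafka
spec:
  ingressClassName: nginx          # assumed ingress class
  rules:
    - host: kafka-ui.example.com   # assumed domain
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: kafka-ui     # the NodePort Service defined above
                port:
                  number: 8080
```

With NodePort alone, the UI is also reachable at any node IP on the allocated port.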
(11) With Kafka installed, let's move on to monitoring and alerting. (Installing Prometheus, Alertmanager and Grafana on K8S via helm is well documented elsewhere, so I won't repeat it here.)
Go to the namespace where monitoring is deployed, find this item in the Secret, and edit it to add a kafka-exporter job. (The helm install above enabled jmx_exporter; I additionally installed kafka-exporter myself. Install what you need; the two exporters can coexist.)
- job_name: kafka_cluster_exporter
  metrics_path: /metrics
  static_configs:
    - targets:
        - 172.22.6.6:9308  # replace with your own kafka-exporter address and port
At this point Prometheus is collecting metrics through kafka-exporter.
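For reference, the kafka-exporter mentioned above can be run in-cluster with a Deployment sketch like this (image danielqsj/kafka-exporter; the namespace, tag and bootstrap address are assumptions — point --kafka.server at your own brokers):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kafka-exporter
  namespace: kafka                 # assumed namespace
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kafka-exporter
  template:
    metadata:
      labels:
        app: kafka-exporter
    spec:
      containers:
        - name: kafka-exporter
          image: danielqsj/kafka-exporter:v1.7.0   # assumed tag
          args:
            # assumed broker address; repeat the flag for multiple brokers
            - --kafka.server=kafka-controller-0.kafka-controller-headless.kafka.svc.cluster.local:9092
          ports:
            - containerPort: 9308  # exporter's default metrics port
```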
(12) Configure Kafka alert rules and deliver them through DingTalk. To cut down the noise, I attach a custom label and route only Kafka alerts to DingTalk, so we aren't flooded by the mass of general K8S cluster alerts.
(1) First create a DingTalk robot in the group chat and obtain its token and secret.
Alertmanager's receiver does not support DingTalk URLs directly; you have to deploy the plugin container prometheus-webhook-dingtalk.
Note also that when the receiver is DingTalk (webhook_configs), the alert template is not specified in the Alertmanager config file but in the prometheus-webhook-dingtalk plugin.
Write the prometheus-webhook-dingtalk config file and template.
vim prometheus-webhook-dingtalk-config.yaml — remember to replace the DingTalk url token with your own.
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-webhook-dingtalk-config
  namespace: monitoring
data:
  config.yml: |-
    templates:
      - /etc/prometheus-webhook-dingtalk/default.tmpl
    targets:
      webhook1:
        url: https://oapi.dingtalk.com/robot/send?access_token=1f315a3d3b68ae9a5df0f6cde411902c493a10bc3d6ed6bbba8cd8b4bcd1c848
        secret: SEC4d160d1d987b58a19e9a825b83715b253d0b6d0c255b5abb28c265798c535b7e
        message:
          text: '{{ template "default.tmpl" . }}'
  default.tmpl: |
    {{ define "default.tmpl" }}
    {{- if gt (len .Alerts.Firing) 0 -}}
    {{- range $index, $alert := .Alerts.Firing -}}
    ============ = **<font color='#FF0000'>Alert</font>** = =============
    **Alert name:** {{ $alert.Labels.alertname }}
    **Severity:** {{ $alert.Labels.severity }}
    **Status:** {{ .Status }}
    **Instance:** {{ $alert.Labels.instance }} {{ $alert.Labels.device }}
    **Summary:** {{ .Annotations.summary }}
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description }}
    **Started:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ============ = end = =============
    {{- end }}
    {{- end }}
    {{- if gt (len .Alerts.Resolved) 0 -}}
    {{- range $index, $alert := .Alerts.Resolved -}}
    ============ = <font color='#00FF00'>Recovered</font> = =============
    **Instance:** {{ .Labels.instance }}
    **Alert name:** {{ .Labels.alertname }}
    **Severity:** {{ $alert.Labels.severity }}
    **Status:** {{ .Status }}
    **Summary:** {{ $alert.Annotations.summary }}
    **Details:** {{ $alert.Annotations.message }}{{ $alert.Annotations.description }}
    **Started:** {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    **Resolved:** {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
    ============ = **end** = =============
    {{- end }}
    {{- end }}
    {{- end }}
(2) Deploy the prometheus-webhook-dingtalk service. If it was already installed when you set up Prometheus via helm, just mount the config file above into it.
vim dingtalk-webhook-deploy.yaml
apiVersion: v1
kind: Service
metadata:
  name: dingtalk
  namespace: monitoring
  labels:
    app: dingtalk
spec:
  selector:
    app: dingtalk
  ports:
    - name: dingtalk
      port: 8060
      protocol: TCP
      targetPort: 8060
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: dingtalk
  namespace: monitoring
spec:
  replicas: 1
  selector:
    matchLabels:
      app: dingtalk
  template:
    metadata:
      name: dingtalk
      labels:
        app: dingtalk
    spec:
      containers:
        - name: dingtalk
          image: timonwong/prometheus-webhook-dingtalk:v2.1.0
          imagePullPolicy: IfNotPresent
          ports:
            - containerPort: 8060
          volumeMounts:
            - name: config
              mountPath: /etc/prometheus-webhook-dingtalk
      volumes:
        - name: config
          configMap:
            name: prometheus-webhook-dingtalk-config
kubectl -n monitoring apply -f dingtalk-webhook-deploy.yaml
(3) Create the custom alert rules resource from yaml.
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: ack-prometheus-operator-kafka.rules
  namespace: monitoring
  labels:
    app: ack-prometheus-operator
    release: ack-prometheus-operator
spec:
  groups:
    - name: kafka-cluster-exporter
      rules:
        - alert: KafkaClusterExporterDown
          annotations:
            description: Kafka Cluster Exporter has been down for 1 minute.
            summary: Kafka Cluster Exporter is down
          expr: 'up{job="kafka_cluster_exporter"} == 0'
          for: 1m
          labels:
            product: kafka-cluster
            severity: critical
            status: Severe
    - name: kafka-consumer-lag
      rules:
        - alert: KafkaConsumerLag
          annotations:
            description: >-
              {{$.Labels.consumergroup}}##{{$.Labels.topic}}: consumer lag above 500 for 3 minutes (current: {{$value}})
            summary: Kafka consumer lag
          expr: >-
            sum(kafka_consumergroup_lag{topic!="sop_free_study_fix-student_wechat_detail"})
            by (consumergroup, topic) > 500
          for: 3m
          labels:
            product: kafka-cluster
            severity: warning
            status: Severe
        - alert: JshopClusterKafkaDown
          annotations:
            description: 'kafka-cluster broker down'
            summary: jshop-cluster broker count below 3
          expr: 'kafka_brokers{job="kafka_cluster_exporter"} < 3'
          for: 1m
          labels:
            product: kafka-cluster
            severity: warning
            status: Severe
kubectl -n monitoring create -f prometheus-kafka.yaml
In this yaml we attached the custom label product=kafka-cluster to every alert, so that Alertmanager can filter on it later.
(4) Configure the Alertmanager rules.
Change the webhook_configs url below to the address of your own prometheus-webhook-dingtalk service.
global:
  resolve_timeout: 5m
receivers:
  - name: 'null'
  - name: 'dingtalk'
    webhook_configs:
      - url: 'http://172.22.7.34:8060/dingtalk/webhook1/send'
        send_resolved: true
route:
  group_by:
    - alertname
  group_interval: 5m
  group_wait: 30s
  receiver: "null"
  repeat_interval: 1h
  routes:
    - receiver: "dingtalk"
      match:
        product: 'kafka-cluster'
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'kafka', 'instance']
This post has run long and time was limited; if anything here is wrong, please point it out. If you hit problems during deployment, leave a comment and we can work through them together.