Configuring Alertmanager Alerts for Prometheus: DingTalk Notifications
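Create the user and group
This installation runs the services as a prometheus user that we created ourselves; creating the user and group is not covered here.
Deploy Alertmanager from the binary package
The download link for the latest Alertmanager release is available on the official Prometheus website, https://prometheus.io/download/. The commands below use version 0.23.0.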
tar xvf alertmanager-0.23.0.linux-amd64.tar.gz -C /soft
cd /soft
mv alertmanager-0.23.0.linux-amd64 alertmanager
cd alertmanager
mkdir data
chown -R prometheus:prometheus /soft/alertmanager
Create the Alertmanager configuration file
The extracted Alertmanager package contains a default alertmanager.yml; edit that file as follows.
# Global settings
global:
  resolve_timeout: 5m  # time after which an alert that is no longer updated is declared resolved; defaults to 5m
  smtp_smarthost: '****.com:25'  # SMTP server (host:port)
  smtp_from: '****.com'  # sender address
  smtp_auth_username: '****.com'  # SMTP account name
  smtp_auth_password: '****'  # mailbox password or authorization code
  smtp_require_tls: false
# Notification templates
templates:
  - 'alarm_template/*.tmpl'
# Routing
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1m
  receiver: 'email'  # default receiver: alerts that match no child route are sent by email
  routes:  # child routes, matched by regular expression
    - match_re:
        severity: warning  # warning alerts go to receiver webhook1
      receiver: webhook1
    - match_re:
        severity: error  # error alerts go to receiver webhook2
      receiver: webhook2
# Receivers
receivers:
  - name: 'email'
    email_configs:  # email settings
      - to: '****@qq.com'  # address that receives the alerts
  - name: 'webhook1'
    webhook_configs:
      - &dingtalk_config
        send_resolved: false
        url: http://localhost:8060/dingtalk/webhook2/send  # DingTalk plugin endpoint, explained later
  - name: 'webhook2'
    webhook_configs:
      - <<: *dingtalk_config
        send_resolved: true
        url: http://localhost:8060/dingtalk/webhook_mention_users/send
# An inhibition rule mutes alerts that match one set of matchers (the target) while an alert that matches another set (the source) is firing. Both alerts must have identical values for the labels listed in equal.
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
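The routing and inhibition behaviour of this configuration can be sketched in a few lines of Python. This is only an illustrative model, not Alertmanager's implementation: it assumes child routes are tried in order with the regex anchored to the whole label value, that the first match wins, and that a missing label counts as an empty value. The names pick_receiver and is_inhibited are made up for this example.

```python
import re

# Child routes copied from the route section: (label, regex, receiver).
CHILD_ROUTES = [
    ("severity", "warning", "webhook1"),
    ("severity", "error", "webhook2"),
]
DEFAULT_RECEIVER = "email"

# The single inhibit rule from the inhibit_rules section.
INHIBIT_RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}


def pick_receiver(labels):
    """Return the receiver of the first child route whose regex matches fully."""
    for label, pattern, receiver in CHILD_ROUTES:
        if re.fullmatch(pattern, labels.get(label, "")):
            return receiver
    return DEFAULT_RECEIVER


def is_inhibited(target, firing_alerts, rule=INHIBIT_RULE):
    """True if a firing source alert mutes the target alert under the rule."""
    if any(target.get(k, "") != v for k, v in rule["target_match"].items()):
        return False
    for source in firing_alerts:
        if any(source.get(k, "") != v for k, v in rule["source_match"].items()):
            continue
        # A label missing on both sides compares equal (both are "").
        if all(source.get(k, "") == target.get(k, "") for k in rule["equal"]):
            return True
    return False


warning = {"alertname": "InstanceDown", "instance": "db1:9100", "severity": "warning"}
critical = dict(warning, severity="critical")
print(pick_receiver(warning))             # webhook1
print(pick_receiver({"severity": "info"}))  # email
print(is_inhibited(warning, [critical]))  # True
```

Note that under this model an alert with severity critical matches neither child route, so it falls back to the email receiver.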
Check that the configuration file is valid
This step matters: an invalid configuration can prevent the service from starting.
cd /soft/alertmanager
./amtool check-config ./alertmanager.yml
Create the systemd unit file for Alertmanager
vim /usr/lib/systemd/system/alertmanager.service
[Unit]
Description=alertmanager
Documentation=https://prometheus.io/
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/soft/alertmanager/alertmanager --config.file=/soft/alertmanager/alertmanager.yml --storage.path=/soft/alertmanager/data
Restart=on-failure
[Install]
WantedBy=multi-user.target
Start the service
Once the service is running, the web UI is available at http://localhost:9093.
systemctl daemon-reload
systemctl enable alertmanager.service
systemctl start alertmanager.service
systemctl status alertmanager.service
systemctl restart alertmanager.service  # use this to restart the service later
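To confirm that the running Alertmanager accepts and routes alerts before Prometheus is wired in, a hand-made test alert can be posted to its v2 API. The sketch below assumes the address used in this guide (localhost:9093); the alert name ManualTest, the instance test-host and both helper functions are hypothetical and exist only for this example.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone

# Alertmanager address from this guide; adjust if yours differs.
ALERTMANAGER_URL = "http://localhost:9093/api/v2/alerts"


def build_test_alert(severity="warning", minutes=5):
    """Build a one-element alert list that expires after `minutes` minutes."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {
            "alertname": "ManualTest",   # hypothetical name
            "instance": "test-host",     # hypothetical instance
            "severity": severity,
        },
        "annotations": {"summary": "Manual test alert"},
        "startsAt": now.isoformat(),
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def send_alerts(alerts, url=ALERTMANAGER_URL):
    """POST the alerts as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

Calling send_alerts(build_test_alert()) against a running instance should make the alert appear in the UI at http://localhost:9093; with severity="warning" it would be routed to the webhook1 receiver defined earlier.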
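Connect Prometheus to Alertmanager
In prometheus.yml, located in the Prometheus installation directory, configure the address and port that Prometheus uses to reach Alertmanager.
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']  # adjust to your environment
rule_files:
  - "rule/*.yml"  # directory holding the custom rule files
Restart the Prometheus service
The restart procedure is described in the manual "Monitoring Doris with Grafana".
After a successful restart, open http://localhost:9090/config to confirm that the alerting configuration has taken effect.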
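Besides the /config page, Prometheus also reports the Alertmanagers it is actually sending to through its HTTP API at /api/v1/alertmanagers. A minimal sketch for reading that endpoint follows; the base URL is the one used in this guide, and the function names are invented for the example.

```python
import json
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # address from this guide


def active_alertmanagers(payload):
    """Extract active Alertmanager URLs from an /api/v1/alertmanagers response."""
    if payload.get("status") != "success":
        raise ValueError(f"unexpected status: {payload.get('status')!r}")
    return [am["url"] for am in payload["data"].get("activeAlertmanagers", [])]


def fetch_alertmanagers(base_url=PROMETHEUS_URL):
    """Query a running Prometheus and return the active Alertmanager URLs."""
    with urllib.request.urlopen(f"{base_url}/api/v1/alertmanagers", timeout=5) as resp:
        return active_alertmanagers(json.load(resp))
```

If the alerting section above was loaded, fetch_alertmanagers() should return a list containing an entry for localhost:9093; an empty list suggests that the configuration was not picked up.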
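Define custom Prometheus alerting rules
Create a directory named rule under the Prometheus directory to hold the custom rule files. While writing a rule, you can test its expression at http://localhost:9090/graph.
groups:
  - name: general  # rule group name (a group can contain multiple rules)
    rules:
      - alert: InstanceDown  # alert name
        expr: up == 0  # alert condition
        for: 15s  # how long the condition must hold before the alert fires
        labels:  # labels attached to the alert
          severity: error  # severity level
        annotations:  # descriptive text that explains the alert
          summary: "Instance {{ $labels.instance }} down"
          description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 15 seconds."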
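The effect of the for clause is easiest to see as a small state machine: an alert is pending from the first evaluation at which the expression is true, and it becomes firing once the expression has stayed true for at least the for duration. The sketch below is a simplified model of that behaviour, not Prometheus code; the 15-second evaluation interval in the sample is an assumption, since the evaluation_interval setting is not shown in this guide.

```python
def alert_states(samples, hold_seconds):
    """Return the alert state after each evaluation.

    samples: list of (timestamp_seconds, condition_is_true), one per evaluation.
    hold_seconds: the rule's `for` duration in seconds.
    """
    states = []
    active_since = None
    for timestamp, condition in samples:
        if not condition:
            # The condition cleared, so the pending/firing timer resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= hold_seconds else "pending")
    return states


# `up == 0` evaluated every 15 seconds (assumed interval) with `for: 15s`.
evaluations = [(0, False), (15, True), (30, True), (45, False)]
print(alert_states(evaluations, hold_seconds=15))
# ['inactive', 'pending', 'firing', 'inactive']
```

Under this model an instance that is down for only a single evaluation never gets past pending, so no notification is sent for it.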
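Check that the configuration file and the rules are valid. This step matters: an invalid configuration can prevent Prometheus from starting.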
cd /soft/prometheus
./promtool check config prometheus.yml
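Hot-reload the Prometheus configuration
After the hot reload you can check whether the rules trigger. The alerting rules and their current state are listed at http://localhost:9090/alerts, where red marks a rule that is firing and green marks one whose condition is not currently met. Alerts that have fired also appear in the Alertmanager UI at http://localhost:9093.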
systemctl reload prometheus.service
Download the DingTalk plugin
Alertmanager has no built-in DingTalk support, so a plugin is needed to deliver the notifications.
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v1.4.0/prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz
tar -xvf prometheus-webhook-dingtalk-1.4.0.linux-amd64.tar.gz -C /soft
cd /soft
mv prometheus-webhook-dingtalk-1.4.0.linux-amd64 prometheus-webhook-dingtalk
Configure the alert template file
{{- define "wechat.tmpl" }}
{{- range $i, $alert := .Alerts.Firing -}}
[Alert]: {{ index $alert.Labels "alertname" }}
[Instance]: {{ index $alert.Labels "instance" }}
[Job]: {{ index $alert.Labels "job" }}
[Details]: {{ index $alert.Annotations "summary" }}
[Start time]: {{ $alert.StartsAt.Format "2006-01-02 15:04:05" }}
====================
{{- end }}
{{- end }}
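To get a feel for what the template produces, the following Python function approximates it outside of Go: it walks the firing alerts and prints the same five fields, with English field names. It is a rough equivalent for illustration only. Whitespace handling of the Go template's trim markers may differ, and the sample alert is made up.

```python
from datetime import datetime, timezone

SEPARATOR = "===================="


def render_firing(alerts):
    """Render firing alerts in roughly the same layout as the template above."""
    blocks = []
    for alert in alerts:
        if alert.get("status") != "firing":
            continue  # the template ranges over .Alerts.Firing only
        labels = alert.get("labels", {})
        annotations = alert.get("annotations", {})
        blocks.append("\n".join([
            f"[Alert]: {labels.get('alertname', '')}",
            f"[Instance]: {labels.get('instance', '')}",
            f"[Job]: {labels.get('job', '')}",
            f"[Details]: {annotations.get('summary', '')}",
            # Go layout "2006-01-02 15:04:05" corresponds to this strftime format.
            f"[Start time]: {alert['startsAt'].strftime('%Y-%m-%d %H:%M:%S')}",
            SEPARATOR,
        ]))
    return "\n".join(blocks)


sample_alert = {  # hypothetical alert, shaped like the InstanceDown rule
    "status": "firing",
    "labels": {"alertname": "InstanceDown", "instance": "db1:9100", "job": "node"},
    "annotations": {"summary": "Instance db1:9100 down"},
    "startsAt": datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc),
}
print(render_firing([sample_alert]))
```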
Edit the plugin configuration file
cd /soft/prometheus-webhook-dingtalk
cp config.example.yml config.yml
## Request timeout
# timeout: 5s
## Customizable templates path (location of the custom template)
templates:
  - /soft/alertmanager/alarm_template/webhook.tmpl
## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
# default_message:
#   title: '{{ template "legacy.title" . }}'
#   text: '{{ template "legacy.content" . }}'
## Targets, previously was known as "profiles"
targets:
  webhook1:  # DingTalk robot with signing enabled
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # secret for signature
    secret: SEC000000000000000000000
  webhook2:  # DingTalk robot without signing
    url: https://oapi.dingtalk.com/robot/send?access_token=
  webhook_legacy:
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    # Customize template content
    message:
      # Use legacy template
      title: '{{ template "legacy.title" . }}'
      text: '{{ template "legacy.content" . }}'
  webhook_mention_all:  # DingTalk robot that mentions everyone
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
    mention:
      all: true
  webhook_mention_users:  # DingTalk robot that mentions specific users
    url: https://oapi.dingtalk.com/robot/send?access_token=
    mention:
      mobiles: ['152***', '134***']
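The secret on the signed target is what the plugin is expected to use for signing its requests, so nothing has to be computed by hand. For reference, the scheme that DingTalk documents for custom robots is, as far as I know, an HMAC-SHA256 over the string "timestamp\nsecret", keyed with the secret, then Base64-encoded and URL-encoded. The sketch below follows that description; the function names are invented here, and the values in the comment are placeholders rather than real credentials.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def dingtalk_sign(secret, timestamp_ms):
    """Return the URL-encoded signature for a millisecond timestamp."""
    string_to_sign = f"{timestamp_ms}\n{secret}"
    digest = hmac.new(
        secret.encode("utf-8"),
        string_to_sign.encode("utf-8"),
        digestmod=hashlib.sha256,
    ).digest()
    return urllib.parse.quote_plus(base64.b64encode(digest).decode("ascii"))


def signed_url(webhook_url, secret, timestamp_ms=None):
    """Append timestamp and sign query parameters to a robot webhook URL."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    sign = dingtalk_sign(secret, timestamp_ms)
    return f"{webhook_url}&timestamp={timestamp_ms}&sign={sign}"


# Example with placeholder values:
# signed_url("https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx",
#            "SEC000000000000000000000")
```

Because the timestamp is part of the signed string, the signature changes on every request, which is why the secret is configured once and the signing is left to the plugin.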
Create the systemd unit file for webhook-dingtalk
vim /usr/lib/systemd/system/webhook-dingtalk.service
[Unit]
Description=prometheus-webhook-dingtalk
Documentation=https://github.com/timonwong/prometheus-webhook-dingtalk
After=network.target
[Service]
User=prometheus
Group=prometheus
ExecStart=/soft/prometheus-webhook-dingtalk/prometheus-webhook-dingtalk --config.file=/soft/prometheus-webhook-dingtalk/config.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
Start the service
systemctl daemon-reload
systemctl enable webhook-dingtalk.service
systemctl start webhook-dingtalk.service
systemctl status webhook-dingtalk.service
systemctl restart webhook-dingtalk.service
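Once the plugin is running, it can be exercised on its own by posting a payload shaped like the one Alertmanager sends to webhook receivers. The sketch below builds such a payload in the documented webhook format (version "4") and posts it to the plugin URL used earlier in this guide. All label values, timestamps and the groupKey string are sample data made up for the test, and the helper names are hypothetical.

```python
import json
import urllib.request

# Same endpoint that receiver webhook1 uses in alertmanager.yml.
PLUGIN_URL = "http://localhost:8060/dingtalk/webhook2/send"


def build_webhook_payload(alertname="ManualTest", instance="test-host", severity="warning"):
    """Build a minimal Alertmanager-style webhook payload with one firing alert."""
    labels = {"alertname": alertname, "instance": instance, "severity": severity}
    annotations = {"summary": f"Instance {instance} down"}
    return {
        "version": "4",
        "groupKey": '{}:{alertname="%s"}' % alertname,  # sample value
        "status": "firing",
        "receiver": "webhook1",
        "groupLabels": {"alertname": alertname},
        "commonLabels": labels,
        "commonAnnotations": annotations,
        "externalURL": "http://localhost:9093",
        "alerts": [{
            "status": "firing",
            "labels": labels,
            "annotations": annotations,
            "startsAt": "2024-01-01T00:00:00Z",  # sample timestamp
            "endsAt": "0001-01-01T00:00:00Z",    # zero time: not resolved yet
            "generatorURL": "http://localhost:9090/graph",
        }],
    }


def post_payload(payload, url=PLUGIN_URL):
    """POST the payload as JSON and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status
```

If post_payload(build_webhook_payload()) succeeds and a message shows up in the DingTalk group, the plugin and its target configuration are working, and any remaining problem is more likely on the Alertmanager side.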
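Finally, alerts are available not only in DingTalk but also by email and on the monitoring pages.
1. Alerts shown on the localhost:9093 page:
2. DingTalk alert message:
3. Email alert message:
That's it. Time to take off!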