Prometheus monitoring: single-host installation notes
Environment
Virtual machine: VirtualBox-6.1.14, single-core CPU, 4 GB RAM
Guest OS: CentOS Linux release 7.7.1908 (Core)
Software installed:
prometheus-2.21.0.linux-amd64.tar.gz
node_exporter-1.0.1.linux-amd64.tar.gz
alertmanager-0.21.0.linux-amd64.tar.gz
redis_exporter-v1.12.0.linux-amd64.tar.gz
rabbitmq_exporter-1.0.0-RC7.linux-amd64.tar.gz
grafana-7.2.1-1.x86_64.rpm
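Operating user: root
To keep this document short, only the essential deployment steps are recorded below. Each component is shown with two ways of starting it (directly, and as a systemd service). For the underlying principles and the meaning of each parameter, see the links in the References section.
I. Install Prometheus Server on bare metal
1. Download and install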
# curl -OL https://github.com/prometheus/prometheus/releases/download/v2.21.0/prometheus-2.21.0.linux-amd64.tar.gz
# tar -zxvf prometheus-2.21.0.linux-amd64.tar.gz
# mkdir -p /opt/prometheus/prometheus-server
# mkdir -p /opt/logs/ ## create the directory for log files
# mv prometheus-2.21.0.linux-amd64/* /opt/prometheus/prometheus-server
# cd /opt/prometheus/prometheus-server/
# mkdir data
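The three Prometheus-project tarballs used in these notes (prometheus, node_exporter, alertmanager) share one download URL pattern, so the download step can be scripted. The sketch below is only an illustration: the function name release_url is made up, and the pattern is inferred from the URLs in this guide rather than from any official API. redis_exporter and rabbitmq_exporter name their files differently and are not covered.

```shell
# Hypothetical helper: build the GitHub release URL of a prometheus-org project.
# $1 = project name (e.g. prometheus), $2 = version without the leading "v".
release_url() {
    printf 'https://github.com/prometheus/%s/releases/download/v%s/%s-%s.linux-amd64.tar.gz\n' \
        "$1" "$2" "$1" "$2"
}

release_url prometheus 2.21.0
release_url node_exporter 1.0.1
release_url alertmanager 0.21.0
```

It can then be combined with the download command from step 1, for example: curl -OL "$(release_url prometheus 2.21.0)".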
2. Configure alerting rules (a simple example; requires Node Exporter)
# mkdir /opt/prometheus/prometheus-server/rules/
# vim /opt/prometheus/prometheus-server/prometheus.yml
Add the following to prometheus.yml:
rule_files:
- "/opt/prometheus/prometheus-server/rules/*.yml"
Create the alert rule file hoststats-alert.yml in the directory /opt/prometheus/prometheus-server/rules/:
# cat > /opt/prometheus/prometheus-server/rules/hoststats-alert.yml << 'EOF'
groups:
- name: hostStatsAlert
  rules:
  - alert: hostCpuUsageAlert
    expr: sum(avg without (cpu)(irate(node_cpu_seconds_total{mode!='idle'}[5m]))) by (instance) > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} CPU usage high"
      description: "{{ $labels.instance }} CPU usage above 85% (current value: {{ $value }})"
  - alert: hostMemUsageAlert
    expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
    for: 1m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} MEM usage high"
      description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
EOF
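Note: with an unquoted delimiter (cat << EOF) the shell expands $ variables, so $labels and $value would vanish from the templates; the quoted 'EOF' used above prevents this.
For other configuration options, see: https://blog.csdn.net/haohaifeng002/article/details/109223574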
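The heredoc quoting pitfall is easy to reproduce in a scratch directory. The following sketch writes the same template line twice, once with an unquoted and once with a quoted delimiter; the file names are arbitrary and nothing under /opt is touched.

```shell
tmp=$(mktemp -d)
labels=""   # an empty (or unset) shell variable expands to nothing

# Unquoted delimiter: the shell expands $labels before cat sees the text.
cat > "$tmp/unquoted.yml" <<EOF
summary: "Instance {{ $labels.instance }} CPU usage high"
EOF

# Quoted delimiter: the body is written verbatim.
cat > "$tmp/quoted.yml" <<'EOF'
summary: "Instance {{ $labels.instance }} CPU usage high"
EOF

cat "$tmp/unquoted.yml"   # the "$labels" part has been expanded away
cat "$tmp/quoted.yml"     # the template is intact
```

Once the real rule file is written, it can also be checked with the promtool binary that ships in the Prometheus tarball: promtool check rules /opt/prometheus/prometheus-server/rules/hoststats-alert.yml.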
3. Start the service:
Two ways of starting are described below: directly, and as a service.
1) Start directly
# nohup ./prometheus > /opt/logs/prometheus-9090.log 2>&1 &
The local data storage path can be changed with the flag --storage.tsdb.path="data/".
To change the port, start it with the following command:
# nohup ./prometheus --web.listen-address=:9091 > /opt/logs/prometheus-9091.log 2>&1 &
2) Start as a service
a. Create the shell script
# vim /opt/prometheus/prometheus-server/prometheus.sh
#!/bin/bash
/opt/prometheus/prometheus-server/prometheus --web.enable-lifecycle --config.file=/opt/prometheus/prometheus-server/prometheus.yml --web.listen-address=:9091 &>> /opt/logs/prometheus-9091.log
b. Make the script executable
# chmod +x prometheus.sh
c. Configure the systemd service
# cat > /etc/systemd/system/prometheus.service <<EOF
[Unit]
Description=prometheus
After=network.target
[Service]
ExecStart=/opt/prometheus/prometheus-server/prometheus.sh
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
d. Start the service, enable it at boot, and check its status
# systemctl daemon-reload
# systemctl enable prometheus
# systemctl start prometheus
# systemctl status prometheus
# tail -f /var/log/messages
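Every component in these notes uses the same small unit-file skeleton, so it can be generated instead of retyped. A minimal sketch, assuming the same three sections as the unit above; the function name make_unit is made up for illustration.

```shell
# Hypothetical helper: print a minimal systemd unit like the ones in this guide.
# $1 = service description, remaining arguments = the full command line.
make_unit() {
    desc=$1
    shift
    cat <<EOF
[Unit]
Description=$desc
After=network.target

[Service]
ExecStart=$*
Restart=on-failure

[Install]
WantedBy=multi-user.target
EOF
}

make_unit prometheus /opt/prometheus/prometheus-server/prometheus.sh
```

Redirecting the output to /etc/systemd/system/prometheus.service reproduces the unit from step c. Note that systemd does not run ExecStart through a shell, which is why the log redirection lives in prometheus.sh rather than in the unit itself.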
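4. Access: http://192.168.56.101:9090/graph (use port 9091 instead if Prometheus was started with --web.listen-address=:9091, as in prometheus.sh)
(Screenshot in the original post: the Prometheus UI after Node Exporter has been installed.)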
II. Install Node Exporter
1. Download Node Exporter
# curl -LO https://github.com/prometheus/node_exporter/releases/download/v1.0.1/node_exporter-1.0.1.linux-amd64.tar.gz
# tar -xzf node_exporter-1.0.1.linux-amd64.tar.gz
# cd node_exporter-1.0.1.linux-amd64/
# mv node_exporter /opt/prometheus/
# cd /opt/prometheus/
2. Run Node Exporter
It can be run directly, or started as a service that is enabled at boot.
1) Run directly
# touch /opt/logs/node_exporter-9100.log
# nohup ./node_exporter > /opt/logs/node_exporter-9100.log 2>&1 &
# netstat -anplt|grep 9100
To change the port, start it as follows:
# nohup ./node_exporter --web.listen-address=:9900 > /opt/logs/node_exporter-9900.log 2>&1 &
2) Start as a service and enable it at boot
# cat > /etc/systemd/system/node_exporter.service <<EOF
[Unit]
Description=node_exporter
After=network.target
[Service]
ExecStart=/opt/prometheus/node_exporter --web.listen-address=:9900
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Start the service, enable it at boot, and check its status
# systemctl daemon-reload
# systemctl enable node_exporter
# systemctl start node_exporter
# systemctl status node_exporter
3. Access: http://192.168.56.101:9100/ (or port 9900 if Node Exporter was started with --web.listen-address=:9900)
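The same check can be done from the command line by reading the exporter's /metrics page. The sketch below pulls a single unlabelled metric out of Prometheus text-format input; the helper name metric_value is made up, and the sample lines are illustrative rather than real output from this host.

```shell
# Hypothetical helper: print the value of one unlabelled metric from
# Prometheus text-format input on stdin. $1 = metric name.
metric_value() {
    awk -v name="$1" '$1 == name { print $2 }'
}

# Illustrative sample input; comment lines start with "#" and are skipped.
printf '%s\n' \
    '# HELP node_load1 1m load average.' \
    '# TYPE node_load1 gauge' \
    'node_load1 0.42' \
    'node_load5 0.30' | metric_value node_load1
```

Against the running exporter this becomes: curl -s http://192.168.56.101:9100/metrics | metric_value node_load1.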
4. Connect Prometheus to Node Exporter
Edit the Prometheus Server configuration file prometheus.yml:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']
  # scrape Node Exporter metrics
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.56.101:9100']
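5. Restart Prometheus Server
Open http://192.168.56.101:9090 to reach Prometheus Server. Enter "up" in the expression box and click Execute; the result should look like this:
up{instance="127.0.0.1:9090",job="prometheus"} 1
up{instance="192.168.56.101:9100",job="node"} 1
A value of "1" means the target is healthy; "0" means it is down.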
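When Prometheus runs with --web.enable-lifecycle (the prometheus.sh wrapper in section I passes it), a configuration change can be picked up with an HTTP POST to /-/reload instead of a full restart. A minimal sketch follows; the function name and the CURL override used for the dry run are additions for illustration only.

```shell
# CURL can be overridden (for example CURL=echo) to print the request
# instead of sending it.
CURL=${CURL:-curl}

# Ask a running Prometheus to re-read prometheus.yml and its rule files.
# $1 = host:port; defaults to the 9091 port used by prometheus.sh.
prometheus_reload() {
    "$CURL" -fsS -X POST "http://${1:-127.0.0.1:9091}/-/reload"
}

# Dry run: print the curl arguments rather than contacting the server.
CURL=echo prometheus_reload 127.0.0.1:9091
```

Without the override, prometheus_reload 127.0.0.1:9091 sends the real request; it is rejected if the lifecycle flag was not set at startup, in which case a restart is still needed.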
III. Install Alertmanager on bare metal
1. Download and install
# curl -LO https://github.com/prometheus/alertmanager/releases/download/v0.21.0/alertmanager-0.21.0.linux-amd64.tar.gz
# tar -zxvf alertmanager-0.21.0.linux-amd64.tar.gz
# mkdir -p /opt/prometheus/alertmanager/data/
# mv alertmanager-0.21.0.linux-amd64/* /opt/prometheus/alertmanager/
# cd /opt/prometheus/alertmanager/
2. Enable email notifications
# vim /opt/prometheus/alertmanager/alertmanager.yml
Use the following configuration:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qiye.aliyun.com:465' # Alibaba Cloud mail; port 25 is also offered, but only port 465 (TLS-encrypted connection) worked here
  smtp_from: 'xxx@xxx.com'
  smtp_auth_username: 'xxx@xxx.com'
  smtp_auth_password: xxxxxx
  smtp_require_tls: false # Alibaba Cloud mail is used here; this setting must not be omitted
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'mail-receiver'
receivers:
- name: 'mail-receiver'
  email_configs:
  - to: xxx@xxx.com
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
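These settings can also be changed with command-line flags when starting Alertmanager: --config.file sets the path of the Alertmanager configuration file, and --storage.path sets the data storage path.
Two ways of starting are recorded below: directly, and as a system service.
1) Start directly:
# nohup ./alertmanager > /opt/logs/alertmanager-9093.log 2>&1 &
To start on a different port:
# nohup ./alertmanager --web.listen-address=:9039 > /opt/logs/alertmanager-9039.log 2>&1 &
2) Start as a system service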
# cat > /etc/systemd/system/alertmanager.service <<EOF
[Unit]
Description=alertmanager
After=network.target
[Service]
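ExecStart=/opt/prometheus/alertmanager/alertmanager --web.listen-address=:9039 --config.file=/opt/prometheus/alertmanager/alertmanager.yml
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
systemd does not interpret shell redirections such as &>> in ExecStart, so this unit logs to the journal instead of a file (view it with journalctl -u alertmanager). Start the service, enable it at boot, and check its status: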
# systemctl daemon-reload
# systemctl enable alertmanager
# systemctl start alertmanager
# systemctl status alertmanager
3) Access: http://192.168.56.101:9093 (or port 9039 if Alertmanager was started with --web.listen-address=:9039)
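Before wiring Prometheus in, the mail route can be exercised by posting a hand-made alert to Alertmanager's v2 API. The sketch below only builds the JSON payload; the helper name, the alert name manualTest and the annotation text are made up for illustration, and the curl call is left as a comment so nothing is sent by accident.

```shell
# Hypothetical helper: print a one-alert JSON payload for Alertmanager's
# v2 API. $1 = alert name, $2 = instance label. No JSON escaping is done,
# so the values must not contain double quotes or backslashes.
test_alert_json() {
    printf '[{"labels":{"alertname":"%s","instance":"%s","severity":"page"},"annotations":{"summary":"manual test alert"}}]\n' "$1" "$2"
}

test_alert_json manualTest 192.168.56.101:9100

# To actually send it (assuming Alertmanager listens on the default 9093):
#   test_alert_json manualTest 192.168.56.101:9100 |
#     curl -fsS -X POST -H 'Content-Type: application/json' \
#          --data-binary @- http://192.168.56.101:9093/api/v2/alerts
```

If the SMTP settings are correct, the alert should show up in the Alertmanager UI and a mail should arrive after the group_wait interval.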
3. Connect Prometheus to Alertmanager
Edit the Prometheus configuration file prometheus.yml and add the following:
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.56.101:9093']
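Restart the Prometheus service. Once it is up, check at http://192.168.56.101:9090/config that the alerting configuration has taken effect.
Now drive the CPU usage up by hand again (on a multi-core CPU, run the command in several sessions at once):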
# cat /dev/zero>/dev/null
Wait for the Prometheus alert to enter the firing state.
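On a multi-core machine the load command has to run once per core, and it is easy to forget to stop it afterwards. A minimal sketch that starts one worker per core and stops them after a fixed time; the function name burn_cpu and its defaults are made up for illustration.

```shell
# Hypothetical helper: keep N CPU cores busy for a fixed time, then stop.
# $1 = duration in seconds, $2 = number of workers (defaults to all cores).
burn_cpu() {
    seconds=${1:-120}
    workers=${2:-$(nproc)}
    i=0
    while [ "$i" -lt "$workers" ]; do
        # timeout ends each worker, so nothing is left running afterwards
        timeout "$seconds" cat /dev/zero > /dev/null &
        i=$((i + 1))
    done
    wait    # block until every worker has been stopped by timeout
}

# Example (commented out because it blocks for three minutes):
#   burn_cpu 180
```

The duration should comfortably exceed the "for: 1m" clause of hostCpuUsageAlert, otherwise the alert may only reach the pending state.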
IV. Install Grafana
1. Bare-metal installation
# wget https://dl.grafana.com/oss/release/grafana-7.2.1-1.x86_64.rpm
# yum install grafana-7.2.1-1.x86_64.rpm
With Docker instead:
# docker run -d -p 3000:3000 grafana/grafana
2. Start the Grafana service, enable it at boot, and check its status
# sudo systemctl daemon-reload
# sudo systemctl start grafana-server
# sudo systemctl status grafana-server
# sudo systemctl enable grafana-server
To change the default port 3000:
# vim /etc/grafana/grafana.ini
Set http_port = 3030
Restart the service:
# systemctl restart grafana-server
Check that it started:
# systemctl status grafana-server
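The port change can also be scripted instead of edited by hand in vim. The sketch below assumes GNU sed (as on CentOS) and assumes the packaged grafana.ini ships the setting commented out as ";http_port = 3000"; the function name is made up, and the demonstration runs on a scratch file rather than on /etc/grafana/grafana.ini.

```shell
# Hypothetical helper: set http_port in a grafana.ini-style file.
# $1 = path to the ini file, $2 = new port. Handles both the commented-out
# default line and an already active http_port line.
set_grafana_port() {
    sed -E -i "s/^;?[[:space:]]*http_port[[:space:]]*=.*/http_port = $2/" "$1"
}

# Try it on a scratch file first; the sample content is illustrative.
ini=$(mktemp)
printf '[server]\n;http_port = 3000\n' > "$ini"
set_grafana_port "$ini" 3030
cat "$ini"
```

After running it against the real file, restart grafana-server as shown above for the new port to take effect.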
3. Connect Prometheus to Grafana
Edit prometheus.yml and add the following under the scrape_configs node:
scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['127.0.0.1:9090']
  # scrape Node Exporter metrics
  - job_name: 'node'
    static_configs:
      - targets: ['192.168.56.101:9100']
  # scrape Grafana metrics
  - job_name: 'grafana'
    static_configs:
      - targets: ['192.168.56.101:3000']
Restart Prometheus:
# cd /opt/prometheus/prometheus-server/
# ps aux|grep prometheus
# kill -9 [PID of prometheus]
# nohup ./prometheus > /opt/logs/prometheus-9090.log 2>&1 &
4. Access: http://192.168.56.101:3000 and log in with admin/admin
5. Open-source Node Exporter dashboard templates can be downloaded for reference:
https://grafana.com/grafana/dashboards?dataSource=prometheus
For example:
https://grafana.com/grafana/dashboards/8919
https://grafana.com/grafana/dashboards/11559
6. Configure the Grafana data source and other settings
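The data source is normally added through the Grafana UI, but it can also be declared in a provisioning file that Grafana reads at startup. A minimal sketch, assuming the RPM install reads /etc/grafana/provisioning/datasources; the function name and the file name prometheus.yaml are made up, and the example writes to a scratch directory.

```shell
# Hypothetical helper: write a Grafana data source provisioning file.
# $1 = target directory, $2 = Prometheus base URL.
write_prometheus_datasource() {
    mkdir -p "$1"
    cat > "$1/prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: $2
    isDefault: true
EOF
}

# Written to a scratch directory here for inspection.
out=$(mktemp -d)
write_prometheus_datasource "$out" http://192.168.56.101:9090
cat "$out/prometheus.yaml"
```

Pointing the first argument at the provisioning directory and restarting grafana-server should make the Prometheus data source appear without any clicks in the UI.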
V. Deploy Redis Exporter to monitor the Redis master node on port 6379
1. Download and install
# curl -OL https://github.com/oliver006/redis_exporter/releases/download/v1.12.0/redis_exporter-v1.12.0.linux-amd64.tar.gz
# tar -xvf redis_exporter-v1.12.0.linux-amd64.tar.gz
# cd redis_exporter-v1.12.0.linux-amd64
# mv redis_exporter /opt/prometheus/
# cd /opt/prometheus/
2. Start Redis Exporter. It can be started directly, or configured as a systemd service that starts at boot. Its default port is 9121.
(1) Start directly
# ./redis_exporter -redis.addr 127.0.0.1:6379 -redis.password 123456
(2) Start as a service
# cat > /etc/systemd/system/redis_exporter.service <<EOF
[Unit]
Description=redis_exporter
After=network.target
[Service]
ExecStart=/opt/prometheus/redis_exporter -redis.addr 127.0.0.1:6379 -redis.password 123456
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Start the service, enable it at boot, and check its status
# systemctl daemon-reload
# systemctl start redis_exporter
# systemctl status redis_exporter
# systemctl enable redis_exporter
# systemctl list-units --type=service|grep redis
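Passing -redis.password on the command line leaves the password visible in ps output and in the unit file. redis_exporter also reads REDIS_ADDR and REDIS_PASSWORD from the environment, so one alternative is an owner-only env file referenced from the unit. This is a sketch under assumptions: the helper name and the /etc/redis_exporter.env location are made up, and the example writes to a scratch directory.

```shell
# Hypothetical helper: write the Redis connection settings to an env file
# that only its owner can read. $1 = target directory, $2 = Redis password.
write_redis_env() (
    umask 077    # the function body is a subshell, so the caller's umask is unchanged
    printf 'REDIS_ADDR=127.0.0.1:6379\nREDIS_PASSWORD=%s\n' "$2" > "$1/redis_exporter.env"
)

envdir=$(mktemp -d)
write_redis_env "$envdir" 123456
cat "$envdir/redis_exporter.env"

# The [Service] section of the unit would then become (sketch):
#   EnvironmentFile=/etc/redis_exporter.env
#   ExecStart=/opt/prometheus/redis_exporter
```

After changing the unit, run systemctl daemon-reload and restart redis_exporter as in the steps above.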
3. Configure prometheus.yml
  - job_name: redis
    static_configs:
      - targets: ['127.0.0.1:9121']
        labels:
          instance: redis_sentinel
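4. Restart Prometheus
5. Download the dashboard JSON from the page below (or enter dashboard ID 11835 in Grafana's import dialog) and import it into Grafana:
https://grafana.com/grafana/dashboards/11835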
VI. Install rabbitmq_exporter
1. Install
# curl -OL https://github.com/kbudde/rabbitmq_exporter/releases/download/v1.0.0-RC7/rabbitmq_exporter-1.0.0-RC7.linux-amd64.tar.gz
# tar zxvf rabbitmq_exporter-1.0.0-RC7.linux-amd64.tar.gz
# cd rabbitmq_exporter-1.0.0-RC7.linux-amd64
# mv rabbitmq_exporter /opt/prometheus/
# cd /opt/prometheus/
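Two ways of starting are described below: directly, and as a system service.
First create the configuration file rabbitmq_exporter_config.json in the same directory as rabbitmq_exporter.
File content (based on https://github.com/kbudde/rabbitmq_exporter/blob/master/config.example.json):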
{
"rabbit_url": "http://127.0.0.1:15672",
"rabbit_user": "guest",
"rabbit_pass": "123456",
"publish_port": "9099",
"publish_addr": "",
"output_format": "TTY",
"ca_file": "ca.pem",
"cert_file": "client-cert.pem",
"key_file": "client-key.pem",
"insecure_skip_verify": false,
"exlude_metrics": [],
"include_queues": ".*",
"skip_queues": "^$",
"skip_vhost": "^$",
"include_vhost": ".*",
"rabbit_capabilities": "no_sort,bert",
"enabled_exporters": [
"exchange",
"node",
"overview",
"queue"
],
"timeout": 30,
"max_queues": 0
}
1) Start command:
# RABBIT_USER=guest RABBIT_PASSWORD=123456 OUTPUT_FORMAT=JSON PUBLISH_PORT=9099 RABBIT_URL=http://127.0.0.1:15672 nohup ./rabbitmq_exporter > /opt/logs/rabbitmq-9099.log 2>&1 &
or
# nohup ./rabbitmq_exporter -config-file rabbitmq_exporter_config.json > /opt/logs/rabbitmq-9099.log 2>&1 &
2) Start as a system service:
# cat > /etc/systemd/system/rabbitmq_exporter.service <<EOF
[Unit]
Description=rabbitmq_exporter
After=network.target
[Service]
ExecStart=/opt/prometheus/rabbitmq_exporter -config-file /opt/prometheus/rabbitmq_exporter_config.json
Restart=on-failure
[Install]
WantedBy=multi-user.target
EOF
Start the service, enable it at boot, and check its status
# systemctl daemon-reload
# systemctl start rabbitmq_exporter
# systemctl status rabbitmq_exporter
# systemctl enable rabbitmq_exporter
Check the startup:
# tail -f /var/log/messages
# systemctl status rabbitmq_exporter
Verify that it is serving metrics:
# curl -i 127.0.0.1:9099/metrics
2. Configure prometheus.yml
  - job_name: 'rabbitmq'
    static_configs:
      - targets: ['127.0.0.1:9099']
        labels:
          instance: rabbitmq
3. Restart Prometheus
# cd /opt/prometheus/prometheus-server/
# ps aux|grep prometheus
# kill -9 [pid]
# nohup ./prometheus > /opt/logs/prometheus-9090.log 2>&1 &
4. Download the dashboard JSON and import it into Grafana
https://grafana.com/dashboards/2121
https://grafana.com/grafana/dashboards/4371
VII. Common errors
The errors encountered while configuring email alerts are recorded below.
Configuration:
smtp.winchannel.net:25
Error:
level=error ts=2020-04-08T06:02:44.036Z caller=notify.go:372 component=dispatcher msg="Error on notify" err="send STARTTLS command: x509: certificate is valid for *.mxhichina.com, mxhichina.com, not smtp.winchannel.net" context_err="context deadline exceeded"
level=error ts=2020-04-08T06:02:44.036Z caller=dispatch.go:301 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="send STARTTLS command: x509: certificate is valid for *.mxhichina.com, mxhichina.com, not smtp.winchannel.net"
Configuration:
smtp.winchannel.net
smtp_require_tls: false
Error:
level=warn ts=2020-10-12T10:34:11.780Z caller=notify.go:674 component=dispatcher receiver=mail-receiver integration=email[0] msg="Notify attempt failed, will retry later" attempts=1 err="*smtp.plainAuth auth: unencrypted connection"
level=error ts=2020-10-12T10:34:21.581Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="mail-receiver/email[0]: notify retry canceled after 7 attempts: *smtp.plainAuth auth: unencrypted connection"
Configuration: smtp.qiye.aliyun.com:465
Error:
level=warn ts=2020-10-12T11:36:41.779Z caller=notify.go:674 component=dispatcher receiver=mail-receiver integration=email[0] msg="Notify attempt failed, will retry later" attempts=1 err="'require_tls' is true (default) but "smtp.qiye.aliyun.com:465" does not advertise the STARTTLS extension"
level=error ts=2020-10-12T11:36:51.578Z caller=dispatch.go:309 component=dispatcher msg="Notify for alerts failed" num_alerts=1 err="mail-receiver/email[0]: notify retry canceled after 8 attempts: 'require_tls' is true (default) but "smtp.qiye.aliyun.com:465" does not advertise the STARTTLS extension"
tail: /opt/logs/alertmanager-9093.log: file truncated
With the following two settings, email is sent successfully:
smtp.qiye.aliyun.com:465
smtp_require_tls: false
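When chasing errors like the ones above, it helps to strip the log down to just the failure reasons. A minimal sketch that filters an Alertmanager log file; the function name notify_errors is made up, and the demonstration uses a scratch file with shortened sample lines instead of the real /opt/logs/alertmanager-9093.log.

```shell
# Hypothetical helper: print only the err=... part of warn/error lines
# in an Alertmanager log file. $1 = path to the log file.
notify_errors() {
    grep -E 'level=(warn|error)' "$1" | grep -o 'err=.*'
}

# Shortened, illustrative sample log.
log=$(mktemp)
printf '%s\n' \
    'level=info ts=2020-10-12T10:34:00.000Z msg="Listening"' \
    'level=error ts=2020-10-12T10:34:21.581Z msg="Notify for alerts failed" num_alerts=1 err="unencrypted connection"' \
    > "$log"
notify_errors "$log"
```

On the server the same filter can be applied to the log written by the direct-start command, for example notify_errors /opt/logs/alertmanager-9093.log.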
References:
[1]https://www.prometheus.wang/quickstart/why-monitor.html
[2]https://prometheus.io/
[3]https://github.com/prometheus/
[4]https://www.cnblogs.com/gered/p/13535212.html
[6]https://github.com/free/sql_exporter/releases/tag/0.5
[7]https://github.com/oliver006/redis_exporter/releases/tag/v1.12.0
[8]https://my.oschina.net/yugj/blog/3056695
[9]https://juejin.im/post/6844903793977458695
[10]https://github.com/a4sh3u/eureka_exporter
[11]https://github.com/Mautu/eureka_exporter
[12]https://developer.ibm.com/zh/depmodels/cloud/articles/cl-lo-prometheus-getting-started-and-practice/
[13]https://github.com/kbudde/rabbitmq_exporter
[14]https://blog.csdn.net/yaomingyang/article/details/104037083
[15]https://github.com/prometheus/alertmanager
[16]https://grafana.com/grafana/dashboards
[17]https://grafana.com/grafana/download?platform=linux
[18]https://grafana.com/docs/grafana/latest/administration/configure-docker/
[19]https://grafana.com/docs/grafana/latest/installation/rpm/#2-start-the-server
[20]https://blog.csdn.net/weixin_44723434/article/details/89237202
[21]https://grafana.com/docs/loki/latest/installation/local/
[22]https://grafana.com/docs/loki/latest/getting-started/get-logs-into-loki/
[23]https://github.com/grafana/loki/issues/2736
[24]https://grafana.com/docs/loki/latest/getting-started/troubleshooting/
[25]https://www.cnblogs.com/shhnwangjian/p/6879683.html
[26]https://blog.csdn.net/li4528503/article/details/106709682