Prometheus Monitoring and Alerting + Grafana Loki Log Aggregation System


image-20230427141657371

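Features of Prometheus

- A multi-dimensional data model.
- A flexible query language (PromQL).
- No dependency on distributed storage; each server node is autonomous.
- Time-series data is collected with an HTTP-based pull model.
- Time series can also be pushed through an intermediary gateway.
- Targets are discovered through service discovery or static configuration.
- Supports a wide range of graphing and dashboard front ends, such as Grafana.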

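Main components:

Prometheus server: collects and stores time-series data
Exporters: expose monitoring metrics from the systems being monitored
Alertmanager: handles alerts
Grafana: data visualization and presentation
Pushgateway: accepts metrics pushed by clients (e.g. short-lived jobs), which the Prometheus server then scrapes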
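Because Prometheus normally pulls, the Pushgateway is the route for short-lived jobs: the job pushes its metrics to the gateway, and Prometheus scrapes the gateway like any other target. Below is a minimal Python sketch of such a push using only the standard library. It assumes a Pushgateway at the hypothetical address `http://192.168.31.241:9091` (9091 is the Pushgateway's default port; no Pushgateway is deployed in this article), and the job and metric names are made up. The `/metrics/job/<job>` path and the plain-text body follow the Pushgateway's HTTP API, where `PUT` replaces every metric in that job's group.

```python
import urllib.request
from urllib.parse import quote


def build_push_request(gateway: str, job: str, metrics: dict[str, float]) -> urllib.request.Request:
    """Build a PUT request that replaces all metrics in `job`'s group on a Pushgateway."""
    # One sample per line in the Prometheus text exposition format; body must end with a newline.
    body = "".join(f"{name} {value}\n" for name, value in metrics.items())
    url = f"{gateway.rstrip('/')}/metrics/job/{quote(job, safe='')}"
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )


if __name__ == "__main__":
    # Hypothetical gateway address and metric name, for illustration only.
    req = build_push_request(
        "http://192.168.31.241:9091", "backup_job", {"backup_last_success_unixtime": 1682600000}
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode())
    # urllib.request.urlopen(req)  # uncomment to actually push
```

Sending is a single `urlopen` call; it is left commented out so the sketch runs without a gateway.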

Pull the images

Server used: 192.168.31.241

docker pull prom/node-exporter
docker pull prom/prometheus
docker pull prom/alertmanager
docker pull grafana/grafana

Edit the Prometheus configuration file

mkdir /etc/prometheus
vim /etc/prometheus/prometheus.yml

/etc/prometheus/prometheus.yml

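# Global configuration
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  # scrape_timeout is set to the global default (10s).
# Alerting configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['192.168.31.241:9093']   # Alertmanager started by the docker-compose file below
# Load rules once and evaluate them periodically according to the global 'evaluation_interval'.
rule_files:
  - "/etc/prometheus/rules.yml"
# Controls which resources Prometheus monitors.
# The default configuration has one job, named prometheus, which scrapes the time series exposed by the Prometheus server itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to every time series scraped from this config.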
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'node'
    static_configs:
      - targets: ['ip:9100']
        labels:
          env: dev
          role: docker

Note: change the IP address; `ip` here is the address of this host.

image-20230427144035838
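Every series scraped under this config is labelled with `job` (from `job_name`), `instance` (by default the target's `host:port`), and any static `labels` such as `env` and `role`; Prometheus only fills in `job` and `instance` when the target does not already set them. A small Python model of that default labelling, ignoring relabeling; the address `192.168.31.241:9100` is a hypothetical stand-in for `ip:9100`.

```python
def target_labels(job_name: str, target: str, static_labels: dict[str, str]) -> dict[str, str]:
    """Default labels attached to series scraped from `target` (relabeling not modelled)."""
    labels = {"job": job_name}              # from job_name
    labels.update(static_labels)            # labels: from static_configs are applied on top
    labels.setdefault("instance", target)   # instance defaults to the target's host:port
    return labels


if __name__ == "__main__":
    # Hypothetical node-exporter address standing in for 'ip:9100'.
    print(target_labels("node", "192.168.31.241:9100", {"env": "dev", "role": "docker"}))
```

These are the labels you later filter on in PromQL, e.g. `up{job="node", env="dev"}`.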

2.3 Edit the alerting rules file

/etc/prometheus/rules.yml

groups:
- name: example
  rules:
  # Alert for any instance that is unreachable for >5 minutes.
  - alert: InstanceDown
    expr: up == 0
    for: 5m
    labels:
      severity: page
    annotations:
      summary: "Instance {{ $labels.instance }} down"
      description: "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 5 minutes."

image-20230427144105356
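The `for` clause keeps an alert in the pending state until `expr` has been true continuously for that long; only then does it become firing and get sent to Alertmanager. Below is a simplified Python model of that state machine, assuming one evaluation every `evaluation_interval` (15s in the global config) and one true/false result per evaluation. It illustrates the timing only and is not Prometheus's own code.

```python
def alert_states(expr_results: list[bool], interval_s: int, for_s: int) -> list[str]:
    """Simulate the alert state ('inactive', 'pending', 'firing') at each rule evaluation."""
    states = []
    active_since = None  # evaluation time at which expr became true in the current streak
    for i, is_true in enumerate(expr_results):
        now = i * interval_s
        if not is_true:
            active_since = None              # condition cleared: the alert resets
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now               # condition just became true
        states.append("firing" if now - active_since >= for_s else "pending")
    return states


if __name__ == "__main__":
    # A target that stays down for 22 evaluations, 15s apart, with for: 5m (300s).
    s = alert_states([True] * 22, interval_s=15, for_s=300)
    print(s.index("firing") * 15, "seconds after going down")  # 300
```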

2.4 Edit the Alertmanager configuration file

/etc/alertmanager/alertmanager.yml

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.xxx:587'    # SMTP server as host:port, not an email address
  smtp_from: 'zhaoysz@xxx'
  smtp_auth_username: 'xxx@xxx'
  smtp_auth_password: 'xxxx'
  smtp_require_tls: true
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'test-mails'
receivers:
- name: 'test-mails'
  email_configs:
  - to: 'scottcho@qq.com'
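To check the email receiver without waiting for a real outage, you can post a hand-made alert to Alertmanager's v2 API (`POST /api/v2/alerts`, whose body is a JSON array of alerts with `labels` and `annotations`). A minimal sketch; the alert name `MailTest` and the address `http://192.168.31.241:9093` are assumptions based on this setup. With `group_by: ['alertname']` and `group_wait: 10s`, the mail should go out roughly 10 seconds after the post.

```python
import json
import urllib.request


def build_test_alert(base_url: str, alertname: str, instance: str) -> urllib.request.Request:
    """Build a POST to Alertmanager's v2 API carrying one hand-made alert."""
    alerts = [{
        "labels": {"alertname": alertname, "instance": instance, "severity": "page"},
        "annotations": {"summary": f"Test alert for {instance}"},
    }]
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_test_alert("http://192.168.31.241:9093", "MailTest", "manual-test")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to actually send the alert
```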

2.5 Edit docker-compose

/docker-compose/prometheus/docker-compose.yml

services:
  prometheus:
   image: prom/prometheus
   volumes:
     - /etc/prometheus/:/etc/prometheus/
     - prometheus_data:/prometheus
   command:
     - '--config.file=/etc/prometheus/prometheus.yml'
     - '--storage.tsdb.path=/prometheus'
     - '--web.console.libraries=/usr/share/prometheus/console_libraries'
     - '--web.console.templates=/usr/share/prometheus/consoles'
     - '--web.external-url=http://192.168.31.241:9090/'
     - '--web.enable-lifecycle'
     - '--storage.tsdb.retention.time=15d'
   ports:
     - 9090:9090
   links:
     - alertmanager:alertmanager
   restart: always
  alertmanager:
   image: prom/alertmanager
   ports:
     - 9093:9093
   volumes:
     - /etc/alertmanager/:/etc/alertmanager/
     - alertmanager_data:/alertmanager
   command:
     - '--config.file=/etc/alertmanager/alertmanager.yml'
     - '--storage.path=/alertmanager'
   restart: always
  grafana:
   image: grafana/grafana
   ports:
     - 3000:3000
   volumes:
     - /etc/grafana/:/etc/grafana/provisioning/
     - grafana_data:/var/lib/grafana
   environment:
     - GF_INSTALL_PLUGINS=camptocamp-prometheus-alertmanager-datasource
   links:
     - prometheus:prometheus
     - alertmanager:alertmanager
   restart: always

volumes:
  prometheus_data: {}
  grafana_data: {}
  alertmanager_data: {}

2.6 Start the services with docker-compose
docker-compose up -d
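Because the compose file starts Prometheus with `--web.enable-lifecycle`, later edits to prometheus.yml or rules.yml can be applied without restarting the container by sending an HTTP POST to `/-/reload`. A small sketch, assuming the external URL from the compose file:

```python
import urllib.request


def build_reload_request(prometheus_url: str) -> urllib.request.Request:
    """POST /-/reload makes Prometheus re-read its config (requires --web.enable-lifecycle)."""
    return urllib.request.Request(f"{prometheus_url.rstrip('/')}/-/reload", method="POST")


if __name__ == "__main__":
    req = build_reload_request("http://192.168.31.241:9090/")
    print(req.get_method(), req.full_url)
    # with urllib.request.urlopen(req) as resp:  # uncomment to trigger the reload
    #     print(resp.status)
```

The same thing from a shell is `curl -X POST http://192.168.31.241:9090/-/reload`.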

2.7 Access the endpoints

image-20230427151116335

3. Add a job for monitored hosts

3.1 Install Node Exporter

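Install one instance on every host you want to monitor.

node_exporter collects host-level metrics; in essence it is an API served over HTTP.

On Red Hat family operating systems it can also be installed with yum.

Start node-exporter: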

  docker run -d -p 9100:9100 \
  -v "/proc:/host/proc:ro" \
  -v "/sys:/host/sys:ro" \
  -v "/:/rootfs:ro" \
  prom/node-exporter \
  --path.procfs=/host/proc \
  --path.sysfs=/host/sys \
  --path.rootfs=/rootfs

Visit http://ip:9100/metrics

Below is the data that node-exporter, acting as an agent, collects and exposes.

image-20230427151957804
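The /metrics page is plain text in the Prometheus exposition format: `# HELP` and `# TYPE` comment lines, then one sample per line as `name{label="value",...} value`. A minimal parser sketch for reading a few samples from Python; it assumes label values contain no escaped quotes or commas and ignores timestamps, so it is not a complete implementation of the format. The sample text is made up.

```python
import re

# Metric name, optional {labels}, then the value (a trailing timestamp, if any, is ignored).
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Parse simple exposition-format lines into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        m = SAMPLE_RE.match(line)
        if m:
            name, raw_labels, value = m.groups()
            labels = dict(LABEL_RE.findall(raw_labels or ""))
            samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    text = (
        "# HELP node_load1 1m load average.\n"
        "# TYPE node_load1 gauge\n"
        "node_load1 0.21\n"
        'node_cpu_seconds_total{cpu="0",mode="idle"} 12345.6\n'
    )
    for sample in parse_samples(text):
        print(sample)
```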

Open the Prometheus targets page at:

http://ip:9090/targets

image-20230427153542781
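The same information is available as JSON from the Prometheus HTTP API at `/api/v1/targets`, which is handy for scripted health checks. A sketch that summarises the `activeTargets` list; the fields it reads (`labels`, `health`) come from that API, while the sample response below is abbreviated and made up.

```python
import json


def summarize_targets(response_json: str) -> list[tuple[str, str, str]]:
    """Return (job, instance, health) for each active target in an /api/v1/targets response."""
    data = json.loads(response_json)
    if data.get("status") != "success":
        raise ValueError(f"query failed: {data}")
    return [
        (t["labels"].get("job", ""), t["labels"].get("instance", ""), t["health"])
        for t in data["data"]["activeTargets"]
    ]


if __name__ == "__main__":
    # To query a live server instead (address assumed from this setup):
    # import urllib.request
    # raw = urllib.request.urlopen("http://192.168.31.241:9090/api/v1/targets").read().decode()
    raw = json.dumps({"status": "success", "data": {"activeTargets": [
        {"labels": {"job": "node", "instance": "192.168.31.241:9100"}, "health": "up"},
    ], "droppedTargets": []}})
    for job, instance, health in summarize_targets(raw):
        print(f"{job:10} {instance:25} {health}")
```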

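Start Grafana

Open the home page:

http://192.168.31.236:30030 (user admin, password 123456)
Here I am using the Grafana of a Kubernetes operations dashboard I built myself; all the servers are on the same internal network and can reach each other.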

Add a data source

image-20230427153900662

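Configure server-load monitoring

Choose Create -> Import, enter dashboard template ID 8919, and select the Prometheus data source; the server-load dashboard is then loaded onto the home page.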

image-20230427155733881

image-20230427155716144

image-20230428151318157
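CPU-usage panels in dashboards like this are typically built on node-exporter's `node_cpu_seconds_total` counter with a PromQL expression along the lines of `100 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100`; the exact queries in dashboard 8919 may differ. The arithmetic behind it is roughly: busy % = 100 × (1 − Δidle / Δtotal) between two scrapes. A sketch of that calculation from two made-up snapshots of CPU seconds summed per mode:

```python
def cpu_busy_percent(before: dict[str, float], after: dict[str, float]) -> float:
    """CPU busy % between two snapshots of node_cpu_seconds_total summed per mode."""
    delta = {mode: after[mode] - before[mode] for mode in after}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return 100.0 * (1.0 - delta.get("idle", 0.0) / total)


if __name__ == "__main__":
    t0 = {"idle": 1000.0, "user": 200.0, "system": 100.0}
    t1 = {"idle": 1045.0, "user": 210.0, "system": 105.0}
    print(f"{cpu_busy_percent(t0, t1):.1f}%")  # 25.0%
```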

Setting up Loki, a lightweight log analysis platform

img

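Loki components

  1. loki is the main server, responsible for storing logs and processing queries.
  2. promtail is the agent, responsible for collecting logs and sending them to loki.
  3. Grafana provides the UI.

Deploy with Docker

Download the YAML file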

wget https://raw.githubusercontent.com/grafana/loki/v2.2.0/production/docker-compose.yaml -O docker-compose.yaml

version: "3"

networks:
  loki:

services:
  loki:
    image: grafana/loki:2.0.0
    ports:
      - "3100:3100"
    command: -config.file=/etc/loki/local-config.yaml
    networks:
      - loki

  promtail:
    image: grafana/promtail:2.0.0
    volumes:
      - /var/log:/var/log
    command: -config.file=/etc/promtail/config.yml
    networks:
      - loki

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    networks:
      - loki

image-20230428094508234

Start the services (docker-compose up -d)
image-20230428094841202
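Loki can take a few seconds after startup before it accepts pushes and queries; it exposes a `/ready` endpoint on port 3100 that answers HTTP 200 once ready. A small polling sketch, assuming the Loki host 192.168.106.202 used for Grafana below; the retry count and delay are arbitrary, and `fetch` is a hypothetical hook so the loop can be tested without a server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def http_status(url: str) -> int:
    """Return the HTTP status of GET url, or 0 if the server is unreachable."""
    try:
        with urllib.request.urlopen(url, timeout=3) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code      # e.g. 503 while Loki is still starting
    except OSError:
        return 0             # connection refused, timeout, DNS failure, ...


def wait_until_ready(url: str, attempts: int = 30, delay_s: float = 2.0,
                     fetch: Callable[[str], int] = http_status) -> bool:
    """Poll `url` until it answers 200 or the attempts run out."""
    for _ in range(attempts):
        if fetch(url) == 200:
            return True
        time.sleep(delay_s)
    return False

# Usage (address assumed from this article):
# wait_until_ready("http://192.168.106.202:3100/ready")
```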

5. Configure the services

http://192.168.106.202:3000/

Grafana's default login is admin/admin; I have changed the password to 123456.

5.1 Configure the data source

image-20230428095118676

Enter the Loki address (IP and port) and make it the default data source; when done, click Save & Test.

image-20230428095139532
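The same data source can also be created through Grafana's HTTP API (`POST /api/datasources`), which helps when several Grafana instances need the same setup. A sketch using basic auth with the admin account; the addresses, password and data source name are the ones assumed in this article and should be adjusted.

```python
import base64
import json
import urllib.request


def build_datasource_request(grafana_url: str, user: str, password: str,
                             name: str, ds_type: str, ds_url: str) -> urllib.request.Request:
    """Build a POST /api/datasources request that adds a default data source to Grafana."""
    payload = {"name": name, "type": ds_type, "url": ds_url, "access": "proxy", "isDefault": True}
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"{grafana_url.rstrip('/')}/api/datasources",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
    )


if __name__ == "__main__":
    req = build_datasource_request("http://192.168.106.202:3000", "admin", "123456",
                                   "Loki", "loki", "http://192.168.106.202:3100")
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to create the data source
```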

5.2 Query in Explore

Sample query in Explore

image-20230428095158235

5.3 View the matching log lines

image-20230428095222980

This completes a sample log query.
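A query like the one run in Explore can also be sent to Loki directly with `GET /loki/api/v1/query_range`, which takes a LogQL `query`, `start` and `end` as Unix nanosecond timestamps, and a `limit`. A sketch that builds such a URL for the last hour of the `nginx-2` job used in this article; the Loki address is assumed from the promtail config, and the `|= "error"` line filter is only an example.

```python
import time
from typing import Optional
from urllib.parse import urlencode


def build_query_range_url(loki_url: str, logql: str, hours: float = 1.0,
                          limit: int = 100, now_ns: Optional[int] = None) -> str:
    """Build a /loki/api/v1/query_range URL covering the last `hours` hours."""
    end = now_ns if now_ns is not None else time.time_ns()
    start = end - int(hours * 3600 * 1_000_000_000)   # nanoseconds
    params = {"query": logql, "start": start, "end": end, "limit": limit}
    return f"{loki_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


if __name__ == "__main__":
    print(build_query_range_url("http://192.168.31.241:3100", '{job="nginx-2"} |= "error"'))
```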

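6. Promtail configuration explained

The promtail container is the log collector; its configuration file is /etc/promtail/config.yml inside the container. Deploy this container on every server whose logs you want to collect, and it will ship the logs back to the Loki service for storage and indexing.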

root@2a0cc144dd58:/# cat /etc/promtail/config.yml
server:
  http_listen_port: 3101
  grpc_listen_port: 0

positions:
  filename: /run/promtail/positions.yaml

clients:
  - url: http://192.168.31.241:3100/loki/api/v1/push     # Loki server's push endpoint that receives the collected logs

scrape_configs:
- job_name: nginx-two
  static_configs:
  - targets:
      - localhost
    labels:
      job: nginx-2                          # value of the job label you select in Explore
      __path__: /front/nginx/logs/*.log     # glob of log files to tail; matching files are discovered automatically

image-20230428095954956
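What promtail sends to the `clients` URL is, in essence, batches of log lines grouped into streams by label set. Loki also accepts a JSON body on the same `/loki/api/v1/push` endpoint (promtail itself sends compressed protobuf), so a quick way to test the Loki side without promtail is a hand-built push. A sketch; the labels and log lines are made up.

```python
import json
import time
import urllib.request
from typing import Optional


def build_loki_push_request(loki_url: str, labels: dict[str, str], lines: list[str],
                            ts_ns: Optional[int] = None) -> urllib.request.Request:
    """Build a JSON POST to Loki's /loki/api/v1/push carrying one stream of log lines."""
    ts = ts_ns if ts_ns is not None else time.time_ns()
    # Timestamps are Unix nanoseconds as strings; offset by i so lines keep their order.
    values = [[str(ts + i), line] for i, line in enumerate(lines)]
    body = {"streams": [{"stream": labels, "values": values}]}
    return urllib.request.Request(
        f"{loki_url.rstrip('/')}/loki/api/v1/push",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    req = build_loki_push_request("http://192.168.31.241:3100", {"job": "push-test"},
                                  ["hello from a manual push"])
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req)  # uncomment to send; then query {job="push-test"} in Explore
```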

6.1 Write the docker-compose.yaml file
version: "3"
services:
  promtail:
    image: grafana/promtail:2.0.0
    container_name: promtail-node
    volumes:
      - /promtail/config.yml:/etc/promtail/config.yml
      - /front/nginx/logs/:/front/nginx/logs/           # the log directory must be mounted into the container
      - /run/promtail:/run/promtail        
    ports:
      - 3101:3101 

7. Add log collection on another server

7.1 Write the promtail configuration file config.yml

mkdir /root/promtail && cd /root/promtail

[root@node2 promtail]# cat config.yml 
server:
  http_listen_port: 3101
  grpc_listen_port: 0

positions:
  filename: /run/promtail/positions.yaml

clients:
  - url: http://192.168.31.241:3100/loki/api/v1/push 

scrape_configs:
- job_name: nginx-two
  static_configs:
  - targets:
      - localhost
    labels:
      job: nginx-2
      __path__: /front/nginx/logs/*.log 

- job_name: measureback 
  static_configs:
  - targets:
      - localhost
    labels:
      job: measureback 
      __path__: /beehooo/log/*.log 

image-20230427163730846
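With several `scrape_configs` entries, each log file is tagged with the `job` of the entry whose `__path__` glob it matches. A rough Python model of that mapping for the two jobs above, using `PurePosixPath.match`, where `*` does not cross directory boundaries; promtail's own glob matching supports more syntax (such as `**`), so treat this as an approximation. The file names are hypothetical.

```python
from pathlib import PurePosixPath
from typing import Optional

# __path__ glob -> job label, mirroring the config.yml above
JOBS = {
    "/front/nginx/logs/*.log": "nginx-2",
    "/beehooo/log/*.log": "measureback",
}


def job_for(path: str) -> Optional[str]:
    """Return the job label of the first __path__ glob that matches `path`, or None."""
    p = PurePosixPath(path)
    for pattern, job in JOBS.items():
        if p.match(pattern):  # absolute pattern: the whole path must match
            return job
    return None


if __name__ == "__main__":
    for f in ["/front/nginx/logs/access.log", "/beehooo/log/app.log", "/var/log/syslog"]:
        print(f, "->", job_for(f))
```

A file that matches no glob is simply not collected, which is also why each log directory must be mounted into the promtail container.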

7.2 Write the docker-compose.yaml file

version: "3"
services:
  promtail:
    image: grafana/promtail:2.0.0
    container_name: promtail-node
    volumes:
      - /root/promtail/config.yml:/etc/promtail/config.yml    # config.yml created in step 7.1
      - /beehooo/log/:/beehooo/log/ 
      - /front/nginx/logs/:/front/nginx/logs/ 
      - /run/promtail:/run/promtail        
    ports:
      - 3101:3101  

image-20230427163849135

Final result
image-20230428140132368
