Adding alerting rules and notification methods with Prometheus Operator
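Search for the WeChat official account DevOps和k8s全栈技术 to follow it, or scan the QR code at the end of the article. A technical article is shared there every day for your reference.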
Configuring alerting
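Edit /root/kube-prometheus/manifests/alertmanager-service.yaml and add type: NodePort under spec so that the Alertmanager page can be opened in a browser, then apply the change with kubectl apply -f alertmanager-service.yaml.
Run kubectl get svc -n monitoring to see the Alertmanager address and NodePort; in this example the status page is http://172.16.0.6:31568/#/status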
The Status page of Alertmanager shows its current configuration:
Config
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: yunwei@hhotel.com
  smtp_hello: hhotel.com
  smtp_smarthost: smtp.qiye.aliyun.com:465
  smtp_auth_username: yunwei@hhotel.com
  smtp_auth_password: <secret>
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
... ...
This configuration actually comes from /root/kube-prometheus/manifests/alertmanager-secret.yaml, which defines a Secret named alertmanager-main:
apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque
To see the configuration it holds, base64-decode the value of the alertmanager.yaml key:
echo Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg== | base64 -d
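The same decoding can also be done with a few lines of Go, the language Alertmanager itself is written in. This is a minimal sketch. The constant is the alertmanager.yaml value copied from the Secret above, and decodeSecretValue is only an illustrative helper name.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"log"
)

// secretValue is the alertmanager.yaml value copied from alertmanager-secret.yaml.
const secretValue = "Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg=="

// decodeSecretValue returns the plain-text YAML stored in a Secret data field.
// Kubernetes stores Secret data as standard (padded) base64.
func decodeSecretValue(v string) (string, error) {
	b, err := base64.StdEncoding.DecodeString(v)
	if err != nil {
		return "", err
	}
	return string(b), nil
}

func main() {
	cfg, err := decodeSecretValue(secretValue)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(cfg)
}
```

The output is the default configuration shipped in the Secret. It has a single receiver named null, a route grouped by job, and a sub-route that sends DeadMansSwitch to null.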
To customize the receivers or message templates, regenerate the alertmanager-main Secret. First write a new alertmanager.yaml:
vi alertmanager.yaml
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qiye.aliyun.com:465'
  smtp_from: 'yunwei@hhotel.com'
  smtp_auth_username: 'yunwei@hhotel.com'
  smtp_auth_password: '<SMTP_PASSWORD>'   # replace with your own SMTP password
  smtp_hello: 'hhotel.com'
  smtp_require_tls: true
templates:
- "*.tmpl"
route:
  group_by: ['job', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
  receiver: 'wechat'
  routes:
  - receiver: 'wechat'
    group_wait: 10s
    match:
      alertname: EtcdClusterUnavailable
receivers:
- name: 'default'
  email_configs:
  - to: 'yunwei@hhotel.com'
    send_resolved: true
- name: 'wechat'
  wechat_configs:
  - corp_id: 'wx02f71fb3dea46c16'
    to_party: '1'
    to_user: "renzhenxin"
    agent_id: '1'
    api_secret: '<WECHAT_API_SECRET>'   # replace with your own app secret
    send_resolved: true
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  equal: ['alertname', 'dev', 'instance']
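The sketch below re-implements two rules of this file in Go to show how it decides where an alert goes. The first picks a receiver from the route tree. The second applies the inhibit rule. The Route and Labels types and all functions are hypothetical simplifications, not Alertmanager's own code: they support equality matchers only, and there is no continue and no grouping.

In this config both the root route and the EtcdClusterUnavailable sub-route use the receiver 'wechat'. As a result, the email receiver 'default' is defined but never selected. Setting the root receiver to 'default' would send every other alert to email. The alert names in main are sample values.

```go
package main

import "fmt"

// Labels is a simplified stand-in for an alert's label set.
type Labels map[string]string

// Route is a hypothetical, trimmed-down route node: a receiver, equality
// matchers (like match: in alertmanager.yaml) and child routes.
type Route struct {
	Receiver string
	Match    map[string]string
	Routes   []Route
}

// matches reports whether every matcher has the same value in the labels.
// A missing label reads as "", as with any Go map lookup.
func matches(m map[string]string, l Labels) bool {
	for k, v := range m {
		if l[k] != v {
			return false
		}
	}
	return true
}

// pickReceiver descends into the first matching child route (as if no route
// sets continue: true) and returns the receiver of the deepest match; a child
// without its own receiver inherits the parent's.
func pickReceiver(r Route, l Labels) string {
	for _, child := range r.Routes {
		if matches(child.Match, l) {
			if child.Receiver == "" {
				child.Receiver = r.Receiver
			}
			return pickReceiver(child, l)
		}
	}
	return r.Receiver
}

// inhibited reports whether a firing source alert mutes the target alert
// under one inhibit rule: both sides match their matchers and every label in
// equal has the same value on both alerts (missing on both counts as equal).
func inhibited(source, target Labels, sourceMatch, targetMatch map[string]string, equal []string) bool {
	if !matches(sourceMatch, source) || !matches(targetMatch, target) {
		return false
	}
	for _, name := range equal {
		if source[name] != target[name] {
			return false
		}
	}
	return true
}

func main() {
	// Route tree from the alertmanager.yaml above.
	root := Route{
		Receiver: "wechat",
		Routes: []Route{
			{Receiver: "wechat", Match: map[string]string{"alertname": "EtcdClusterUnavailable"}},
		},
	}
	etcd := Labels{"alertname": "EtcdClusterUnavailable", "severity": "critical"}
	other := Labels{"alertname": "KubePodCrashLooping", "severity": "warning"}
	fmt.Println(pickReceiver(root, etcd), pickReceiver(root, other)) // wechat wechat

	// Same tree with 'default' as the root receiver: other alerts go to email.
	alt := root
	alt.Receiver = "default"
	fmt.Println(pickReceiver(alt, other)) // default

	// Hypothetical pair of alerts from the same instance.
	crit := Labels{"alertname": "HostDown", "instance": "node1", "severity": "critical"}
	warn := Labels{"alertname": "HostDown", "instance": "node1", "severity": "warning"}
	fmt.Println(inhibited(crit, warn,
		map[string]string{"severity": "critical"},
		map[string]string{"severity": "warning"},
		[]string{"alertname", "dev", "instance"})) // true
}
```

Neither alert carries a dev label, so that entry in equal compares "" with "" and does not block the inhibition.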
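Create the WeChat alert template wechat.tmpl. Because it defines a template named wechat.default.message, it overrides the default message body that Alertmanager uses for wechat_configs: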
{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
========end==========
{{ end }}
{{ end }}
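It can help to preview what this template produces before you create the Secret. Alertmanager renders it with Go's text/template, so a short Go program can execute an English-labelled copy of it. The Alert and Data types and the sample values in sampleData are hypothetical stand-ins. They contain only the fields the template reads (.Alerts, .Labels, .Annotations, .StartsAt); Alertmanager's real template data carries more fields.

```go
package main

import (
	"fmt"
	"log"
	"strings"
	"text/template"
	"time"
)

// wechatTmpl is an English-labelled copy of wechat.tmpl.
const wechatTmpl = `{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
Triggered at: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
Source: prometheus_alert
Severity: {{ .Labels.severity }}
Alert name: {{ .Labels.alertname }}
Host: {{ .Labels.instance }}
Summary: {{ .Annotations.summary }}
Details: {{ .Annotations.description }}
========end==========
{{ end }}
{{ end }}`

// Alert and Data are simplified, hypothetical stand-ins for the data
// Alertmanager passes to templates.
type Alert struct {
	Labels      map[string]string
	Annotations map[string]string
	StartsAt    time.Time
}

type Data struct {
	Alerts []Alert
}

// render parses the template text and executes wechat.default.message.
func render(data Data) (string, error) {
	t, err := template.New("wechat").Parse(wechatTmpl)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.ExecuteTemplate(&sb, "wechat.default.message", data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

// sampleData returns one hypothetical firing alert for the preview.
func sampleData() Data {
	return Data{Alerts: []Alert{{
		Labels: map[string]string{
			"alertname": "EtcdClusterUnavailable",
			"severity":  "critical",
			"instance":  "172.16.0.6:2379",
		},
		Annotations: map[string]string{
			"summary":     "etcd cluster is unavailable",
			"description": "sample description",
		},
		StartsAt: time.Date(2020, 6, 1, 10, 0, 0, 0, time.UTC),
	}}}
}

func main() {
	out, err := render(sampleData())
	if err != nil {
		log.Fatal(err)
	}
	fmt.Print(out)
}
```

If an alert lacks a label the template uses, such as instance, text/template prints <no value> in its place. Keep this in mind when writing the alert rules.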
Delete the old Secret, then create it again from both files:
kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml --from-file=wechat.tmpl -n monitoring
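With --from-file, kubectl create secret generic stores each file as one key of the Secret. The key is the file's base name and the value is the file content, base64-encoded. The operator mounts those keys as files under /etc/alertmanager/config, which should let the relative "*.tmpl" pattern in alertmanager.yaml pick up wechat.tmpl next to the config file. The Go sketch below shows that mapping. secretData is a hypothetical helper, and the file contents are shortened placeholders.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"path/filepath"
	"sort"
)

// secretData mimics what --from-file puts in a Secret's data field:
// key = base name of each file, value = base64 of its content.
func secretData(files map[string][]byte) map[string]string {
	data := make(map[string]string, len(files))
	for path, content := range files {
		data[filepath.Base(path)] = base64.StdEncoding.EncodeToString(content)
	}
	return data
}

func main() {
	// Hypothetical, shortened file contents.
	files := map[string][]byte{
		"alertmanager.yaml": []byte("global:\n  resolve_timeout: 5m\n"),
		"wechat.tmpl":       []byte(`{{ define "wechat.default.message" }}...{{ end }}`),
	}
	data := secretData(files)

	// Print keys in a stable order; each key becomes a file in the pod.
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Printf("%s: %s\n", k, data[k])
	}
}
```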
Check that the WeChat template has reached the Alertmanager container:
kubectl exec -it alertmanager-main-0 -c alertmanager -n monitoring -- /bin/sh
ls /etc/alertmanager/config
cat /etc/alertmanager/config/wechat.tmpl
The config on Alertmanager's Status page now reflects the changes.
Configuring automatic service discovery
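To make Prometheus Operator automatically discover and scrape every Service that carries the annotation prometheus.io/scrape: "true", Prometheus needs an additional scrape configuration. Correspondingly, each Service that should be scraped must declare prometheus.io/scrape: "true" in its annotations.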
vi prometheus-additional.yaml
- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
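The least obvious rule here is the __address__ rewrite. Prometheus joins the source label values with ";" and matches the regex against the whole joined string, because relabel regexes are implicitly anchored. It then expands $1:$2. The Go sketch below reproduces the keep and replace steps with the standard regexp package, which uses the same RE2 syntax. The helper names are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// anchored compiles a relabel regex the way Prometheus applies it: the
// pattern must match the entire joined value.
func anchored(expr string) *regexp.Regexp {
	return regexp.MustCompile("^(?:" + expr + ")$")
}

// keep mirrors action: keep. The target survives only if the joined source
// label values match the regex.
func keep(re *regexp.Regexp, values ...string) bool {
	return re.MatchString(strings.Join(values, ";"))
}

// replace mirrors action: replace. On a match the target label becomes the
// expanded replacement; otherwise the current value is left untouched.
func replace(re *regexp.Regexp, replacement, current string, values ...string) string {
	src := strings.Join(values, ";")
	m := re.FindStringSubmatchIndex(src)
	if m == nil {
		return current
	}
	return string(re.ExpandString(nil, replacement, src, m))
}

var (
	scrapeRe = anchored(`true`)
	addrRe   = anchored(`([^:]+)(?::\d+)?;(\d+)`)
)

// rewriteAddress applies the __address__ rule from prometheus-additional.yaml:
// source labels are __address__ and the prometheus.io/port annotation.
func rewriteAddress(address, portAnnotation string) string {
	return replace(addrRe, "$1:$2", address, address, portAnnotation)
}

func main() {
	// Values taken from the kube-dns Service shown later in this article.
	fmt.Println(keep(scrapeRe, "true"))                     // true
	fmt.Println(rewriteAddress("192.168.73.66:53", "9153")) // 192.168.73.66:9153
	fmt.Println(rewriteAddress("192.168.73.66:53", ""))     // 192.168.73.66:53
}
```

Take the kube-dns endpoint 192.168.73.66:53 with the annotation prometheus.io/port: 9153. The rule rewrites its address to 192.168.73.66:9153, which is the metrics endpoint in the kubectl describe output below. A Service without the port annotation keeps its original endpoint address, because the regex does not match.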
Create a Secret from this file and check it:
kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
kubectl get secret additional-configs -n monitoring -o yaml
Reference this extra configuration from the Prometheus resource by adding the following under spec:
additionalScrapeConfigs:
  name: additional-configs
  key: prometheus-additional.yaml
The complete configuration (cat /root/kube-prometheus/manifests/prometheus-prometheus.yaml):
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
kubectl apply -f prometheus-prometheus.yaml
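After a short while, open the Prometheus configuration page (Status > Configuration) and search for kubernetes-service-endpoints to confirm that the new job has taken effect. Then follow the logs of the prometheus container: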
kubectl logs -f prometheus-k8s-0 prometheus -n monitoring
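The logs contain many errors of the form xxx is forbidden, which points to an RBAC permission problem. The Prometheus resource above shows that Prometheus runs under the ServiceAccount prometheus-k8s. That ServiceAccount is bound to a ClusterRole that is also named prometheus-k8s. The endpoints-based discovery used by the new job must list and watch endpoints, services and pods across the cluster, and that ClusterRole does not grant this.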
Edit prometheus-clusterRole.yaml to grant these permissions:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
kubectl apply -f prometheus-clusterRole.yaml
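The Prometheus Targets page now shows that a service on port 9153 has been discovered automatically. It is kube-dns, whose Service carries the prometheus.io/scrape and prometheus.io/port annotations: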
[root@k8s03 manifests]# kubectl describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         192.168.73.66:53,192.168.73.67:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         192.168.73.66:53,192.168.73.67:53
Port:              metrics  9153/TCP
TargetPort:        9153/TCP
Endpoints:         192.168.73.66:9153,192.168.73.67:9153
Session Affinity:  None
Events:            <none>
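Previous articles
kubernetes full-stack technology + enterprise case demos [quickly master and use k8s]
Prometheus+Grafana+Alertmanager: building a complete monitoring and alerting system (detailed guide)
Installing a highly available k8s 1.18 cluster with multiple master nodes (detailed Chinese version of the official docs)
jenkins+kubernetes+harbor+gitlab: building an enterprise DevOps platform
Logging in to the k8s dashboard UI with a kubeconfig
Monitoring a haproxy component outside the k8s cluster with prometheus operator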
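Technical discussion group
There is always more to learn. For more on kubernetes/docker/devops/openstack/openshift/linux/IaaS/PaaS, plus extra materials and free videos, join the technical discussion group by adding this WeChat account:
WeChat: luckylucky421302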