微信公众号搜索 DevOps和k8s全栈技术 ,即可关注公众号,也可扫描文章最后的二维码关注公众号,每天会分享技术文章供大家阅读参考哈~

配置报警

修改/root/kube-prometheus/manifests/alertmanager-service.yaml添加 type: NodePort,方便浏览器访问alertmanager页面
kubectl get svc -n monitoring可以看到alertmanager地址端口信息 http://172.16.0.6:31568/#/status
在alertmanager的status页面可以查看到AlertManager的配置信息

Config
global:
  resolve_timeout: 5m
  http_config: {}
  smtp_from: yunwei@hhotel.com
  smtp_hello: hhotel.com
  smtp_smarthost: smtp.qiye.aliyun.com:465
  smtp_auth_username: yunwei@hhotel.com
  smtp_auth_password: <secret>
  smtp_require_tls: true
  pagerduty_url: https://events.pagerduty.com/v2/enqueue
  hipchat_api_url: https://api.hipchat.com/
  opsgenie_api_url: https://api.opsgenie.com/
  wechat_api_url: https://qyapi.weixin.qq.com/cgi-bin/
  victorops_api_url: https://alert.victorops.com/integrations/generic/20131114/alert/
route:
... ...

这些信息实际来自于/root/kube-prometheus/manifests/alertmanager-secret.yaml文件,名为alertmanager-main的secret

apiVersion: v1
data:
  alertmanager.yaml: Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg==
kind: Secret
metadata:
  name: alertmanager-main
  namespace: monitoring
type: Opaque

可以将alertmanager.yaml对应的value值做一个base64解码:

echo Imdsb2JhbCI6IAogICJyZXNvbHZlX3RpbWVvdXQiOiAiNW0iCiJyZWNlaXZlcnMiOiAKLSAibmFtZSI6ICJudWxsIgoicm91dGUiOiAKICAiZ3JvdXBfYnkiOiAKICAtICJqb2IiCiAgImdyb3VwX2ludGVydmFsIjogIjVtIgogICJncm91cF93YWl0IjogIjMwcyIKICAicmVjZWl2ZXIiOiAibnVsbCIKICAicmVwZWF0X2ludGVydmFsIjogIjEyaCIKICAicm91dGVzIjogCiAgLSAibWF0Y2giOiAKICAgICAgImFsZXJ0bmFtZSI6ICJEZWFkTWFuc1N3aXRjaCIKICAgICJyZWNlaXZlciI6ICJudWxsIg== | base64 -d

我们如果想自定义接收器或者模板消息,可以重新生成这个名为alertmanager-main的secret
vi alertmanager.yaml

global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.qiye.aliyun.com:465'
  smtp_from: 'yunwei@hhotel.com'
  smtp_auth_username: 'yunwei@hhotel.com'
  smtp_auth_password: 'aRXjq9W1jto^7^Zb'
  smtp_hello: 'hhotel.com'
  smtp_require_tls: true
templates:
  - "*.tmpl"
route:
  group_by: ['job', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 5m
  receiver: 'wechat'
  routes:
  - receiver: 'wechat'
    group_wait: 10s
    match:
      alertname: EtcdClusterUnavailable
receivers:
- name: 'default'
  email_configs:
  - to: 'yunwei@hhotel.com'
    send_resolved: true
- name: 'wechat'
  wechat_configs:
  - corp_id: 'wx02f71fb3dea46c16'
    to_party: '1'
    to_user: "renzhenxin"
    agent_id: '1'
    api_secret: 'r4OGerF_p4UrIN6QERCefJRxzpI0SquNG5gHCxGxcOM'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']

创建wechat报警模板 wechat.tmpl

{{ define "wechat.default.message" }}
{{ range .Alerts }}
========start==========
触发时间: {{ .StartsAt.Format "2006-01-02 15:04:05" }}
告警程序: prometheus_alert
告警级别: {{ .Labels.severity }}
告警类型: {{ .Labels.alertname }}
故障主机: {{ .Labels.instance }}
告警主题: {{ .Annotations.summary }}
告警详情: {{ .Annotations.description }}
========end==========
{{ end }}
{{ end }}

删除原来的secret,然后再创建

kubectl delete secret alertmanager-main -n monitoring
kubectl create secret generic alertmanager-main --from-file=alertmanager.yaml --from-file=wechat.tmpl -n monitoring

查看alertmanegr微信报警模板

kubectl exec -it alertmanager-main-0 /bin/sh -n monitoring
ls /etc/alertmanager/config
cat /etc/alertmanager/config/wechat.tmpl

查看alertmanager的status页面config会显示修改变化

配置自动服务发现

想要让Prometheus Operator去自动发现并监控具有prometheus.io/scrape=true这个annotations的Service,需要对prometheus添加一个额外配置,相应的,Service要在annotation区域添加prometheus.io/scrape=true的声明
vi prometheus-additional.yaml

- job_name: 'kubernetes-service-endpoints'
  kubernetes_sd_configs:
  - role: endpoints
  relabel_configs:
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__
    regex: (https?)
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: (.+)
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name

使用这个文件创建一个secret对象

kubectl create secret generic additional-configs --from-file=prometheus-additional.yaml -n monitoring
kubectl get secret additional-configs -n monitoring -o yaml

在prometheus资源对象中加入刚才创建的额外配置,在spec下添加

additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml

完整配置cat /root/kube-prometheus/manifests/prometheus-prometheus.yaml

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    prometheus: k8s
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - name: alertmanager-main
      namespace: monitoring
      port: web
  baseImage: quay.io/prometheus/prometheus
  nodeSelector:
    kubernetes.io/os: linux
  podMonitorSelector: {}
  replicas: 2
  secrets:
  - etcd-certs
  resources:
    requests:
      memory: 400Mi
  ruleSelector:
    matchLabels:
      prometheus: k8s
      role: alert-rules
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  additionalScrapeConfigs:
    name: additional-configs
    key: prometheus-additional.yaml
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: v2.11.0
kubectl apply -f prometheus-prometheus.yaml

过一会儿到prometheus查看配置已经生效,搜索关键词kubernetes-service-endpoints

kubectl logs -f prometheus-k8s-0 prometheus -n monitoring

可以看到有很多错误日志出现,都是xxx is forbidden,这说明是 RBAC 权限的问题,通过 prometheus 资源对象的配置可以知道 Prometheus 绑定了一个名为 prometheus-k8s 的 ServiceAccount 对象,而这个对象绑定的是一个名为 prometheus-k8s 的 ClusterRole
修改prometheus-clusterRole.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes
  - services
  - endpoints
  - pods
  - nodes/proxy
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - configmaps
  - nodes/metrics
  verbs:
  - get
- nonResourceURLs:
  - /metrics
  verbs:
  - get
kubectl apply -f prometheus-clusterRole.yaml

从prometheus的targets可以看到已经自动发现了端口9153的服务,这是kube-dns

[root@k8s03 manifests]# kubectl describe svc kube-dns -n kube-system
Name:              kube-dns
Namespace:         kube-system
Labels:            k8s-app=kube-dns
                   kubernetes.io/cluster-service=true
                   kubernetes.io/name=KubeDNS
Annotations:       prometheus.io/port: 9153
                   prometheus.io/scrape: true
Selector:          k8s-app=kube-dns
Type:              ClusterIP
IP:                10.96.0.10
Port:              dns  53/UDP
TargetPort:        53/UDP
Endpoints:         192.168.73.66:53,192.168.73.67:53
Port:              dns-tcp  53/TCP
TargetPort:        53/TCP
Endpoints:         192.168.73.66:53,192.168.73.67:53
Port:              metrics  9153/TCP
TargetPort:        9153/TCP
Endpoints:         192.168.73.66:9153,192.168.73.67:9153
Session Affinity:  None
Events:            <none>

往期精彩文章

kubernetes全栈技术+企业案例演示【带你快速掌握和使用k8s】

python运维开发实战-基础篇

python运维和开发实战-高级篇

python运维和开发实战-安装和创建Django项目

Docker公司禁止被列入美国"实体名单"的国家、企业使用

kubernetes面试题汇总

DevOps视频和资料免费领取

kubernetes技术分享-可用于企业内部培训

谈谈我的IT发展之路

kubernetes系列文章第一篇-k8s基本介绍

kubernetes系列文章第二篇-kubectl

了解pod和pod的生命周期-这一篇文章就够了

Kubernetes中部署MySQL高可用集群

Prometheus+Grafana+Alertmanager搭建全方位的监控告警系统-超详细文档

k8s1.18多master节点高可用集群安装-超详细中文官方文档

k8s中蓝绿部署、金丝雀发布、滚动更新汇总

运维常见问题汇总-tomcat篇

关于linux内核参数的调优,你需要知道

kubernetes挂载ceph rbd和cephfs

报警神器Alertmanager发送报警到多个渠道

jenkins+kubernetes+harbor+gitlab构建企业级devops平台

kubernetes网络插件-flannel篇

kubernetes网络插件-calico篇

kubernetes认证、授权、准入控制

限制不同的用户操作k8s资源

面试真题&技术资料免费领取-覆盖面超全~

Prometheus监控MySQL

Prometheus监控Nginx

Prometheus监控Tomcat

linux面试题汇总

测试通过storageclass动态生成pv

通过编写k8s的资源清单yaml文件部署gitlab服务

helm安装和使用-通过helm部署k8s应用

k8s基于Ingress-nginx实现灰度发布

k8s的Pod安全策略

Prometheus Operator-上篇-安装和使用篇

Prometheus Operator-下篇

通过kubeconfig登陆k8s的dashboard ui界面

通过token令牌登陆k8s dashboard ui界面 

kubernetes集群的etcd数据库详细介绍

Linux网络流量监控工具

kubernetes搭建EFK日志管理系统

prometheus operator监控k8s集群之外的haproxy组件         

kubernetes ConfigMap存储卷                                            

技术交流群

学无止境,了解更多关于kubernetes/docker/devops/openstack/openshift/linux/IaaS/PaaS相关内容,想要获取更多资料和免费视频,可按如下方式进入技术交流群

                            扫码加群????

微信:luckylucky421302

微信公众号

                                     长按指纹关注公众号????


                                       

                                       点击在看少个 bug????

Logo

K8S/Kubernetes社区为您提供最前沿的新闻资讯和知识内容

更多推荐