Installing and Deploying Prometheus in Kubernetes
4、Installing in Kubernetes
There are many ways to install Prometheus in Kubernetes, for example:
- Gitee: https://gitee.com/liugpwwwroot/k8s-prometheus-grafana/tree/master/prometheus
- GitHub: https://github.com/prometheus-operator/kube-prometheus
- Helm: https://artifacthub.io/packages/helm/grafana/grafana
4.1、Introduction to prometheus-operator
The key to building an Operator is extending Kubernetes with CRDs (custom resource definitions). An Operator codifies human operational knowledge, and rests on two core concepts: resources, which define an object's desired state, and controllers, which observe, analyze, and act to reconcile the resource toward that state.
The Operator creates objects such as Prometheus, ServiceMonitor, Alertmanager, and PrometheusRule, and continuously monitors and maintains their state. A ServiceMonitor is an abstraction over exporters; both Service and ServiceMonitor are Kubernetes resource objects, and a ServiceMonitor matches a class of Services via a label selector. After deployment the following CRDs exist:
[root@master1 manifests]# kubectl get crd
NAME                                        CREATED AT
alertmanagerconfigs.monitoring.coreos.com   2021-06-30T09:55:49Z
alertmanagers.monitoring.coreos.com         2021-06-30T09:55:49Z
podmonitors.monitoring.coreos.com           2021-06-30T09:55:49Z
probes.monitoring.coreos.com                2021-06-30T09:55:49Z
prometheuses.monitoring.coreos.com          2021-06-30T09:55:49Z
prometheusrules.monitoring.coreos.com       2021-06-30T09:55:50Z
servicemonitors.monitoring.coreos.com       2021-06-30T09:55:50Z
thanosrulers.monitoring.coreos.com          2021-06-30T09:55:50Z
[root@master1 manifests]# kubectl api-resources |grep monitoring.coreos.com
alertmanagerconfigs   monitoring.coreos.com   true   AlertmanagerConfig
alertmanagers         monitoring.coreos.com   true   Alertmanager
podmonitors           monitoring.coreos.com   true   PodMonitor
probes                monitoring.coreos.com   true   Probe
prometheuses          monitoring.coreos.com   true   Prometheus
prometheusrules       monitoring.coreos.com   true   PrometheusRule
servicemonitors       monitoring.coreos.com   true   ServiceMonitor
thanosrulers          monitoring.coreos.com   true   ThanosRuler
Here, prometheuses exists for the Prometheus server itself, while servicemonitors are abstractions over the various exporters (the endpoints that expose metrics); Prometheus pulls data from the metrics endpoints that ServiceMonitors describe. alertmanagers corresponds to the Alertmanager server, and prometheusrules holds the alerting rule definitions Prometheus evaluates.
Official docs: https://prometheus-operator.dev/docs/prologue/introduction/
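A ServiceMonitor's label selector behaves like any Kubernetes matchLabels selector: every listed key/value pair must be present on the Service. A minimal sketch of that matching logic (illustrative only, not the operator's source code):

```python
# Illustrative sketch of Kubernetes matchLabels semantics, as used by a
# ServiceMonitor's spec.selector. Not the operator's actual implementation.

def match_labels(selector: dict, labels: dict) -> bool:
    """A Service matches only when every selector pair is present (AND)."""
    return all(labels.get(k) == v for k, v in selector.items())

svc_labels = {"app.kubernetes.io/name": "kube-scheduler", "tier": "control-plane"}
print(match_labels({"app.kubernetes.io/name": "kube-scheduler"}, svc_labels))  # True
print(match_labels({"app.kubernetes.io/name": "etcd"}, svc_labels))            # False
```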
4.2、Installing prometheus-operator
#1、Install
[root@master1 prometheus-yaml]# wget https://github.com/prometheus-operator/kube-prometheus/archive/refs/heads/main.zip
[root@master1 prometheus-yaml]# unzip main.zip; cd kube-prometheus-main/manifests/setup
[root@master1 setup]# kubectl create -f .
[root@master1 setup]# cd ../; kubectl apply -f .
#2、Expose the Services by changing their type to NodePort
[root@master1 manifests]# kubectl edit svc/grafana -n monitoring
[root@master1 manifests]# kubectl edit svc/prometheus-k8s -n monitoring
Check the Prometheus web UI.
Check the Grafana web UI.
Note: if the cluster has no DNS add-on installed, add the datasource in the Grafana web UI by IP address, and set it as the default.
4.3、Monitoring the scheduler and controller-manager
After installation, the controller-manager and scheduler dashboards in Grafana show no data. Fix that as follows.
1、Monitor the scheduler
[root@master1 manifests]# kubectl delete servicemonitor/kube-scheduler -n monitoring
[root@master1 manifests]# vim kubernetes-serviceMonitorKubeScheduler.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kube-scheduler
  name: kube-scheduler
  namespace: kube-system
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: http-metrics   # changed to http-metrics
    scheme: http         # changed to http
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:   # matches Services in the listed namespaces; use "any: true" to match every namespace
    matchNames:
    - kube-system
  selector:   # labels of the Service to match; with matchLabels a Service is selected only when all listed labels match, and with matchExpressions a Service must satisfy every listed expression
    matchLabels:
      app.kubernetes.io/name: kube-scheduler
[root@master1 manifests]# kubectl apply -f kubernetes-serviceMonitorKubeScheduler.yaml
[root@master1 my-yaml]# cat scheduler.yaml   # create a Service for the scheduler
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-scheduler
  labels:
    app.kubernetes.io/name: kube-scheduler
spec:
  selector:
    component: kube-scheduler
  ports:
  - name: http-metrics
    port: 10251
    targetPort: 10251
    protocol: TCP
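Through this Service, Prometheus scrapes the scheduler's /metrics endpoint, which serves the plain-text exposition format. A toy parser (a sketch for illustration, not the official Prometheus client library; it ignores timestamps and assumes no spaces inside label values) to show the shape of that data:

```python
# Toy parser for the Prometheus plain-text exposition format.
# Assumption: no timestamps, no spaces inside label values.

def parse_metrics(text: str) -> dict:
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name, _, value = line.rpartition(" ")  # split metric name+labels from value
        samples[name] = float(value)
    return samples

scrape = """# HELP scheduler_schedule_attempts_total Number of attempts to schedule pods
# TYPE scheduler_schedule_attempts_total counter
scheduler_schedule_attempts_total{result="scheduled"} 42
"""
print(parse_metrics(scrape))
```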
2、Monitor the controller-manager
A different approach is used here: the ServiceMonitor (kube-controller-manager) stays in the monitoring namespace.
#1、Inspect the default ServiceMonitor
[root@master1 manifests]# cat kubernetes-serviceMonitorKubeControllerManager.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kube-controller-manager
  name: kube-controller-manager
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    metricRelabelings:
    ...
    (some content omitted)
    ...
    port: http-metrics   # my local controller-manager exposes its metrics over HTTP, so change this
    scheme: http         # change this too
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-controller-manager
[root@master1 manifests]# kubectl apply -f kubernetes-serviceMonitorKubeControllerManager.yaml
#2、Create the Service
[root@master1 my-yaml]# vi controller-manager.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: kube-controller-manager
  labels:
    app.kubernetes.io/name: kube-controller-manager
spec:
  selector:
    component: kube-controller-manager
  ports:
  - name: http-metrics
    port: 10252
    targetPort: 10252
    protocol: TCP
[root@master1 my-yaml]# kubectl apply -f controller-manager.yaml
[root@master1 my-yaml]# kubectl get svc -l app.kubernetes.io/name=kube-controller-manager -n kube-system
NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)     AGE
kube-controller-manager   ClusterIP   10.102.211.176   <none>        10252/TCP   46s
[root@master1 my-yaml]# kubectl describe svc -l app.kubernetes.io/name=kube-controller-manager -n kube-system
Name:              kube-controller-manager
Namespace:         kube-system
Labels:            app.kubernetes.io/name=kube-controller-manager
Annotations:       <none>
Selector:          component=kube-controller-manager
Type:              ClusterIP
IP:                10.102.211.176
Port:              http-metrics  10252/TCP
TargetPort:        10252/TCP
Endpoints:         192.168.56.101:10252
Session Affinity:  None
Events:            <none>
4.4、Custom monitoring of etcd
etcd is treated as an application external to the cluster.
#1、Create a Secret holding the etcd client certificates
[root@master1 my-yaml]# cd /etc/kubernetes/pki/etcd/
[root@master1 etcd]# ls
ca.crt  ca.key  healthcheck-client.crt  healthcheck-client.key  peer.crt  peer.key  server.crt  server.key
[root@master1 etcd]# kubectl -n monitoring create secret generic etcd-certs --from-file=./healthcheck-client.key --from-file=./healthcheck-client.crt --from-file=./ca.crt
secret/etcd-certs created
#2、Load the Secret into Prometheus
[root@master1 manifests]# vim prometheus-prometheus.yaml
...
  image: www.mt.com:9500/prometheus/prometheus:v2.28.0
  secrets:
  - etcd-certs
...
[root@master1 manifests]# kubectl apply -f prometheus-prometheus.yaml
[root@master1 etcd]# kubectl exec prometheus-k8s-0 -n monitoring -- /bin/ls "/etc/prometheus/secrets/etcd-certs" 2> /dev/null   # where the certificates are mounted
ca.crt
healthcheck-client.crt
healthcheck-client.key
#3、Create the ServiceMonitor
[root@master1 my-yaml]# vim prometheus-serviceMonitorEtcd.yaml
[root@master1 my-yaml]# cat prometheus-serviceMonitorEtcd.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: etcd-k8s
  name: etcd-k8s
  namespace: kube-system
spec:
  jobLabel: etcd-k8s
  endpoints:
  - port: port
    interval: 3s
    scheme: https
    tlsConfig:
      caFile: /etc/prometheus/secrets/etcd-certs/ca.crt
      certFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.crt
      keyFile: /etc/prometheus/secrets/etcd-certs/healthcheck-client.key
      insecureSkipVerify: true
  selector:
    matchLabels:
      etcd-k8s: etcd
  namespaceSelector:
    matchNames:
    - kube-system
[root@master1 my-yaml]# kubectl create -f prometheus-serviceMonitorEtcd.yaml --dry-run
#4、Create a Service matching the ServiceMonitor
Note: etcd is treated as an external application here, so instead of a label selector on Pods we manually create an Endpoints object that points at etcd's address.
[root@master1 my-yaml]# vim etcd-service.yaml
apiVersion: v1
kind: Service
metadata:
  namespace: kube-system
  name: etcd-k8s
  labels:
    etcd-k8s: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
  - name: port
    port: 2379
---
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    etcd-k8s: etcd
subsets:
- addresses:
  - ip: 192.168.56.101
    nodeName: etcd-master1
  ports:
  - name: port
    port: 2379
[root@master1 my-yaml]# kubectl apply -f etcd-service.yaml
service/etcd-k8s created
endpoints/etcd-k8s created
[root@master1 my-yaml]# kubectl describe svc -n kube-system -l etcd-k8s=etcd   # confirm the Endpoints are associated
Name:              etcd-k8s
Namespace:         kube-system
Labels:            etcd-k8s=etcd
Annotations:       <none>
Selector:          <none>
Type:              ClusterIP
IP:                None
Port:              port  2379/TCP
TargetPort:        2379/TCP
Endpoints:         192.168.56.101:2379
Session Affinity:  None
Events:            <none>
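With the headless Service and hand-written Endpoints in place, Prometheus turns each (address, port) pair into a scrape target. A rough sketch of that mapping (the function name and the default /metrics path are illustrative assumptions, not Prometheus internals):

```python
# Sketch of how a manual Endpoints object becomes scrape targets:
# each address x port combination yields one target URL.
# "endpoints_to_targets" and the "/metrics" default are illustrative.

def endpoints_to_targets(subsets, scheme="https", path="/metrics"):
    targets = []
    for subset in subsets:
        for addr in subset["addresses"]:
            for port in subset["ports"]:
                targets.append(f'{scheme}://{addr["ip"]}:{port["port"]}{path}')
    return targets

subsets = [{"addresses": [{"ip": "192.168.56.101"}],
            "ports": [{"name": "port", "port": 2379}]}]
print(endpoints_to_targets(subsets))
```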
4.5、Alert-routing caveats
Newer versions isolate namespaces from each other: AlertmanagerConfig discovery only finds objects in the selected namespaces. To have alerts from other namespaces routed, check the following:
#1、Confirm the alert
1) Confirm there is a firing alert, and note which namespace it belongs to.
2) Confirm Alertmanager has received the alert (check with kubectl logs).
#2、Confirm the Alertmanager configuration
Check whether alertmanager.spec.alertmanagerConfigNamespaceSelector or alertmanager.spec.alertmanagerConfigSelector is being used. alertmanagerConfigNamespaceSelector is explained here:
...
spec:
  alertmanagerConfigNamespaceSelector:
    matchLabels:
      alertmanagerconfig: enabled   # only namespaces carrying this label have their alerts routed; add the label to each relevant namespace
...
#3、Confirm the rules in prometheusrules carry the matching label
[root@master1 manifests]# kubectl get prometheusrules/alertmanager-main-rules -n monitoring -o yaml
spec:
  groups:
  - name: alertmanager.rules
    rules:
    - alert: AlertmanagerFailedReload
      annotations:
        description: Configuration has failed to load for {{ $labels.namespace }}/{{ $labels.pod }}.
        runbook_url: https://github.com/prometheus-operator/kube-prometheus/wiki/alertmanagerfailedreload
        summary: Reloading an Alertmanager configuration has failed.
      expr: |
        # Without max_over_time, failed scrapes could create false negatives, see
        # https://www.robustperception.io/alerting-on-gauges-in-prometheus-2-0 for details.
        max_over_time(alertmanager_config_last_reload_successful{job="alertmanager-main",namespace="monitoring"}[5m]) == 0
      for: 10m
      labels:
        severity: critical
        namespace: monitoring   # add the label of the alert's own namespace here, or Alertmanager will not route it
#4、Create an AlertmanagerConfig in every namespace whose alerts should be routed
[root@master1 my-yaml]# cat alertmanagerconfig.yaml   # if all target namespaces share the same configuration, apply this one file in each namespace; if they differ, create a separate AlertmanagerConfig per namespace
apiVersion: monitoring.coreos.com/v1alpha1
kind: AlertmanagerConfig
metadata:
  name: alertmanager-config
  labels:
    alertmanagerconfig: example
spec:
  route:
    groupBy: ['alertname', 'job', 'severity']
    groupWait: 30s
    groupInterval: 2m
    repeatInterval: 5m
    receiver: 'webhook'
  receivers:
  - name: 'webhook'
    webhookConfigs:
    - url: 'http://127.0.0.1:8086/dingtalk/webhook1/send'
Note: in the current version the Alertmanager configuration itself is stored as a Secret; modify manifests/alertmanager-secret.yaml. In clusters where the AlertmanagerConfig CRD is used as the configuration file, the configuration can be adjusted with kubectl apply.
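For reference, the webhookConfigs receiver above causes Alertmanager to POST a JSON body to the given URL. A small sketch of handling that payload on the receiving side (the top-level status and alerts[].labels fields follow Alertmanager's documented webhook format; the formatting function itself is illustrative):

```python
import json

# Format an Alertmanager webhook JSON payload into one line per alert.
# Field names (status, alerts[].labels) follow the documented webhook
# payload; "format_alerts" is an illustrative helper, not a library API.

def format_alerts(payload: str) -> list:
    body = json.loads(payload)
    lines = []
    for alert in body.get("alerts", []):
        labels = alert.get("labels", {})
        lines.append(f'[{body["status"]}] {labels.get("alertname", "?")} '
                     f'severity={labels.get("severity", "none")}')
    return lines

example = json.dumps({
    "status": "firing",
    "alerts": [{"labels": {"alertname": "AlertmanagerFailedReload",
                           "severity": "critical"}}],
})
print(format_alerts(example))
```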
4.6、Miscellaneous
#1、Rule files
Viewed inside the prometheus pod:
/prometheus $ ls /etc/prometheus/rules/prometheus-k8s-rulefiles-0/   # all defined rule files live here; you can also add your own by creating prometheusrules resource objects
monitoring-alertmanager-main-rules.yaml     monitoring-node-exporter-rules.yaml
monitoring-kube-prometheus-rules.yaml       monitoring-prometheus-k8s-prometheus-rules.yaml
monitoring-kube-state-metrics-rules.yaml    monitoring-prometheus-operator-rules.yaml
monitoring-kubernetes-monitoring-rules.yaml
[root@master1 my-yaml]# kubectl get prometheus/k8s -n monitoring -o yaml
...
  ruleSelector:   # only PrometheusRule objects with these labels are loaded; add both labels when creating your own prometheusrules
    matchLabels:
      prometheus: k8s
      role: alert-rules
...
[root@master1 my-yaml]# kubectl get prometheusrules -l prometheus=k8s,role=alert-rules -A
NAMESPACE    NAME                              AGE
monitoring   alertmanager-main-rules           45h
monitoring   kube-prometheus-rules             45h
monitoring   kube-state-metrics-rules          45h
monitoring   kubernetes-monitoring-rules       45h
monitoring   node-exporter-rules               45h
monitoring   prometheus-k8s-prometheus-rules   45h
monitoring   prometheus-operator-rules         45h
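The ruleSelector can be read as a filter over PrometheusRule objects: only those carrying every selector label are loaded. A sketch over illustrative data (not an API call against a real cluster):

```python
# Illustrative sketch of ruleSelector filtering: keep only PrometheusRule
# objects whose labels include every selector pair. Data is made up.

def select_rules(rules, selector):
    return [r["name"] for r in rules
            if all(r["labels"].get(k) == v for k, v in selector.items())]

rules = [
    {"name": "node-exporter-rules",
     "labels": {"prometheus": "k8s", "role": "alert-rules"}},
    {"name": "custom-rules",          # missing role=alert-rules, so not loaded
     "labels": {"prometheus": "k8s"}},
]
print(select_rules(rules, {"prometheus": "k8s", "role": "alert-rules"}))
```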