Monitoring k8sdemo with Prometheus
The basic approach to monitoring with Prometheus is as follows:
For a containerized service that already exposes a metrics endpoint, a ServiceMonitor can be bound to the Service in front of the pods, so that Prometheus scrapes the monitoring data directly.
For an ordinary application without a metrics endpoint, an exporter has to be used instead: the exporter collects the relevant data, and Prometheus then scrapes the exporter's metrics endpoint to monitor the application indirectly.
For common open-source software such as MySQL and Redis there are dedicated exporters, but for most self-developed web services the monitoring data has to be obtained through blackbox probing.
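As a quick illustration of the ServiceMonitor approach, a minimal sketch could look like the following. The names, labels and port (example-app, app: example-app, web) are placeholders for illustration, not objects taken from this cluster:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: example-app      # must match the labels on the target Service
  namespaceSelector:
    matchNames:
      - default             # namespace the Service lives in
  endpoints:
    - port: web             # named Service port that exposes /metrics
      interval: 30s

Since the Prometheus resource shown later in this article uses an empty serviceMonitorSelector, any such ServiceMonitor in a selected namespace is picked up automatically.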
Deploying the blackbox exporter
Deploying the blackbox exporter is straightforward: just apply the following YAML manifests.
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: blackbox-exporter
name: blackbox-exporter
namespace: monitoring
spec:
progressDeadlineSeconds: 600
replicas: 1
revisionHistoryLimit: 10
selector:
matchLabels:
app: blackbox-exporter
strategy:
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
app: blackbox-exporter
spec:
containers:
- args:
- --config.file=/mnt/blackbox.yml
env:
- name: TZ
value: Asia/Shanghai
- name: LANG
value: C.UTF-8
image: prom/blackbox-exporter:master
imagePullPolicy: IfNotPresent
lifecycle: {}
name: blackbox-exporter
ports:
- containerPort: 9115
name: web
protocol: TCP
resources:
limits:
cpu: 20m
memory: 40Mi
requests:
cpu: 10m
memory: 10Mi
securityContext:
allowPrivilegeEscalation: false
privileged: false
readOnlyRootFilesystem: false
runAsNonRoot: false
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
volumeMounts:
- mountPath: /usr/share/zoneinfo/Asia/Shanghai
name: tz-config
- mountPath: /etc/localtime
name: tz-config
- mountPath: /etc/timezone
name: timezone
- mountPath: /mnt
name: config
dnsPolicy: ClusterFirst
restartPolicy: Always
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
volumes:
- hostPath:
path: /usr/share/zoneinfo/Asia/Shanghai
type: ""
name: tz-config
- hostPath:
path: /etc/timezone
type: ""
name: timezone
- configMap:
defaultMode: 420
name: blackbox-conf
name: config
---
apiVersion: v1
data:
blackbox.yml: |-
modules:
http_2xx:
prober: http
http_post_2xx:
prober: http
http:
method: POST
tcp_connect:
prober: tcp
pop3s_banner:
prober: tcp
tcp:
query_response:
- expect: "^+OK"
tls: true
tls_config:
insecure_skip_verify: false
ssh_banner:
prober: tcp
tcp:
query_response:
- expect: "^SSH-2.0-"
irc_banner:
prober: tcp
tcp:
query_response:
- send: "NICK prober"
- send: "USER prober prober prober :prober"
- expect: "PING :([^ ]+)"
send: "PONG ${1}"
- expect: "^:[^ ]+ 001"
icmp:
prober: icmp
kind: ConfigMap
metadata:
name: blackbox-conf
namespace: monitoring
---
apiVersion: v1
kind: Service
metadata:
labels:
app: blackbox-exporter
name: blackbox-exporter
namespace: monitoring
spec:
ports:
- name: container-1-web-1
port: 9115
protocol: TCP
targetPort: 9115
selector:
app: blackbox-exporter
sessionAffinity: None
type: ClusterIP
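Assuming the manifests above are saved in a file such as blackbox-exporter.yaml (the filename is only an example), they can be applied and verified like this:

kubectl apply -f blackbox-exporter.yaml
kubectl -n monitoring get pods -l app=blackbox-exporter
kubectl -n monitoring get svc blackbox-exporter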
Once deployed, test whether the blackbox exporter can scrape data from each of the k8sdemo services.
[root@VM-12-8-centos ~]# kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
example-app ClusterIP 10.1.97.130 <none> 8090/TCP 18d
kubernetes ClusterIP 10.1.0.1 <none> 443/TCP 38d
mall NodePort 10.1.111.235 <none> 8000:31687/TCP 5d4h
mysql ClusterIP 10.1.125.65 <none> 3306/TCP 6d4h
mysql-read ClusterIP 10.1.87.151 <none> 3306/TCP 5d10h
order ClusterIP 10.1.215.125 <none> 7000/TCP 5d4h
passport ClusterIP 10.1.116.84 <none> 5000/TCP 5d4h
product ClusterIP 10.1.38.97 <none> 3000/TCP 5d4h
redis ClusterIP 10.1.14.65 <none> 6379/TCP 5d6h
review ClusterIP 10.1.241.227 <none> 9000/TCP 5d4h
shopcart ClusterIP 10.1.226.250 <none> 6000/TCP 5d4h
[root@VM-12-8-centos ~]# curl "http://10.1.20.239:9115/probe?target=10.1.226.250:6000&module=http_2xx"
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 1.2483e-05
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.000911382
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length 102
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.000155884
probe_http_duration_seconds{phase="processing"} 0.000324507
probe_http_duration_seconds{phase="resolve"} 1.2483e-05
probe_http_duration_seconds{phase="tls"} 0
probe_http_duration_seconds{phase="transfer"} 0.00010323
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 0
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 0
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 102
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.975630568e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
The test shows that the monitoring data can be scraped normally.
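Probing by ClusterIP works, but those addresses change if a Service is recreated. Because the exporter pod uses the cluster DNS (dnsPolicy: ClusterFirst), the target can also be given as a service DNS name; for example, assuming the k8sdemo services live in the default namespace, the same check against shopcart could be written as:

curl "http://10.1.20.239:9115/probe?target=shopcart.default.svc:6000&module=http_2xx"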
Configuring a new Prometheus job and alerting rules
Using the prometheus-additional (additionalScrapeConfigs) mechanism, add a new scrape job to Prometheus and point it at the endpoints of the k8sdemo services:
- job_name: 'blackbox'
metrics_path: /probe
params:
module: [http_2xx]
static_configs:
- targets:
- http://10.1.226.250:6000
- http://10.1.38.97:3000/healthz/ready
- http://10.1.116.84:5000
- http://10.1.215.125:7000/healthz/ready
- http://10.1.111.235:8000/healthz/ready
relabel_configs:
- source_labels: [__address__]
target_label: __param_target
- source_labels: [__param_target]
target_label: instance
- target_label: __address__
replacement: blackbox-exporter:9115
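The relabel_configs section is what makes this a blackbox job: the original target URL is copied into the target query parameter, kept as the instance label, and the scrape address itself is rewritten to blackbox-exporter:9115. For the Prometheus Operator to pick the job up, this snippet has to be stored in a Secret. Assuming it is saved as prometheus-additional.yaml, the Secret referenced below can be created with:

kubectl create secret generic additional-scrape-configs \
  --from-file=prometheus-additional.yaml -n monitoring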
With the Secret in place, edit the Prometheus resource so that it references this additional scrape configuration:
[root@VM-12-8-centos ~]# kubectl edit prometheus k8s -n monitoring
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
creationTimestamp: "2022-11-13T14:21:08Z"
generation: 3
labels:
prometheus: k8s
name: k8s
namespace: monitoring
resourceVersion: "4752391"
selfLink: /apis/monitoring.coreos.com/v1/namespaces/monitoring/prometheuses/k8s
uid: 98a176a4-f6fa-450c-a9c3-aa9ebe58e30e
spec:
additionalScrapeConfigs:
key: prometheus-additional.yaml
name: additional-scrape-configs
alerting:
alertmanagers:
- name: alertmanager-main
namespace: monitoring
port: web
baseImage: quay.io/prometheus/prometheus
nodeSelector:
kubernetes.io/os: linux
podMonitorNamespaceSelector: {}
podMonitorSelector: {}
replicas: 2
resources:
requests:
memory: 400Mi
ruleSelector:
matchLabels:
prometheus: k8s
role: alert-rules
secrets:
- etcd-ssl
securityContext:
fsGroup: 2000
runAsNonRoot: true
runAsUser: 1000
serviceAccountName: prometheus-k8s
serviceMonitorNamespaceSelector: {}
serviceMonitorSelector: {}
version: v2.11.0
The key part is this block of the spec:
additionalScrapeConfigs:
name: additional-scrape-configs
key: prometheus-additional.yaml
After it is configured, wait a short while and open the Prometheus UI; the new blackbox job appears under Targets.
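A quick sanity check in the expression browser is to query probe_success for the new job; every healthy endpoint should return 1:

probe_success{job="blackbox"}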
Next, add a rule, i.e. an alerting rule, so that an alert is sent when a service becomes unhealthy.
You can append a new rule group to an existing resource with kubectl edit PrometheusRule -n monitoring, or create a new PrometheusRule from a YAML file:
[root@VM-12-8-centos kube-prom]# vim blackbox-rules.yml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
labels:
prometheus: k8s
role: alert-rules
name: blackbox-rules
namespace: monitoring
spec:
groups:
- name: blackboxjk-k8sdemo
rules:
- alert: curlHttpStatus
expr: probe_http_status_code{job="blackbox"}>=400 or probe_success{job="blackbox"}==0
for: 1m
labels:
severity: critical
annotations:
summary: '业务报警: 网站不可访问'
description: '{{$labels.instance}} 不可访问,请及时查看,当前状态码为{{$value}}'
[root@VM-12-8-centos kube-prom]# kubectl apply -f blackbox-rules.yml
prometheusrule.monitoring.coreos.com/blackbox-rules created
[root@VM-12-8-centos kube-prom]# kubectl get PrometheusRule -n monitoring
NAME AGE
blackbox-rules 85s
prometheus-k8s-rules 19d
Because this PrometheusRule carries the labels prometheus: k8s and role: alert-rules that the ruleSelector in the Prometheus spec matches, it is loaded automatically, and after a short while the rule shows up on the Alerts page.
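Before relying on the rule, the alert expression can also be evaluated by hand in the expression browser; while every endpoint is healthy it should return an empty result:

probe_http_status_code{job="blackbox"} >= 400 or probe_success{job="blackbox"} == 0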
Finally, verify that the alert really works and that an alert email is sent. The test is done by deleting one of the Deployments so that its pod disappears:
[root@VM-12-8-centos devis]# kubectl get deploy
NAME READY UP-TO-DATE AVAILABLE AGE
mall 1/1 1 1 2d3h
mysql 1/1 1 1 5d8h
nfs-client-provisioner 1/1 1 1 32d
order 1/1 1 1 47h
passport 1/1 1 1 47h
product-v1 1/1 1 1 46h
redis 1/1 1 1 5d5h
review 1/1 1 1 2d
shopcart 1/1 1 1 46h
[root@VM-12-8-centos devis]# kubectl delete deploy mall
deployment.extensions "mall" deleted
[root@VM-12-8-centos devis]# kubectl get po
NAME READY STATUS RESTARTS AGE
mall-987568788-jjxld 0/1 Terminating 0 11m
mysql-85695f9484-v2jhz 1/1 Running 0 5d8h
mysql-restore-zts24 0/1 Completed 0 5d6h
nfs-client-provisioner-9494c5c4c-nzvcm 1/1 Running 5 17d
order-6697cfb6c7-tm5kf 1/1 Running 0 3m32s
passport-748d9c48f6-wgszk 1/1 Running 0 47h
product-v1-5d95f79d65-fr78r 1/1 Running 0 46h
redis-756b947968-5c5rl 1/1 Running 0 5d5h
review-5bbc4f96b-5zn7k 1/1 Running 0 2d
shopcart-8c47b75df-wh59z 1/1 Running 0 46h
After the deletion, the firing alert shows up in the UI.
At the same time an alert email arrives in the QQ mailbox.
The mall deployment corresponds to port 8000, which matches the alerting instance, so the alert is correct.
Finally, the corresponding pod is redeployed through the deploy job that was previously set up in Jenkins.
Shortly afterwards a recovery email arrives, and the test is successful.