K8s Study Notes (0612)
Networking in K8s is divided into overlay networks (e.g. Calico, Flannel) and underlay networks.
Monitoring Pods with cAdvisor
cAdvisor (Container Advisor) gives container users insight into the resource usage and performance characteristics of their running containers. It collects, aggregates, processes and exports information about running containers: for each container it keeps resource-isolation parameters, historical resource usage, histograms of complete historical resource usage, and network statistics. This data is exported per container and machine-wide.
| Metric | Type | Meaning |
| --- | --- | --- |
| container_cpu_load_average_10s | gauge | average CPU load of the container over the last 10 seconds |
| container_cpu_usage_seconds_total | counter | cumulative CPU time consumed per CPU core (seconds) |
| container_cpu_system_seconds_total | counter | cumulative system CPU time (seconds) |
| container_cpu_user_seconds_total | counter | cumulative user CPU time (seconds) |
| container_fs_usage_bytes | gauge | filesystem usage inside the container (bytes) |
| container_fs_reads_bytes_total | counter | cumulative bytes read by the container |
| container_fs_writes_bytes_total | counter | cumulative bytes written by the container |
| container_memory_max_usage_bytes | gauge | maximum memory usage of the container (bytes) |
| container_memory_usage_bytes | gauge | current memory usage of the container (bytes) |
| container_spec_memory_limit_bytes | gauge | memory limit of the container |
| machine_memory_bytes | gauge | total memory of the host |
| container_network_receive_bytes_total | counter | cumulative bytes received over the network |
| container_network_transmit_bytes_total | counter | cumulative bytes transmitted over the network |
Once cAdvisor samples are being collected, container CPU usage can be computed with:
sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)
Container memory usage (bytes):
container_memory_usage_bytes{image!=""}
Container network receive rate (bytes/s):
sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without (interface)
Container network transmit rate (bytes/s):
sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) without (interface)
Container filesystem read rate (bytes/s):
sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)
Container filesystem write rate (bytes/s):
sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)
Bytes received per container over the last minute, selected by name (name=~".+"):
sum(rate(container_network_receive_bytes_total{name=~".+"}[1m])) by (name)
Bytes transmitted per container over the last minute, selected by name:
sum(rate(container_network_transmit_bytes_total{name=~".+"}[1m])) by (name)
Cumulative system CPU time across all containers (1-minute window):
sum(rate(container_cpu_system_seconds_total[1m]))
System CPU time per container (1-minute window):
sum(irate(container_cpu_system_seconds_total{image!=""}[1m])) without (cpu)
CPU usage percentage per container:
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100
Total CPU usage percentage across all containers:
sum(sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100)
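Expressions like these can also be run programmatically. A minimal Python sketch (the helper names are my own, and the Prometheus address http://localhost:9090 is an assumption) that sends a query to Prometheus's instant-query HTTP API:

```python
import json
import urllib.parse
import urllib.request

# PromQL from the notes: per-container CPU usage, summed across CPU cores
CPU_QUERY = 'sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)'

def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for Prometheus's instant-query endpoint /api/v1/query."""
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": promql})

def run_query(base_url: str, promql: str) -> dict:
    """Execute the query and return the decoded JSON response."""
    with urllib.request.urlopen(instant_query_url(base_url, promql)) as resp:
        return json.load(resp)

# Example (requires a reachable Prometheus):
# result = run_query("http://localhost:9090", CPU_QUERY)
# for series in result["data"]["result"]:
#     print(series["metric"], series["value"])
```

The same helper works for any of the queries above; only the PromQL string changes.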
Deploying cAdvisor as a DaemonSet
[root@k8s-master1 0612]# cat case1-daemonset-deploy-cadvisor.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: cadvisor
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: cAdvisor
  template:
    metadata:
      labels:
        app: cAdvisor
    spec:
      tolerations:           # tolerate the master taint, ignoring NoSchedule
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      hostNetwork: true
      restartPolicy: Always  # restart policy
      containers:
      - name: cadvisor
        image: k8s-harbor.com/public/cadvisor:v0.39.3
        imagePullPolicy: IfNotPresent  # image pull policy
        ports:
        - containerPort: 8080
        volumeMounts:
        - name: root
          mountPath: /rootfs
        - name: run
          mountPath: /var/run
        - name: sys
          mountPath: /sys
        - name: docker
          mountPath: /var/lib/docker
      volumes:
      - name: root
        hostPath:
          path: /
      - name: run
        hostPath:
          path: /var/run
      - name: sys
        hostPath:
          path: /sys
      - name: docker
        hostPath:
          path: /var/lib/docker
Deploying node-exporter in K8s
[root@k8s-master1 0612]# cat case2-daemonset-deploy-node-exporter.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: monitoring
  labels:
    k8s-app: node-exporter
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
  template:
    metadata:
      labels:
        k8s-app: node-exporter
    spec:
      tolerations:
      - effect: NoSchedule
        key: node-role.kubernetes.io/master
      containers:
      - image: prom/node-exporter:v1.3.1
        imagePullPolicy: IfNotPresent
        name: prometheus-node-exporter
        ports:
        - containerPort: 9100
          hostPort: 9100
          protocol: TCP
          name: metrics
        volumeMounts:
        - mountPath: /host/proc
          name: proc
        - mountPath: /host/sys
          name: sys
        - mountPath: /host
          name: rootfs
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        - --path.rootfs=/host
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
      - name: rootfs
        hostPath:
          path: /
      hostNetwork: true
      hostPID: true
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: "true"
  labels:
    k8s-app: node-exporter
  name: node-exporter
  namespace: monitoring
spec:
  type: NodePort
  ports:
  - name: http
    port: 9100
    nodePort: 39100
    protocol: TCP
  selector:
    k8s-app: node-exporter
Service discovery in K8s
Deploying Prometheus via YAML
Create the Prometheus configuration
[root@k8s-master1 prometheus]# cat case3-1-prometheus-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: monitoring
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:   # dynamic service discovery
      - role: node             # discover nodes
      relabel_configs:         # relabeling rules
      - source_labels: [__address__]
        regex: '(.*):10250'       # the kubelet port on each node
        replacement: '${1}:9100'  # keep the address, replace the kubelet port with 9100 so node-exporter is scraped
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor-n66'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:8080'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role: node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name
    - job_name: 'kubernetes-pods'
      kubernetes_sd_configs:
      - role: pod
        namespaces:   # optional: restrict discovery to these namespaces; if omitted, pods in all namespaces are discovered
          names:
          - myserver
          - magedu
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_pod_name]
        action: replace
        target_label: kubernetes_pod_name
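The address rewrite in the 'kubernetes-service-endpoints' job above joins `__address__` and the prometheus.io/port annotation with ";" (Prometheus's default separator) and rewrites host:port. A small Python sketch of that single relabel step (the sample values are illustrative):

```python
import re

# The relabel rule from the config, applied to "<__address__>;<port annotation>"
ADDRESS_RE = re.compile(r"([^:]+)(?::\d+)?;(\d+)")

def relabel_address(address: str, port_annotation: str) -> str:
    """Mimic: source labels joined with ';', regex applied, replacement $1:$2."""
    joined = f"{address};{port_annotation}"
    m = ADDRESS_RE.fullmatch(joined)
    if not m:
        return address  # no match: the label is left untouched
    return f"{m.group(1)}:{m.group(2)}"

# A DNS endpoint on port 53 is rewritten to the metrics port from the annotation:
print(relabel_address("10.200.36.113:53", "9153"))  # -> 10.200.36.113:9153
```

This is why a Service only needs the prometheus.io/port annotation for its endpoints to be scraped on the metrics port rather than the service port.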
Preparation
[root@k8s-master1 prometheus]# mkdir /data/prometheus
[root@k8s-master1 prometheus]# chmod 777 /data/prometheus
[root@k8s-master1 prometheus]# kubectl create sa monitor -n monitoring   # create the service account
serviceaccount/monitor created
[root@k8s-master1 prometheus]# kubectl create clusterrolebinding monitor-clusterrolebinding -n monitoring --clusterrole=cluster-admin --serviceaccount=monitoring:monitor   # grant the account cluster-admin
clusterrolebinding.rbac.authorization.k8s.io/monitor-clusterrolebinding created
Deploy via YAML
[root@k8s-master1 prometheus]# cat case3-2-prometheus-deployment.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitoring
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: 192.168.226.144
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.36.1
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        - --web.enable-lifecycle
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data/prometheusdata
          type: Directory
[root@k8s-master1 prometheus]# cat case3-3-prometheus-svc.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus
  namespace: monitoring
  labels:
    app: prometheus
spec:
  type: NodePort
  ports:
  - port: 9090
    targetPort: 9090
    nodePort: 39090
    protocol: TCP
  selector:
    app: prometheus
    component: server
Prometheus's service discovery mechanisms
Prometheus pulls monitoring data by default: it periodically scrapes metrics from target hosts, and every scrape target must expose an HTTP endpoint through which Prometheus fetches the metric data. With this approach the set of scrape targets is fixed by the jobs configured in scrape_configs, so Prometheus cannot sense new services dynamically; whenever nodes or components are added, the Prometheus configuration has to be edited by hand and Prometheus restarted, which is inconvenient. Dynamic service discovery solves this: it automatically detects new endpoints in the cluster and adds them to the configuration. Through service discovery Prometheus obtains the list of targets to monitor and then polls those targets for metrics.
Prometheus supports several ways to obtain targets, both static configuration and dynamic service discovery. The commonly used mechanisms are:
kubernetes_sd_configs: service discovery based on the Kubernetes API; lets Prometheus dynamically discover monitored targets inside Kubernetes
static_configs: static targets specified in the Prometheus configuration file
dns_sd_configs: discover targets via DNS
consul_sd_configs: discover targets dynamically from services registered in Consul
file_sd_configs: discover targets from a specified file; the file is reloaded automatically, no restart required
Static discovery (static_configs): every new target instance requires a manual edit of the configuration file.
Consul discovery (consul_sd_configs): Prometheus watches Consul; whenever the services registered in Consul change, Prometheus automatically picks up all target resources registered there.
Kubernetes discovery (kubernetes_sd_configs): Prometheus talks to the Kubernetes API and dynamically discovers all monitorable target resources deployed in Kubernetes.
kubernetes_sd_configs
Prometheus's relabeling feature is very powerful: before a target instance is scraped, its metadata labels can be rewritten dynamically, and labels can be added or overridden.
After Prometheus discovers targets through the Kubernetes API, each discovered target instance carries some raw metadata labels. The default labels include:
__address__: the target's address in <host>:<port> form
__scheme__: the scheme of the target's address, HTTP or HTTPS
__metrics_path__: the path under which the target's metrics are scraped
Basics: why relabel
To make metrics easier to identify for later dashboarding and alerting, Prometheus supports rewriting labels on discovered targets at two stages:
relabel_configs: applied before scraping a target (for example to redefine labels such as the target IP or port); can add, modify or delete labels, and can also restrict scraping to specific targets or filter targets out.
metric_relabel_configs: applied after scraping, i.e. once the metric data has been fetched, for final relabeling and filtering.
- job_name: 'kubernetes-apiserver'   # job name
  kubernetes_sd_configs:             # discovery via kubernetes_sd_configs
  - role: endpoints                  # discover endpoints
  scheme: https                      # scheme used by this job
  tls_config:                        # TLS configuration
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt          # CA certificate path inside the container
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token   # token path inside the container
  relabel_configs:                   # relabeling rules
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]   # source labels the rule operates on
    action: keep                     # action defines the relabel behaviour; several actions are supported
    regex: default;kubernetes;https  # keep only the kubernetes service in the default namespace over https
Label fields
source_labels: the source labels, i.e. the label names before relabeling
target_label: the new label name produced by the action
regex: a literal value or regular expression matched against the source labels
replacement: the value assigned to target_label, with references to regex capture groups
Actions
replace: replace the label value; regex is matched against the source label values and replacement can reference the matched capture groups.
keep: only scrape instances that match regex; targets whose source_labels do not match regex are dropped, i.e. only matching instances are collected.
drop: instances that match regex are not scraped; targets whose source_labels match regex are dropped, i.e. only non-matching instances are collected.
hashmod: compute a hash of the source_labels and take it modulo a user-defined modulus, which can be used to shard or re-classify targets:
scrape_configs:
- job_name: ip_job
  relabel_configs:
  - source_labels: [__address__]
    modulus: 4
    target_label: __ip_hash
    action: hashmod
  - source_labels: [__ip_hash]
    regex: ^1$
    action: keep
labelmap: match all label names against regex and copy the values of matching labels into new labels named via replacement capture-group references (${1}, ${2}, ...)
labelkeep: match all label names against regex; labels that do not match are removed from the label set
labeldrop: match all label names against regex; labels that do match are removed from the label set
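The hashmod sharding above can be illustrated in Python. This is an approximation of the idea, not a byte-for-byte reimplementation of Prometheus's internal hash (Prometheus folds an MD5 sum of the joined source-label values into a 64-bit integer; the sharding principle is the same):

```python
import hashlib

def hashmod(label_values: list[str], modulus: int, separator: str = ";") -> int:
    """Hash the joined source-label values and take the modulus.
    Approximates Prometheus's hashmod action for illustration only."""
    joined = separator.join(label_values).encode()
    digest = hashlib.md5(joined).digest()
    # take 8 bytes of the MD5 digest as an integer, then take the modulus
    return int.from_bytes(digest[:8], "big") % modulus

# With modulus 4, each target lands deterministically in one of 4 shards;
# a subsequent 'keep' on regex ^1$ would make this server scrape only shard 1.
shard = hashmod(["192.168.226.144:9100"], 4)
print(shard)
```

Because the hash is deterministic, every Prometheus server running the same rule assigns each target to the same shard, which is how several servers can split a large target set without coordination.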
Supported discovery roles
node
service
pod
endpoints
ingress
endpointslice  # slices of an endpoints object
Discovering the apiserver
As the core component of Kubernetes, the apiserver is well worth monitoring. It can be monitored directly through the kubernetes Service:
- job_name: 'kubernetes-apiserver'
  kubernetes_sd_configs:
  - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
  - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
    action: keep
    regex: default;kubernetes;https   # namespace default, service name kubernetes, port name https; only targets whose labels match all three values are kept
The labels are matched as follows:
__meta_kubernetes_namespace=default, __meta_kubernetes_service_name=kubernetes, __meta_kubernetes_endpoint_port_name=https
which ultimately resolves to the apiserver address:
[root@k8s-master1 prometheus]# kubectl get ep
NAME ENDPOINTS AGE
kubernetes 192.168.226.144:6443 61d
apiserver metrics
The apiserver is the entry point of the K8s cluster; every request passes through it, so monitoring its metrics is a good way to judge cluster health.
apiserver_request_total
The PromQL statements below count apiserver requests per method over the last minute; apiserver_request_total breaks requests down per service:
Both irate and rate compute the per-second rate of change of a metric over a time window, but they calculate it differently: irate uses only the two most recent data points in the window, whereas rate uses all data points in the window, computes a set of per-interval rates, and averages them.
The official documentation therefore recommends irate for fast-moving counters and rate for slowly changing counters.
This also follows from the algorithms: for a fast-moving counter, rate's averaging easily flattens peaks, unless the time window is made small enough to weaken the effect.
rate(apiserver_request_total{code=~"^(?:2..)$"}[5m])
irate(apiserver_request_total{code=~"^(?:2..)$"}[5m])
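The rate/irate difference described above can be demonstrated numerically. A simplified Python sketch (it ignores Prometheus's boundary extrapolation and counter-reset handling), using made-up counter samples with a burst in the last interval:

```python
# Counter samples as (timestamp_seconds, value) pairs; the last interval has a burst.
samples = [(0, 0), (15, 30), (30, 60), (45, 600)]

def irate(points):
    """Per-second rate from the LAST TWO points only (like PromQL irate)."""
    (t1, v1), (t2, v2) = points[-2], points[-1]
    return (v2 - v1) / (t2 - t1)

def rate(points):
    """Simplified per-second rate over the WHOLE window (like PromQL rate,
    without extrapolation or counter-reset handling)."""
    (t1, v1), (t2, v2) = points[0], points[-1]
    return (v2 - v1) / (t2 - t1)

print(irate(samples))  # 36.0 -> the burst is fully visible
print(rate(samples))   # ~13.3 -> averaging over the window flattens the peak
```

The burst of 540 increments in the last 15 s shows up as 36/s under irate but is diluted to about 13.3/s under rate, which is exactly the peak-flattening effect the notes describe.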
About annotation_prometheus_io_scrape
With this discovery rule in K8s, a target must carry the annotation matched as annotation_prometheus_io_scrape=true; only targets where that annotation matches are kept, and only then are they scraped and relabeled, e.g. the annotation_prometheus_io_scheme label becomes http or https:
- job_name: 'kubernetes-service-endpoints'   # job name
  kubernetes_sd_configs:                     # discovery configuration
  - role: endpoints                          # discover endpoints
  relabel_configs:                           # relabeling rules
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
    action: keep
    regex: true                              # keep the target only if the annotation value is true
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
    action: replace
    target_label: __scheme__                 # rewrite into __scheme__
    regex: (https?)                          # match http or https
  - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__           # rewrite into __metrics_path__
    regex: (.+)                              # any non-empty path
  - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
    action: replace
    target_label: __address__                # rewrite into __address__
    regex: ([^:]+)(?::\d+)?;(\d+)            # match address and port
    replacement: $1:$2
  - action: labelmap
    regex: __meta_kubernetes_service_label_(.+)   # map the service labels onto new labels
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace       # expose the namespace as a label
  - source_labels: [__meta_kubernetes_service_name]
    action: replace
    target_label: kubernetes_name
kube-dns discovery
[root@k8s-master1 prometheus]# kubectl describe svc kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: addonmanager.kubernetes.io/mode=Reconcile
k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
Annotations: prometheus.io/port: 9153 # annotation used to discover the metrics port
 prometheus.io/scrape: true # allow scraping
Selector: k8s-app=kube-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.100.0.2
IPs: 10.100.0.2
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.200.36.113:53,10.200.36.122:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.200.36.113:53,10.200.36.122:53
Port: metrics 9153/TCP
TargetPort: 9153/TCP
Endpoints: 10.200.36.113:9153,10.200.36.122:9153
Session Affinity: None
Events: <none>
Node discovery and metrics
- job_name: 'kubernetes-node'
  kubernetes_sd_configs:
  - role: node
  relabel_configs:
  - source_labels: [__address__]
    regex: '(.*):10250'        # match port 10250 (the kubelet)
    replacement: '${1}:9100'   # replace it with port 9100
    target_label: __address__
    action: replace            # assign the result to target_label
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)   # map discovered node labels onto new labels, keeping their values
Common node metrics
node_cpu_*: CPU metrics
node_load1: load average (system load)
node_memory_*: memory metrics
node_network_*: network metrics
node_boot_time_seconds: system boot time
go_*: Go runtime metrics of node-exporter itself
process_*: internal process metrics of the node-exporter process
Pod discovery
Configuration
- job_name: 'kubernetes-pods'
  kubernetes_sd_configs:
  - role: pod
    namespaces:   # optional: restrict discovery to these namespaces; omit to discover pods in all namespaces
      names:
      - myserver
      - magedu
      - monitoring
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
- job_name: 'kubernetes-node-cadvisor'
  kubernetes_sd_configs:
  - role: node
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt          # default certificate path
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token   # default token path
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_node_label_(.+)
  - target_label: __address__
    replacement: kubernetes.default.svc:443   # replacement sets the value of target_label (__address__) to kubernetes.default.svc:443
  - source_labels: [__meta_kubernetes_node_name]
    regex: (.+)
    target_label: __metrics_path__            # rewrite the metrics path
    replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
Note: the certificate path in tls_config is the one every Pod uses to talk to the apiserver. Whether or not the certificate is ultimately needed, the kubelet injects the CA public key into every Pod at startup, so every Pod has a CA public key available for calls to the apiserver.
Pod metrics
sum(rate(container_cpu_usage_seconds_total{image!=""}[1m])) without (instance)
sum(rate(container_memory_usage_bytes{image!=""}[1m])) without (instance)
sum(rate(container_fs_io_current{image!=""}[1m])) without (device)
sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)
sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)
sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without (interface)
Discovering K8s pods from a Prometheus deployed on a VM (outside the cluster)
Obtain the token
[root@k8s-master1 prometheus]# kubectl get sa -n monitoring
NAME SECRETS AGE
default 1 45d
monitor 1 2d1h
[root@k8s-master1 prometheus]# kubectl get sa monitor -n monitoring -o yaml
apiVersion: v1
kind: ServiceAccount
metadata:
creationTimestamp: "2022-06-15T14:17:04Z"
name: monitor
namespace: monitoring
resourceVersion: "625848"
uid: 8e3b8beb-c6a8-4826-8d13-9a3df617ae2c
secrets:
- name: monitor-token-vnj95
[root@k8s-master1 prometheus]# kubectl describe secrets monitor-token-vnj95 -n monitoring
Name: monitor-token-vnj95
Namespace: monitoring
Labels: <none>
Annotations: kubernetes.io/service-account.name: monitor
kubernetes.io/service-account.uid: 8e3b8beb-c6a8-4826-8d13-9a3df617ae2c
Type: kubernetes.io/service-account-token
Data
====
ca.crt: 1302 bytes
namespace: 10 bytes
token: eyJhbGciOiJSUzI1NiIsImtpZCI6IlJjVXRjX0FEemIxb0Y0OFIyOU03OFEyNkxNbUJNOV9JS25JNmtXbFBKeWsifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJtb25pdG9yaW5nIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZWNyZXQubmFtZSI6Im1vbml0b3ItdG9rZW4tdm5qOTUiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoibW9uaXRvciIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VydmljZS1hY2NvdW50LnVpZCI6IjhlM2I4YmViLWM2YTgtNDgyNi04ZDEzLTlhM2RmNjE3YWUyYyIsInN1YiI6InN5c3RlbTpzZXJ2aWNlYWNjb3VudDptb25pdG9yaW5nOm1vbml0b3IifQ.Dn_-RFFDT-SBpM5txQFQZv0XoKZWBOHo--eJWhLsDGAeMPFtqQ-aOJP5tZVuGOv2TzC8UrNFq6BtQtt5RddkH4tK9O0DCHRx8JWnh8Lvn347z7nU179zge7hg3OuhmFKyLw6AsE0DhP_3picDgawUnSoXD1FqW9SEcyY75IiLf0MgxhkU4JjXLxwLRqWdHqk4QZSUthSm9Vfbgeq1BhhKYPYc_D579bWg6hisGp107oxFTj-Q13Rf9vpiBLMx4OJcJwcrZNW9SOTjeobFUOSLZjCqSfiO3avN4QVVIP9HiDOmZXNtuvbd1fnawSEILc07hRSts5VEaZA9eA_VLKf0g
Add a job to Prometheus
- job_name: 'kubernetes-pods-in-specified-namespaces'   # discover all pods in the listed namespaces
  kubernetes_sd_configs:
  - role: pod
    api_server: https://192.168.226.144:6443
    tls_config:
      insecure_skip_verify: true
    bearer_token_file: /apps/prometheus/k8s.token
    namespaces:
      names:
      - myserver
      - magedu
      - monitoring
      - kubernetes-dashboard
  relabel_configs:
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
Restart and verify
Prometheus static configuration (static_configs)
- job_name: "prometheus-node"
  static_configs:
  - targets: ["192.168.226.145:9100","192.168.226.152:9100","192.168.226.146:9100","192.168.226.144:9100"]
consul_sd_configs
Consul is a distributed key/value store cluster, commonly used for service registration and discovery. Pods register their information in Consul, and Prometheus periodically fetches it from there.
Deploy Consul and verify the cluster
wget https://releases.hashicorp.com/consul/1.12.2/consul_1.12.2_linux_amd64.zip
nohup consul agent -server -bootstrap -bind=192.168.226.152 -client=192.168.226.152 -data-dir=/data/consul -ui -node=192.168.226.152 &
nohup consul agent -bind=192.168.226.144 -client=192.168.226.144 -data-dir=/data/consul -node=192.168.226.144 -join=192.168.226.152 &
Register test services
curl -X PUT -d '{"id": "node-exporter144","name": "k8s-node-exporter144","address": "192.168.226.144","port":9100,"tags": ["node-exporter"],"checks": [{"http": "http://192.168.226.144:9100/","interval": "5s"}]}' http://192.168.226.144:8500/v1/agent/service/register
curl -X PUT -d '{"id": "node-exporter152","name": "k8s-node-exporter152","address": "192.168.226.152","port":9100,"tags": ["node-exporter"],"checks": [{"http": "http://192.168.226.152:9100/","interval": "5s"}]}' http://192.168.226.152:8500/v1/agent/service/register
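The registration calls above can also be scripted. A hedged Python sketch (the helper name is my own) that builds the same payload as the curl commands and prepares a PUT against Consul's agent service-register endpoint:

```python
import json
import urllib.request

def register_node_exporter(consul_host: str, host: str, port: int = 9100) -> urllib.request.Request:
    """Build the PUT request for Consul's /v1/agent/service/register endpoint,
    mirroring the curl commands used above."""
    suffix = host.rsplit(".", 1)[-1]   # e.g. "144" from 192.168.226.144
    payload = {
        "id": f"node-exporter{suffix}",
        "name": f"k8s-node-exporter{suffix}",
        "address": host,
        "port": port,
        "tags": ["node-exporter"],
        "checks": [{"http": f"http://{host}:{port}/", "interval": "5s"}],
    }
    return urllib.request.Request(
        f"http://{consul_host}:8500/v1/agent/service/register",
        data=json.dumps(payload).encode(),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )

req = register_node_exporter("192.168.226.144", "192.168.226.144")
# urllib.request.urlopen(req)  # uncomment to send against a live Consul agent
```

Once a service is registered this way, the consul_sd_configs job below picks it up automatically on the next discovery refresh.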
Configure Consul-based discovery in Prometheus
- job_name: consul
  honor_labels: true
  metrics_path: /metrics
  scheme: http
  consul_sd_configs:
  - server: 192.168.226.152:8500
    services: []   # service names to discover; empty means all services, or list e.g. servicea,serviceb,servicec
  - server: 192.168.226.144:8500
    services: []
  relabel_configs:
  - source_labels: ['__meta_consul_tags']
    target_label: 'product'
  - source_labels: ['__meta_consul_dc']
    target_label: 'idc'
  - source_labels: ['__meta_consul_service']
    regex: "consul"
    action: drop
Deregister a service
curl --request PUT http://192.168.226.152:8500/v1/agent/service/deregister/node-exporter152
file_sd_configs: file-based service discovery
Create the JSON file
[root@lvs-backup prometheus]# vim file_sd/sd_myserver.json
[
  {
    "targets": ["192.168.226.144:9100","192.168.226.145:9100","192.168.226.146:9100"]
  }
]
Point Prometheus at the JSON file
- job_name: 'file_sd_my_server'
  file_sd_configs:
  - files:
    - /apps/prometheus/file_sd/sd_myserver.json
    refresh_interval: 10s   # re-read interval
Verify
DNS service discovery
DNS-based discovery takes a configured set of DNS names that are queried periodically to discover the target list; the names must resolve to IPs on the configured DNS server.
This discovery method supports only basic DNS A, AAAA and SRV record queries. An A record resolves a name to an IP.
An SRV record states which host provides which service, in the form _service._proto.domain (e.g. _example-server._tcp.www.mydns.com).
Configuration
- job_name: 'dns-server-name-monitor'
  metrics_path: "/metrics"
  dns_sd_configs:
  - names: ["www.baidutest.com", "www.huaweitest.com"]
    type: A
    port: 6010
The kube-state-metrics component
kube-state-metrics listens to the API server and generates state metrics about resource objects such as Deployments, Nodes and Pods. Note that it only exposes metrics; it does not store them, so Prometheus is used to scrape and store the data. Its focus is business-level metadata about Deployments, Pods, replica state and so on: how many replicas were scheduled, how many are currently available, how many Pods are running/stopped/terminated, how many times a Pod has restarted, and how many jobs are currently running.
Deploy kube-state-metrics
[root@k8s-master1 prometheus]# cat case5-kube-state-metrics-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: kube-state-metrics
  namespace: kube-system
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-state-metrics
  template:
    metadata:
      labels:
        app: kube-state-metrics
    spec:
      serviceAccountName: kube-state-metrics
      containers:
      - name: kube-state-metrics
        image: bitnami/kube-state-metrics:2.5.0
        ports:
        - containerPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: kube-state-metrics
rules:
- apiGroups: [""]
  resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
  verbs: ["list", "watch"]
- apiGroups: ["extensions"]
  resources: ["daemonsets", "deployments", "replicasets"]
  verbs: ["list", "watch"]
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["list", "watch"]
- apiGroups: ["batch"]
  resources: ["cronjobs", "jobs"]
  verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
  resources: ["horizontalpodautoscalers"]
  verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: kube-state-metrics
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: kube-state-metrics
subjects:
- kind: ServiceAccount
  name: kube-state-metrics
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: kube-state-metrics
  namespace: kube-system
  labels:
    app: kube-state-metrics
spec:
  type: NodePort
  ports:
  - name: kube-state-metrics
    port: 8080
    targetPort: 8080
    nodePort: 31666
    protocol: TCP
  selector:
    app: kube-state-metrics
Verify
Add to Prometheus
- job_name: "prometheus-kube-state-metrics"
  static_configs:
  - targets: ["192.168.226.146:31666"]
Import a Grafana dashboard
Monitoring Tomcat
Here a third-party exporter monitors the target service, and Prometheus reads the data through the Service. Note that with multiple Pods behind one Service this can miss data; per-pod discovery through the API gives accurate statistics.
Build the image
[root@k8s-master1 tomcat-image]# cat Dockerfile
#FROM tomcat:8.5.73-jdk11-corretto
FROM tomcat:8.5.73
LABEL maintainer="jack 2973707860@qq.com"
ADD server.xml /usr/local/tomcat/conf/server.xml
RUN mkdir /data/tomcat/webapps -p
ADD myapp /data/tomcat/webapps/myapp
ADD metrics.war /data/tomcat/webapps
ADD simpleclient-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_common-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_hotspot-0.8.0.jar /usr/local/tomcat/lib/
ADD simpleclient_servlet-0.8.0.jar /usr/local/tomcat/lib/
ADD tomcat_exporter_client-0.0.12.jar /usr/local/tomcat/lib/
#ADD run_tomcat.sh /apps/tomcat/bin/
EXPOSE 8080 8443 8009
CMD ["/usr/local/tomcat/bin/catalina.sh","run"]
#CMD ["/apps/tomcat/bin/run_tomcat.sh"]
Create the Pod and verify
[root@k8s-master1 yaml]# cat tomcat-deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tomcat-deployment
  namespace: default
spec:
  selector:
    matchLabels:
      app: tomcat
  replicas: 1    # number of pods matching the template
  template:      # create pods using the pod definition in this template
    metadata:
      labels:
        app: tomcat
      annotations:
        prometheus.io/scrape: 'true'
    spec:
      containers:
      - name: tomcat
        image: k8s-harbor.com/public/tomcat-app1:v0618
        ports:
        - containerPort: 8080
        securityContext:
          privileged: true
[root@k8s-master1 yaml]# cat tomcat-svc.yaml
kind: Service    # Service object
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: tomcat-service
spec:
  selector:
    app: tomcat
  ports:
  - nodePort: 31080
    port: 80
    protocol: TCP
    targetPort: 8080
  type: NodePort
Configure Prometheus scraping
- job_name: "prometheus-tomcat-metrics"
  static_configs:
  - targets: ["192.168.226.146:31080"]
Configure the Grafana dashboard
wget https://github.com/nlighten/tomcat_exporter/blob/master/dashboard/example.json
Monitoring Redis
https://github.com/oliver006/redis_exporter
Monitor the Redis service state through redis_exporter.
Deploy Redis
[root@k8s-master1 yaml]# cat redis-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: studylinux-net
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:4.0.14
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 6379
      - name: redis-exporter
        image: oliver006/redis_exporter:latest
        resources:
          requests:
            cpu: 100m
            memory: 100Mi
        ports:
        - containerPort: 9121
[root@k8s-master1 yaml]# cat redis-exporter-svc.yaml
kind: Service    # Service for the exporter
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: "9121"
  name: redis-exporter-service
  namespace: studylinux-net
spec:
  selector:
    app: redis
  ports:
  - nodePort: 31082
    name: prom
    port: 9121
    protocol: TCP
    targetPort: 9121
  type: NodePort
[root@k8s-master1 yaml]# cat redis-redis-svc.yaml
kind: Service    # Service for redis itself
apiVersion: v1
metadata:
  annotations:
    prometheus.io/scrape: 'true'
  name: redis-redis-service
  namespace: studylinux-net
spec:
  selector:
    app: redis
  ports:
  - nodePort: 31081
    name: redis
    port: 6379
    protocol: TCP
    targetPort: 6379
  type: NodePort
Verify
Configure Prometheus scraping
- job_name: "prometheus-redis-metrics"
  static_configs:
  - targets: ["192.168.226.146:31082"]
Import a Grafana dashboard
Monitoring MySQL
Create the user and verify
CREATE USER 'mysql_exporter'@'localhost' IDENTIFIED BY 'imnot007*';
GRANT PROCESS, REPLICATION CLIENT,SELECT ON *.* TO 'mysql_exporter'@'localhost';
[root@lvs-backup mysql-5.6.43-onekey-install]# mysql -umysql_exporter -pimnot007* -hlocalhost
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 6
Server version: 5.6.43 MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]> Bye
Configure password-less login
[root@lvs-backup prometheus]# cat /root/.my.cnf
[client]
user=mysql_exporter
password=imnot007*
Download mysqld_exporter and verify
https://github.com/prometheus/mysqld_exporter/releases/download/v0.14.0/mysqld_exporter-0.14.0.linux-amd64.tar.gz
/usr/local/bin/mysqld_exporter-0.14.0.linux-amd64/mysqld_exporter --config.my-cnf=/root/.my.cnf
Configure Prometheus scraping
- job_name: "prometheus-mysql-metrics"
  static_configs:
  - targets: ["192.168.226.152:9104"]
Import a Grafana dashboard
Monitoring HAProxy
Deploy HAProxy
yum install haproxy
vim /etc/haproxy/haproxy.cfg
stats socket /run/haproxy/admin.sock mode 660 level admin
listen k8s_service_ng_6666
  bind 192.168.226.152:80
  mode tcp
  server node1 192.168.226.152:9090 check inter 2000 fall 3 rise 5
Deploy and start haproxy_exporter
https://github.com/prometheus/haproxy_exporter/releases
./haproxy_exporter --haproxy.scrape-uri=unix:/var/lib/haproxy/admin.sock
Configure Prometheus scraping
- job_name: "prometheus-haproxy-metrics"
  static_configs:
  - targets: ["192.168.226.152:9101"]
Import a Grafana dashboard
Monitoring Nginx
nginx-module-vts parses nginx status data into a format Prometheus can read.
Get nginx-module-vts
git clone https://github.com/vozlt/nginx-module-vts.git
Compile nginx and start it
./configure --prefix=/apps/nginx \
--with-pcre \
--with-http_ssl_module \
--with-http_v2_module \
--with-http_realip_module \
--with-http_gzip_static_module \
--with-http_stub_status_module \
--with-threads \
--with-file-aio \
--with-stream \
--with-stream_ssl_module \
--with-stream_realip_module \
--add-module=/usr/local/bin/nginx-module-vts/
make
make install
Edit the nginx configuration to enable the status page
vim /apps/nginx/conf/nginx.conf
    #gzip  on;
    vhost_traffic_status_zone;
    location /status {
        vhost_traffic_status_display;
        vhost_traffic_status_display_format html;
    }
Verify
The data is now in a Prometheus-readable format; an exporter is still needed to extract it for Prometheus.
Install the nginx exporter
https://github.com/hnlq715/nginx-vts-exporter/releases/download/v0.10.3/nginx-vts-exporter-0.10.3.linux-amd64.tar.gz
./nginx-vts-exporter -nginx.scrape_uri http://192.168.226.152/status/format/json
Configure Prometheus scraping
- job_name: "prometheus-nginx-metrics"
  static_configs:
  - targets: ["192.168.226.152:9913"]
Import a Grafana dashboard
Monitoring with blackbox_exporter
blackbox_exporter is the official Prometheus exporter for probing monitored endpoints over http (availability checks), https (availability checks), dns (name resolution), tcp (port checks) and icmp (host liveness).
Deploy and start blackbox_exporter
./blackbox_exporter --config.file=/usr/local/bin/blackbox_exporter-0.21.0.linux-amd64/blackbox.yml --web.listen-address=:9115
URL monitoring with blackbox_exporter
Add a job in Prometheus
Prometheus hands the probe target to blackbox, and blackbox probes that target to collect the data.
- job_name: 'http_status'
  metrics_path: /probe
  params:
    module: [http_2xx]
  static_configs:
  - targets: ['http://www.xiaomi.com', 'https://consumer.huawei.com']
    labels:
      instance: http_status
      group: web
  relabel_configs:
  - source_labels: [__address__]      # copy __address__ (the probe target) into the __param_target label
    target_label: __param_target      # the probe target, e.g. www.xiaomi.com, becomes the ?target= parameter
  - source_labels: [__param_target]   # the probe target
    target_label: url                 # expose the probe target as a url label
  - target_label: __address__
    replacement: 192.168.226.152:9115 # scrape the blackbox exporter itself
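The three relabel steps above can be traced with a minimal Python simulation (label handling is simplified; the blackbox address 192.168.226.152:9115 is the one used in this config):

```python
# Simulate the three relabel steps of the 'http_status' job on one target.
def blackbox_relabel(labels: dict) -> dict:
    labels = dict(labels)
    # 1) __address__ (the probe target) -> __param_target (the ?target= URL parameter)
    labels["__param_target"] = labels["__address__"]
    # 2) __param_target -> url (a human-readable label on the resulting series)
    labels["url"] = labels["__param_target"]
    # 3) __address__ is replaced so Prometheus actually scrapes the blackbox exporter
    labels["__address__"] = "192.168.226.152:9115"
    return labels

out = blackbox_relabel({"__address__": "http://www.xiaomi.com"})
print(out)
```

The net effect: Prometheus scrapes http://192.168.226.152:9115/probe?target=http://www.xiaomi.com&module=http_2xx, while the resulting series still carries the original target in its url label.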
Verify
ICMP monitoring with blackbox_exporter
- job_name: 'ping_status'
  metrics_path: /probe
  params:
    module: [icmp]
  static_configs:
  - targets: ['172.31.0.2', '223.6.6.6']
    labels:
      instance: 'ping_status'
      group: 'icmp'
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: ip   # expose the probed IP as an ip label
  - target_label: __address__
    replacement: 192.168.226.152:9115
Port monitoring with blackbox_exporter
- job_name: 'port_status'
  metrics_path: /probe
  params:
    module: [tcp_connect]
  static_configs:
  - targets: ['192.168.226.152:9090', 'http://www.xiaomi.com:80']
    labels:
      instance: 'port_status'
      group: 'port'
  relabel_configs:
  - source_labels: [__address__]
    target_label: __param_target
  - source_labels: [__param_target]
    target_label: ip
  - target_label: __address__
    replacement: 192.168.226.152:9115
Import a Grafana dashboard