Deploying Prometheus, node_exporter, cAdvisor, kube-state-metrics, and Grafana in Kubernetes - Day 02
This article covers how to deploy Prometheus and its related components in a Kubernetes environment.
1. Introduction to cAdvisor
Project page: https://github.com/google/cadvisor
cAdvisor is used to collect container metrics. It was open-sourced by Google; in Kubernetes v1.11 and earlier it was built into the kubelet and listened on port 4194 (https://github.com/kubernetes/kubernetes/pull/65707). Starting with v1.12 the standalone cAdvisor port was removed from the kubelet, so cAdvisor has to be deployed separately, for example as a DaemonSet.
cAdvisor (Container Advisor) not only collects information about every container running on a machine, it also provides a basic query UI and an HTTP interface, which makes it easy for other components such as Prometheus to scrape the data. cAdvisor monitors the containers on a node in real time and collects performance data, including CPU usage, memory usage, network throughput, and filesystem usage.
2. Common cAdvisor Metrics
Metric | Type | Meaning |
---|---|---|
container_cpu_load_average_10s | gauge | CPU load average of the container over the last 10 seconds |
container_cpu_usage_seconds_total | counter | Cumulative CPU time consumed by the container on each CPU core (seconds) |
container_cpu_system_seconds_total | counter | Cumulative system CPU time (seconds) |
container_cpu_user_seconds_total | counter | Cumulative user CPU time (seconds) |
container_fs_usage_bytes | gauge | Filesystem usage inside the container (bytes) |
container_fs_limit_bytes | gauge | Total filesystem capacity available to the container (bytes) |
container_fs_reads_bytes_total | counter | Cumulative bytes read by the container (bytes) |
container_fs_writes_bytes_total | counter | Cumulative bytes written by the container (bytes) |
container_memory_max_usage_bytes | gauge | Maximum memory usage of the container (bytes) |
container_memory_usage_bytes | gauge | Current memory usage of the container (bytes) |
container_spec_memory_limit_bytes | gauge | Memory limit of the container (bytes) |
machine_memory_bytes | gauge | Total memory of the host (bytes) |
container_network_receive_bytes_total | counter | Cumulative bytes received over the container's network (bytes) |
container_network_transmit_bytes_total | counter | Cumulative bytes transmitted over the container's network (bytes) |
2.1 Example Queries
(1) Container CPU usage
sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)
(2) Container memory usage (bytes)
container_memory_usage_bytes{image!=""}
(3) Container network receive rate (bytes/second)
sum(rate(container_network_receive_bytes_total{image!=""}[1m])) without(interface)
(4) Container network transmit rate (bytes/second)
sum(rate(container_network_transmit_bytes_total{image!=""}[1m])) without(interface)
(5) Container filesystem read rate (bytes/second)
sum(rate(container_fs_reads_bytes_total{image!=""}[1m])) without (device)
(6) Container filesystem write rate (bytes/second)
sum(rate(container_fs_writes_bytes_total{image!=""}[1m])) without (device)
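These expressions can also be run against the Prometheus HTTP API from the command line once Prometheus is up (a minimal sketch, assuming a server reachable on localhost:9090; adjust the address for your deployment):
curl -sG 'http://localhost:9090/api/v1/query' \
  --data-urlencode 'query=sum(irate(container_cpu_usage_seconds_total{image!=""}[1m])) without (cpu)'
The response is JSON with one sample per container; the same expressions work unchanged in the Prometheus web UI and in Grafana panels.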
2.2 Commonly Used cAdvisor Container Monitoring Queries
2.2.1 Network traffic
(1) Bytes received per second by each container (1-minute rate), selected by name with name=~".+"
sum(rate(container_network_receive_bytes_total{name=~".+"}[1m])) by (name)
(2) Bytes transmitted per second by each container (1-minute rate), selected by name with name=~".+"
sum(rate(container_network_transmit_bytes_total{name=~".+"}[1m])) by (name)
2.2.2 Container CPU
(1) System CPU time consumed per second by all containers (1-minute rate)
sum(rate(container_cpu_system_seconds_total[1m]))
(2) System CPU time consumed per second by each container (1-minute rate)
sum(irate(container_cpu_system_seconds_total{image!=""}[1m])) without (cpu)
(3) CPU usage percentage of each container
sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100
(4) Total CPU usage percentage of all containers combined
sum(sum(rate(container_cpu_usage_seconds_total{name=~".+"}[1m])) by (name) * 100)
3. Deploying cAdvisor with Docker
3.1 Running cAdvisor with docker
When deploying with Docker, check the notes on the project page first: installation differs somewhat between operating systems. The host here runs CentOS.
[root@k8s-node1 ~]# docker run -d --volume=/:/rootfs:ro --volume=/var/run:/var/run:ro --volume=/sys:/sys:ro --volume=/var/lib/docker/:/var/lib/docker:ro --volume=/dev/disk/:/dev/disk:ro --volume=$GOPATH/src/github.com/google/cadvisor/perf/testing:/etc/configs/perf --volume=/cgroup:/cgroup:ro --publish=8080:8080 --device=/dev/kmsg --name=cadvisor --privileged=true google/cadvisor:v0.33.0
[root@k8s-node1 ~]# docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7bd371ee79ca google/cadvisor:v0.33.0 "/usr/bin/cadvisor -…" 3 seconds ago Up 2 seconds (health: starting) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp cadvisor
3.2 Accessing the cAdvisor web UI
The web UI shows the metrics of every container running on the host.
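Besides the web UI, the same data is exposed in Prometheus text format on the /metrics endpoint, which is what Prometheus will scrape later. A quick check from the host (a sketch; replace 127.0.0.1 with the host's IP when checking remotely):
curl -s http://127.0.0.1:8080/metrics | grep '^container_cpu_usage_seconds_total' | head -n 3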
4. Deploying cAdvisor in Kubernetes
4.1 Removing the Docker-deployed cAdvisor
[root@k8s-node1 ~]# docker ps -l
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
7bd371ee79ca google/cadvisor:v0.33.0 "/usr/bin/cadvisor -…" 2 hours ago Up 2 hours (healthy) 0.0.0.0:8080->8080/tcp, :::8080->8080/tcp cadvisor
[root@k8s-node1 ~]# docker rm -f cadvisor
cadvisor
4.2 Deploying cAdvisor as a DaemonSet
Official manifests: https://github.com/google/cadvisor/tree/master/deploy/kubernetes/base
4.2.1 Writing the YAML files
[root@k8s-master1 ~]# mkdir yaml/monitor
[root@k8s-master1 ~]# cd yaml/monitor
[root@k8s-master1 monitor]# cat serviceaccount.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
name: cadvisor
namespace: monitoring
[root@k8s-master1 monitor]# cat daemonset-cadvisor.yaml
apiVersion: apps/v1 # for Kubernetes versions before 1.9.0 use apps/v1beta2
kind: DaemonSet
metadata:
name: cadvisor
namespace: monitoring
annotations:
seccomp.security.alpha.kubernetes.io/pod: 'docker/default'
spec:
selector:
matchLabels:
app: cadvisor
template:
metadata:
labels:
app: cadvisor
spec:
tolerations: # tolerate the control-plane taints so cadvisor also runs on the master nodes
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
hostNetwork: true # use the host network, so no Service is needed
serviceAccountName: cadvisor
containers:
- name: cadvisor
#image: gcr.io/cadvisor/cadvisor:v0.45.0 # image from the official manifests; hard to pull from inside China
image: google/cadvisor:v0.33.0
resources:
requests:
memory: 400Mi
cpu: 400m
limits:
memory: 2000Mi
cpu: 800m
volumeMounts:
- name: rootfs
mountPath: /rootfs
readOnly: true
- name: var-run
mountPath: /var/run
readOnly: true
- name: sys
mountPath: /sys
readOnly: true
- name: docker
mountPath: /var/lib/docker # change this path if you use a container runtime other than Docker
readOnly: true
- name: disk
mountPath: /dev/disk
readOnly: true
ports:
- name: http
containerPort: 8080
protocol: TCP
automountServiceAccountToken: false
terminationGracePeriodSeconds: 30
volumes:
- name: rootfs
hostPath:
path: /
- name: var-run
hostPath:
path: /var/run
- name: sys
hostPath:
path: /sys
- name: docker
hostPath:
path: /var/lib/docker # change this path if you use a container runtime other than Docker
- name: disk
hostPath:
path: /dev/disk
4.2.2 Deploying cAdvisor
[root@k8s-master1 monitor]# kubectl create ns monitoring
namespace/monitoring created
[root@k8s-master1 monitor]# kubectl apply -f .
daemonset.apps/cadvisor created
serviceaccount/cadvisor created
[root@k8s-master1 monitor]# kubectl get sa -n monitoring
NAME SECRETS AGE
cadvisor 0 98s
default 0 97m
[root@k8s-master1 monitor]# kubectl get po -n monitoring
NAME READY STATUS RESTARTS AGE
cadvisor-fkt9g 1/1 Running 0 75s
cadvisor-ft2zz 1/1 Running 0 75s
cadvisor-pbv2l 1/1 Running 0 75s
4.2.3 Access test
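Because the DaemonSet runs with hostNetwork, cAdvisor listens on port 8080 of every node, so it can be checked against any node's IP (a sketch, using the master node's IP from this environment as an example):
curl -s http://10.31.200.100:8080/metrics | head
kubectl get po -n monitoring -o wide    # shows which node each cadvisor pod landed on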
5. Deploying node_exporter in Kubernetes
5.1 Writing the YAML
[root@k8s-master1 monitor]# cat node_export.yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
name: node-exporter
namespace: monitoring
labels:
k8s-app: node-exporter
spec:
selector:
matchLabels:
k8s-app: node-exporter
template:
metadata:
labels:
k8s-app: node-exporter
spec:
tolerations:
- key: node-role.kubernetes.io/control-plane
operator: Exists
effect: NoSchedule
- key: node-role.kubernetes.io/master
operator: Exists
effect: NoSchedule
containers:
- image: bitnami/node-exporter:1.7.0
imagePullPolicy: IfNotPresent
name: prometheus-node-exporter
ports:
- containerPort: 9100
hostPort: 9100
protocol: TCP
name: metrics
volumeMounts:
- mountPath: /host/proc
name: proc
- mountPath: /host/sys
name: sys
- mountPath: /host
name: rootfs
args:
- --path.procfs=/host/proc
- --path.sysfs=/host/sys
- --path.rootfs=/host
volumes:
- name: proc
hostPath:
path: /proc
- name: sys
hostPath:
path: /sys
- name: rootfs
hostPath:
path: /
hostNetwork: true # use the host network and PID namespace
hostPID: true
5.2 Deploying and accessing
[root@k8s-master1 monitor]# kubectl get po -n monitoring|grep node
node-exporter-sdgbp 1/1 Running 0 99s
node-exporter-t28mj 1/1 Running 0 99s
node-exporter-wgr2c 1/1 Running 0 99s
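node_exporter also uses hostNetwork, so every node now serves metrics on port 9100. A quick check (a sketch; any node IP works):
curl -s http://10.31.200.100:9100/metrics | grep '^node_load1'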
6. Deploying Prometheus in Kubernetes
6.1 Preparing NFS storage for Prometheus data
6.1.1 Installing NFS
[root@k8s-master1 ~]# yum -y install nfs-utils rpcbind
[root@k8s-master1 ~]# systemctl start rpcbind.service
[root@k8s-master1 ~]# systemctl enable rpcbind.service
[root@k8s-master1 ~]# rpcinfo -p 127.0.0.1
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
[root@k8s-master1 ~]# systemctl start nfs
[root@k8s-master1 ~]# systemctl enable nfs
Created symlink from /etc/systemd/system/multi-user.target.wants/nfs-server.service to /usr/lib/systemd/system/nfs-server.service.
[root@k8s-master1 ~]# rpcinfo -p 127.0.0.1
program vers proto port service
100000 4 tcp 111 portmapper
100000 3 tcp 111 portmapper
100000 2 tcp 111 portmapper
100000 4 udp 111 portmapper
100000 3 udp 111 portmapper
100000 2 udp 111 portmapper
100024 1 udp 35042 status
100024 1 tcp 53465 status
100005 1 udp 20048 mountd
100005 1 tcp 20048 mountd
100005 2 udp 20048 mountd
100005 2 tcp 20048 mountd
100005 3 udp 20048 mountd
100005 3 tcp 20048 mountd
100003 3 tcp 2049 nfs
100003 4 tcp 2049 nfs
100227 3 tcp 2049 nfs_acl
100003 3 udp 2049 nfs
100003 4 udp 2049 nfs
100227 3 udp 2049 nfs_acl
100021 1 udp 46538 nlockmgr
100021 3 udp 46538 nlockmgr
100021 4 udp 46538 nlockmgr
100021 1 tcp 41226 nlockmgr
100021 3 tcp 41226 nlockmgr
100021 4 tcp 41226 nlockmgr
6.1.2 Editing the exports file and creating the data directory
[root@k8s-master1 ~]# cat /etc/exports
/data 10.31.200.0/24(rw,sync)
[root@k8s-master1 ~]# mkdir /data
[root@k8s-master1 ~]# chown -R nfsnobody. /data
[root@k8s-master1 ~]# systemctl reload nfs
6.1.3 Testing the mount
[root@k8s-master1 ~]# showmount -e 10.31.200.100
Export list for 10.31.200.100:
/data 10.31.200.0/24
[root@k8s-master1 data]# mount -t nfs 10.31.200.100:/data /mnt
[root@k8s-master1 data]# df -h|grep mnt
10.31.200.100:/data 50G 5.3G 45G 11% /mnt
[root@k8s-master1 data]# umount /mnt
6.1.4 Creating the Prometheus data directory and setting permissions
[root@k8s-master1 data]# mkdir -p /data/k8s_data/prometheus
# After Prometheus starts it runs as UID/GID 65534, so the corresponding NFS directory must also be granted access, otherwise Prometheus has no write permission
# e.g. chown -R 65534.65534 /data/k8s_data/prometheus
# or chmod 777 /data/k8s_data/prometheus also works
# The image used here is bitnami/prometheus, which runs as UID 1001
# Create a dedicated local user for Prometheus with the same UID as in the pod, and add it to the group that owns the NFS directory
[root@k8s-master1 prometheus]# useradd prometheus -u 1001 -g nfsnobody
# Change the directory permissions so that group members can write
[root@k8s-master1 data]# cd k8s_data/
[root@k8s-master1 k8s_data]# chmod 775 prometheus
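To confirm that UID 1001 really can write to the export before starting Prometheus, a quick test with the local user created above (a sketch):
sudo -u prometheus touch /data/k8s_data/prometheus/.write_test && echo write ok
sudo -u prometheus rm -f /data/k8s_data/prometheus/.write_test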
6.1.5 Installing the NFS client on all Kubernetes nodes
yum -y install nfs-utils
systemctl start nfs-utils
systemctl enable nfs-utils
6.2 Creating a dedicated ServiceAccount for Prometheus
Prometheus calls the Kubernetes API, so it needs a ServiceAccount with the corresponding permissions.
[root@k8s-master1 data]# kubectl create sa prometheus -n monitoring
serviceaccount/prometheus created
[root@k8s-master1 data]# kubectl get sa -n monitoring
NAME SECRETS AGE
cadvisor 0 81m
default 0 177m
prometheus 0 4s
[root@k8s-master1 data]# kubectl create clusterrolebinding prometheus-clusterrolebinding -n monitoring --clusterrole=cluster-admin --serviceaccount=monitoring:prometheus
clusterrolebinding.rbac.authorization.k8s.io/prometheus-clusterrolebinding created
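cluster-admin works but grants far more than Prometheus needs. A narrower ClusterRole that still covers the scrape jobs configured below would look roughly like this (a sketch; the role name prometheus is illustrative, and it would be bound to the monitoring:prometheus ServiceAccount with a ClusterRoleBinding instead of cluster-admin):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "nodes/metrics", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]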
6.3 Writing the YAML
6.3.1 Prometheus ConfigMap
[root@k8s-master1 data]# cd /root/yaml/monitor/
[root@k8s-master1 monitor]# mkdir prometheus
[root@k8s-master1 monitor]# cd prometheus
[root@k8s-master1 prometheus]#
[root@k8s-master1 prometheus]# cat cm.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
labels:
app: prometheus
name: prometheus-config
namespace: monitoring
data:
prometheus.yml: |
global:
scrape_interval: 15s
scrape_timeout: 10s
evaluation_interval: 1m
scrape_configs:
- job_name: 'kubernetes-node'
kubernetes_sd_configs:
- role: node
relabel_configs:
- source_labels: [__address__]
regex: '(.*):10250'
replacement: '${1}:9100'
target_label: __address__
action: replace
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- job_name: 'kubernetes-node-cadvisor'
kubernetes_sd_configs:
- role: node
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- action: labelmap
regex: __meta_kubernetes_node_label_(.+)
- target_label: __address__
replacement: kubernetes.default.svc:443
- source_labels: [__meta_kubernetes_node_name]
regex: (.+)
target_label: __metrics_path__
replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
- job_name: 'kubernetes-apiserver'
kubernetes_sd_configs:
- role: endpoints
scheme: https
tls_config:
ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
relabel_configs:
- source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
action: keep
regex: default;kubernetes;https
- job_name: 'kubernetes-service-endpoints'
kubernetes_sd_configs:
- role: endpoints
relabel_configs:
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
action: keep
regex: true
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
action: replace
target_label: __scheme__
regex: (https?)
- source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
action: replace
target_label: __metrics_path__
regex: (.+)
- source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
action: replace
target_label: __address__
regex: ([^:]+)(?::\d+)?;(\d+)
replacement: $1:$2
- action: labelmap
regex: __meta_kubernetes_service_label_(.+)
- source_labels: [__meta_kubernetes_namespace]
action: replace
target_label: kubernetes_namespace
- source_labels: [__meta_kubernetes_service_name]
action: replace
target_label: kubernetes_service_name
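The kubernetes-service-endpoints job only keeps targets whose Service carries the prometheus.io/scrape annotation, and the port, path, and scheme annotations are rewritten into the target by the relabel rules above. Any Service can therefore opt in to scraping like this (a sketch; my-app is a hypothetical application):
apiVersion: v1
kind: Service
metadata:
  name: my-app                     # hypothetical application Service
  namespace: default
  annotations:
    prometheus.io/scrape: "true"   # matched by the keep rule
    prometheus.io/port: "8080"     # rewritten into __address__
    prometheus.io/path: "/metrics" # rewritten into __metrics_path__
spec:
  selector:
    app: my-app
  ports:
  - port: 8080
    targetPort: 8080
The kube-state-metrics Service deployed in section 7 uses exactly this mechanism.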
6.3.2 Prometheus Deployment YAML
[root@k8s-master1 prometheus]# cat deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
name: prometheus-server
namespace: monitoring
labels:
app: prometheus
spec:
replicas: 1
selector:
matchLabels:
app: prometheus
component: server
#matchExpressions:
#- {key: app, operator: In, values: [prometheus]}
#- {key: component, operator: In, values: [server]}
template:
metadata:
labels:
app: prometheus
component: server
annotations:
prometheus.io/scrape: 'false'
spec:
serviceAccountName: prometheus
containers:
- name: prometheus
image: bitnami/prometheus:2.50.1
imagePullPolicy: IfNotPresent
command:
- prometheus
- --config.file=/etc/prometheus/prometheus.yml
- --storage.tsdb.path=/prometheus
- --storage.tsdb.retention=720h
ports:
- containerPort: 9090
protocol: TCP
volumeMounts:
- mountPath: /etc/prometheus/prometheus.yml
name: prometheus-config
subPath: prometheus.yml
- mountPath: /prometheus/
name: prometheus-storage-volume
volumes:
- name: prometheus-config
configMap:
name: prometheus-config
items:
- key: prometheus.yml
path: prometheus.yml
mode: 0644
- name: prometheus-storage-volume
nfs:
server: 10.31.200.100
path: /data/k8s_data/prometheus
6.3.3 Prometheus Service YAML
[root@k8s-master1 prometheus]# cat service.yaml
---
apiVersion: v1
kind: Service
metadata:
name: prometheus
namespace: monitoring
labels:
app: prometheus
spec:
type: NodePort
ports:
- port: 9090
targetPort: 9090
nodePort: 30090
protocol: TCP
selector:
app: prometheus
component: server
6.4 Deploying Prometheus
[root@k8s-master1 prometheus]# kubectl apply -f .
configmap/prometheus-config created
deployment.apps/prometheus-server created
service/prometheus created
[root@k8s-master1 prometheus]# kubectl get cm,po,svc -n monitoring|grep prometheus
configmap/prometheus-config 1 27s
pod/prometheus-server-65688779d8-7j65t 1/1 Running 0 27s
service/prometheus NodePort 10.200.16.149 <none> 9090:30090/TCP 27s
[root@k8s-master1 prometheus]# ll /data/k8s_data/prometheus/
total 4
drwxr-xr-x 2 prometheus nfsnobody 6 Feb 27 18:28 chunks_head
-rw-r--r-- 1 prometheus nfsnobody 0 Feb 27 18:38 lock
-rw-r--r-- 1 prometheus nfsnobody 20001 Feb 27 18:38 queries.active
drwxr-xr-x 2 prometheus nfsnobody 54 Feb 27 18:38 wal
6.5 Accessing the web UI
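The web UI is reachable on any node at the NodePort, e.g. http://10.31.200.100:30090, and Status -> Targets should show the kubernetes-node, kubernetes-node-cadvisor, kubernetes-apiserver, and kubernetes-service-endpoints jobs as UP. The same information is available from the HTTP API (a sketch):
curl -s http://10.31.200.100:30090/api/v1/targets | grep -o '"health":"[a-z]*"' | sort | uniq -c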
7. Deploying kube-state-metrics in Kubernetes
7.1 Introduction to kube-state-metrics
(1) kube-state-metrics listens to the API server and generates state metrics for resource objects such as Services, Deployments, Nodes, and Pods.
(2) Note that kube-state-metrics is not meant to check whether a target is alive; it periodically collects state metrics about the objects so they can be displayed or scraped by Prometheus, for example whether a Pod is Running or Terminating, when a Pod was created, the state of Deployments, Pods, and their replicas, how many replicas were scheduled and how many are currently available, how many Pods are running/stopped/terminated, how many times a Pod has restarted, and how many Jobs are currently running (see the example queries after this list).
(3) The metrics currently collected by kube-state-metrics are documented at https://github.com/kubernetes/kube-state-metrics/tree/master/docs .
(4) kube-state-metrics does not store these metrics itself, so Prometheus is used to scrape and store them.
(5) Pay attention to the compatibility between the kube-state-metrics version and the Kubernetes cluster version; check the project page before using it.
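Once these metrics are being scraped by Prometheus (configured in section 7.3), queries such as the following become possible (a few illustrative examples using standard kube-state-metrics metric names):
# number of Pods per phase across the cluster
sum(kube_pod_status_phase) by (phase)
# Deployments whose available replicas differ from the desired count
kube_deployment_spec_replicas != kube_deployment_status_replicas_available
# containers that restarted during the last hour
increase(kube_pod_container_status_restarts_total[1h]) > 0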
7.2 Deploying kube-state-metrics
7.2.1 Writing the YAML
[root@k8s-master1 monitor]# mkdir k8s
[root@k8s-master1 monitor]# cd k8s
[root@k8s-master1 k8s]# cat kube-state-metrics
apiVersion: apps/v1
kind: Deployment
metadata:
name: kube-state-metrics
namespace: kube-system
spec:
replicas: 1
selector:
matchLabels:
app: kube-state-metrics
template:
metadata:
labels:
app: kube-state-metrics
spec:
serviceAccountName: kube-state-metrics
containers:
- name: kube-state-metrics
image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/kube-state-metrics:v2.6.0
ports:
- containerPort: 8080
---
apiVersion: v1
kind: ServiceAccount
metadata:
name: kube-state-metrics
namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
name: kube-state-metrics
rules:
- apiGroups: [""]
resources: ["nodes", "pods", "services", "resourcequotas", "replicationcontrollers", "limitranges", "persistentvolumeclaims", "persistentvolumes", "namespaces", "endpoints"]
verbs: ["list", "watch"]
- apiGroups: ["extensions"]
resources: ["daemonsets", "deployments", "replicasets"]
verbs: ["list", "watch"]
- apiGroups: ["apps"]
resources: ["statefulsets"]
verbs: ["list", "watch"]
- apiGroups: ["batch"]
resources: ["cronjobs", "jobs"]
verbs: ["list", "watch"]
- apiGroups: ["autoscaling"]
resources: ["horizontalpodautoscalers"]
verbs: ["list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
name: kube-state-metrics
roleRef:
apiGroup: rbac.authorization.k8s.io
kind: ClusterRole
name: kube-state-metrics
subjects:
- kind: ServiceAccount
name: kube-state-metrics
namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
annotations:
prometheus.io/scrape: 'true'
name: kube-state-metrics
namespace: kube-system
labels:
app: kube-state-metrics
spec:
ports:
- name: kube-state-metrics
port: 80
targetPort: 8080
protocol: TCP
selector:
app: kube-state-metrics
7.2.2 Applying the configuration
[root@k8s-master1 k8s]# kubectl apply -f kube-state-metrics
deployment.apps/kube-state-metrics created
serviceaccount/kube-state-metrics created
clusterrole.rbac.authorization.k8s.io/kube-state-metrics created
clusterrolebinding.rbac.authorization.k8s.io/kube-state-metrics created
service/kube-state-metrics created
[root@k8s-master1 k8s]# kubectl get po -A |grep kube-state-metrics
kube-system kube-state-metrics-549cf88658-lk26b 1/1 Running 0 20s
7.3 Configuring Prometheus to scrape kube-state-metrics
7.3.1 Editing the YAML and applying it
[root@k8s-master1 prometheus]# cat cm.yaml
... (part of the file omitted)
- job_name: 'kube-state-metrics' # add a static scrape config
static_configs:
- targets: ['kube-state-metrics.kube-system:80']
[root@k8s-master1 prometheus]# kubectl apply -f cm.yaml
[root@k8s-master1 prometheus]# kubectl get po -A |grep prometheus
monitoring prometheus-server-65688779d8-lr7gm 1/1 Running 0 3d
[root@k8s-master1 prometheus]# kubectl delete po -n monitoring prometheus-server-65688779d8-lr7gm
pod "prometheus-server-65688779d8-lr7gm" deleted
[root@k8s-master1 prometheus]# kubectl get po -A |grep prometheus
monitoring prometheus-server-65688779d8-cppjx 1/1 Running 0 40s
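Deleting the Pod is needed here because the ConfigMap is mounted with subPath, and subPath mounts are not refreshed when the ConfigMap changes. If the ConfigMap were mounted as a whole directory and Prometheus were started with the --web.enable-lifecycle flag (neither is the case in the Deployment above), the running server could instead be told to reload its configuration in place (a sketch):
curl -X POST http://10.31.200.100:30090/-/reload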
7.3.2 Checking in the web UI
7.4 Importing the related Grafana dashboards to display the data
8. Deploying Grafana in Kubernetes
8.1 Writing the YAML
[root@k8s-master1 monitor]# mkdir grafana
[root@k8s-master1 monitor]# cd grafana
[root@k8s-master1 grafana]# cat deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
labels:
app: grafana
name: grafana
namespace: monitoring
spec:
selector:
matchLabels:
app: grafana
template:
metadata:
labels:
app: grafana
spec:
securityContext:
fsGroup: 472
supplementalGroups:
- 0
containers:
- name: grafana
image: registry.cn-hangzhou.aliyuncs.com/zhangshijie/grafana:9.3.6
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
name: http-grafana
protocol: TCP
readinessProbe:
failureThreshold: 3
httpGet:
path: /robots.txt
port: 3000
scheme: HTTP
initialDelaySeconds: 10
periodSeconds: 30
successThreshold: 1
timeoutSeconds: 2
livenessProbe:
failureThreshold: 3
initialDelaySeconds: 30
periodSeconds: 10
successThreshold: 1
tcpSocket:
port: 3000
timeoutSeconds: 1
resources:
requests:
cpu: 250m
memory: 750Mi
volumeMounts:
- mountPath: /var/lib/grafana
name: grafana-nfs-volume
volumes:
- name: grafana-nfs-volume
nfs:
server: 10.31.200.100
path: /data/k8s_data/grafana
---
apiVersion: v1
kind: Service
metadata:
name: grafana
namespace: monitoring
spec:
type: NodePort
ports:
- port: 3000
protocol: TCP
targetPort: http-grafana
selector:
app: grafana
8.2 Applying the configuration
[root@k8s-master1 prometheus]# mkdir /data/k8s_data/grafana
[root@k8s-master1 prometheus]# chmod 757 /data/k8s_data/grafana
[root@k8s-master1 grafana]# kubectl apply -f deploy.yaml
deployment.apps/grafana unchanged
service/grafana created
[root@k8s-master1 grafana]# kubectl get po,svc -A |grep grafana
monitoring pod/grafana-788fb854f6-g2wx4 1/1 Running 0 47s
monitoring service/grafana NodePort 10.200.37.34 <none> 3000:31604/TCP 47s
8.3 Configuring Grafana dashboards
Dashboard: https://grafana.com/grafana/dashboards/2949-nginx-vts-stats/
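Before importing any dashboard, Grafana needs a Prometheus data source; inside the cluster the Service from section 6 is reachable at http://prometheus.monitoring.svc:9090. The data source can be added through the UI, or provisioned from a file in Grafana's data source provisioning format, roughly like the following (a sketch; the Deployment above does not mount a provisioning directory, so this would require an extra ConfigMap mounted under /etc/grafana/provisioning/datasources):
apiVersion: 1
datasources:
- name: Prometheus
  type: prometheus
  access: proxy
  url: http://prometheus.monitoring.svc:9090
  isDefault: true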