Setting up Prometheus on Kubernetes (Part 3): Monitoring Nodes
Overview
node-exporter monitors the host machine itself. On Kubernetes it is usually deployed as a DaemonSet, so exactly one pod runs on every node. Because of container isolation it can only see the '/' filesystem and cannot monitor the host's other disks; if you need those, consult the docs for running node-exporter directly on the host (newer versions can also mount the host root filesystem and pass --path.rootfs). This article sticks with the containerized deployment.
Deploy the node monitoring component
node-exporter-ds.yml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: node-exporter
  namespace: kube-system
  labels:
    k8s-app: node-exporter
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    version: v1.2.2
spec:
  selector:
    matchLabels:
      k8s-app: node-exporter
      version: v1.2.2
  updateStrategy:
    type: OnDelete
  template:
    metadata:
      labels:
        k8s-app: node-exporter
        version: v1.2.2
    spec:
      priorityClassName: system-node-critical
      containers:
      - name: prometheus-node-exporter
        image: "prom/node-exporter:v1.2.2"
        imagePullPolicy: "IfNotPresent"
        args:
        - --path.procfs=/host/proc
        - --path.sysfs=/host/sys
        ports:
        - name: metrics
          containerPort: 9100
          hostPort: 9100
        volumeMounts:
        - name: proc
          mountPath: /host/proc
          readOnly: true
        - name: sys
          mountPath: /host/sys
          readOnly: true
        resources:
          limits:
            memory: 50Mi
          requests:
            cpu: 100m
            memory: 50Mi
      hostNetwork: true
      hostPID: true
      volumes:
      - name: proc
        hostPath:
          path: /proc
      - name: sys
        hostPath:
          path: /sys
Note: I changed the image version here from the 0.15.2 in the original file to v1.2.2; many query expressions are not supported by the old version.
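The main reason the old version breaks queries is that node-exporter v0.16.0 renamed most metrics to follow Prometheus naming conventions, so expressions written against the old names return nothing on v1.2.2 (and vice versa). The CPU metric is a typical example:

```
# old naming (node-exporter <= 0.15.x) -- returns nothing on v1.2.2
sum(rate(node_cpu{mode!="idle"}[5m])) by (instance)

# current naming (v0.16.0 and later)
sum(rate(node_cpu_seconds_total{mode!="idle"}[5m])) by (instance)
```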
Expose the port
node-exporter-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: node-exporter
  namespace: kube-system
  annotations:
    prometheus.io/scrape: "true"
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: Reconcile
    kubernetes.io/name: "NodeExporter"
spec:
  clusterIP: None
  ports:
  - name: metrics
    port: 9100
    protocol: TCP
    targetPort: 9100
  selector:
    k8s-app: node-exporter
At this point, netstat shows port 9100 listening on each of the two nodes.
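Instead of running netstat on every node, a quick TCP connect check from any machine that can reach the nodes works just as well. A minimal sketch (the two IPs are this cluster's nodes):

```python
import socket

def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising
        return s.connect_ex((host, port)) == 0

# Check node-exporter's hostPort on each node
for node in ("172.16.60.9", "172.16.60.10"):
    print(node, "9100 open:", port_open(node, 9100))
```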
Modify the Prometheus configuration
# Prometheus configuration format https://prometheus.io/docs/prometheus/latest/configuration/configuration/
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: kube-system
  labels:
    kubernetes.io/cluster-service: "true"
    addonmanager.kubernetes.io/mode: EnsureExists
data:
  prometheus.yml: |
    scrape_configs:
    - job_name: prometheus
      static_configs:
      - targets:
        - localhost:9090
The following configuration was added: a kubernetes-nodes job that scrapes the two node IPs as static targets. The existing kubernetes-apiservers job and the rest of the file stay unchanged:

prometheus.yml: |
  # scrape targets
  scrape_configs:
  - job_name: kubernetes-nodes
    static_configs:
    - targets:
      - 172.16.60.9:9100
      - 172.16.60.10:9100
  - job_name: kubernetes-apiservers
    kubernetes_sd_configs:
Apply the modified ConfigMap
# kubectl apply -f prometheus-configmap.yaml
Prometheus picks up the configuration file automatically; if it has not reloaded after a while, redeploy it.
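Prometheus can also be told to reload on demand: it re-reads its config on SIGHUP, or on a POST to /-/reload if it was started with --web.enable-lifecycle. A small sketch of the HTTP variant (the example URL is an assumption for a cluster-internal address):

```python
import urllib.request

def reload_prometheus(base_url: str) -> int:
    """POST to Prometheus' reload endpoint (requires --web.enable-lifecycle)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/-/reload", method="POST")
    with urllib.request.urlopen(req) as resp:
        return resp.status

# e.g. reload_prometheus("http://prometheus.kube-system.svc:9090")
```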
Redeploy prometheus-server
# kubectl rollout restart statefulset/prometheus -n kube-system
This will most likely fail; you need to delete the files in the directory mounted by prometheus-server. Mine looks like this, for reference:
# ll /uat-pod-log/prometheus/
total 5
-rw------- 1 nfsnobody nfsnobody 2 Oct 20 20:46 lock
drwxr-xr-x 2 nfsnobody nfsnobody 4096 Oct 20 20:47 wal
Check the Prometheus UI page; the node targets just added should now appear.
Displaying in Grafana
grafana.com/grafana/dashboards/9276
1. Host basics monitoring (CPU, memory, disk, network) dashboard for Grafana | Grafana Labs
Some panels in the dashboard need tweaking. Take a panel's query expression and run it on the Prometheus page; if it returns nothing, trim the expression down piece by piece to locate the problem.
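For example, CPU usage panels on node dashboards are typically derived from the idle counter; when such a panel is empty, paste the expression into Prometheus and strip it down from the outside in, something like:

```
# full panel-style expression
100 - avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[5m])) * 100

# stripped-down steps to locate the problem
irate(node_cpu_seconds_total{mode="idle"}[5m])
node_cpu_seconds_total{mode="idle"}
node_cpu_seconds_total
```

If even the bare metric name returns nothing, the target is not being scraped or the exporter version uses a different metric name.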
Next: resource monitoring