This article is based on Kubernetes 1.5.2.

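Nodes frequently run into the following problems:

Hardware problems: CPU, memory, disk
Kernel problems: kernel deadlock, file system corruption
Container runtime problems: unresponsive runtime daemon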

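By default, Kubernetes cluster management has no visibility into these node health problems, so pods continue to be scheduled onto faulty nodes. Deploying node-problem-detector as a DaemonSet addresses this: it runs on every node and reports node status (as node conditions and events) to the apiserver. This makes node health visible to the upstream management layers, which can then keep pods off abnormal nodes.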

cat << EOF > node-problem-detector.yaml 
apiVersion: extensions/v1beta1
kind: DaemonSet
metadata:
  name: node-problem-detector-v0.4.1
  namespace: kube-system
  labels:
    k8s-app: node-problem-detector
    version: v0.4.1
    kubernetes.io/cluster-service: "true"
spec:
  template:
    metadata:
      labels:
        k8s-app: node-problem-detector
        version: v0.4.1
        kubernetes.io/cluster-service: "true"
    spec:
      hostNetwork: true
      containers:
      - name: node-problem-detector
        image: docker.io/googlecontainer/node-problem-detector:v0.4.1
        securityContext:
          privileged: false
        resources:
          limits:
            cpu: "200m"
            memory: "100Mi"
          requests:
            cpu: "20m"
            memory: "20Mi"
        volumeMounts:
        - name: log
          mountPath: /log
          readOnly: true
      volumes:
      - name: log
        hostPath:
          path: /var/log/
EOF
kubectl create -f node-problem-detector.yaml
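
Once the DaemonSet has been created, it can be worth checking that a detector pod is running on each node and that node conditions are actually being reported. The following is a minimal sketch, not part of the original setup: the helper names npd_pods and node_conditions are hypothetical, and the label selector assumes the k8s-app label from the manifest in this article.

```shell
# Hypothetical helpers for checking the deployment; they only wrap kubectl.

# List the node-problem-detector pods and the nodes they run on.
npd_pods() {
  kubectl get pods --namespace=kube-system \
    -l k8s-app=node-problem-detector -o wide
}

# Print one "type=status" line for every condition reported on a node.
# Usage: node_conditions <node-name>
node_conditions() {
  kubectl get node "$1" \
    -o jsonpath='{range .status.conditions[*]}{.type}={.status}{"\n"}{end}'
}
```

Run npd_pods first, then node_conditions with one of the node names it lists. If the detector is working with its default kernel monitor configuration, a condition such as KernelDeadlock should appear next to the built-in ones like Ready.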
