只有一个问题,原来的httpGet存活、就绪检测一直不通过,于是改为tcpSocket后pod正常。

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

修改后的yaml文件,镜像修改为阿里云

apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
    rbac.authorization.k8s.io/aggregate-to-admin: "true"
    rbac.authorization.k8s.io/aggregate-to-edit: "true"
    rbac.authorization.k8s.io/aggregate-to-view: "true"
  name: system:aggregated-metrics-reader
rules:
- apiGroups:
  - metrics.k8s.io
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - pods
  - nodes
  verbs:
  - get
  - list
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server-auth-reader
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: extension-apiserver-authentication-reader
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server:system:auth-delegator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:auth-delegator
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  labels:
    k8s-app: metrics-server
  name: system:metrics-server
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: system:metrics-server
subjects:
- kind: ServiceAccount
  name: metrics-server
  namespace: kube-system
---
apiVersion: v1
kind: Service
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  ports:
  - name: https
    port: 443
    protocol: TCP
    targetPort: https
  selector:
    k8s-app: metrics-server
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    k8s-app: metrics-server
  name: metrics-server
  namespace: kube-system
spec:
  selector:
    matchLabels:
      k8s-app: metrics-server
  strategy:
    rollingUpdate:
      maxUnavailable: 0
  template:
    metadata:
      labels:
        k8s-app: metrics-server
    spec:
      containers:
      - args:
#        - --cert-dir=/tmp
#        - --secure-port=4443
#        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
#        - --kubelet-use-node-status-port
#        - --metric-resolution=15s
        - --cert-dir=/tmp
        - --secure-port=4443
        - --metric-resolution=30s
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --requestheader-username-headers=X-Remote-User
        - --requestheader-group-headers=X-Remote-Group
        - --requestheader-extra-headers-prefix=X-Remote-Extra-
        image: registry.aliyuncs.com/google_containers/metrics-server:v0.6.4
        imagePullPolicy: IfNotPresent
        livenessProbe:
          failureThreshold: 3
          tcpSocket:
            port: 4443
          periodSeconds: 10
        name: metrics-server
        ports:
        - containerPort: 4443
          name: https
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          tcpSocket:
            port: 4443
          initialDelaySeconds: 20
          periodSeconds: 10
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
        securityContext:
          allowPrivilegeEscalation: false
          readOnlyRootFilesystem: true
          runAsNonRoot: true
          runAsUser: 1000
        volumeMounts:
        - mountPath: /tmp
          name: tmp-dir
      nodeSelector:
        kubernetes.io/os: linux
      priorityClassName: system-cluster-critical
      serviceAccountName: metrics-server
      volumes:
      - emptyDir: {}
        name: tmp-dir
---
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
kubectl apply -f components.yaml

在这里插入图片描述

在这里插入图片描述
部署kube-prometheus
兼容1.27的为main分支

在这里插入图片描述
只克隆main分支

git clone  --single-branch --branch main https://github.com/prometheus-operator/kube-prometheus.git
cd kube-prometheus

如果直接运行kubectl apply -f manifests/setup/,会有元数据注解太长的错误
Error from server (Invalid): error when creating "manifests/setup/0prometheusCustomResourceDefinition.yaml": CustomResourceDefinition.apiextensions.k8s.io "prometheuses.monitoring.coreos.com" is invalid: metadata.annotations: Too long: must have at most 262144 bytes

应该使用以下方式
1. kubectl apply -f manifests/setup/
2. kubectl apply --server-side -f manifests/setup 

CRD正常创建后然后再执行:
kubectl apply -f manifests

镜像下载使用这个项目的方法:https://github.com/DaoCloud/public-image-mirror,在镜像前面加:m.daocloud.io

prometheus-k8s-0 pod报权限不足问题

ts=2023-08-21T03:14:34.582Z caller=klog.go:116 level=error component=k8s_client_runtime func=ErrorDepth msg="pkg/mod/k8s.io/client-go@v0.27.3/tools/cache/reflector.go:231: Failed to watch *v1.Pod: failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list resource \"pods\" in API group \"\" at the cluster scope"

处理:
修改prometheus-clusterRole.yaml

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.26.0
  name: prometheus-k8s
rules:
- apiGroups:
  - ""
  resources:
  - nodes/metrics
  - services
  - endpoints
  - pods
  verbs:
  - get
  - list
  - watch
- nonResourceURLs:
  - /metrics
  verbs:
  - get

跨节点pind不通monitoring下的pod ip,prometheus target异常问题处理,

删除kube-prometheus/manifests下的netowrkprolic
alertmanager-networkPolicy.yaml
blackboxExporter-networkPolicy.yaml
grafana-networkPolicy.yaml
kubeStateMetrics-networkPolicy.yaml
nodeExporter-networkPolicy.yaml
prometheusAdapter-networkPolicy.yaml
prometheus-networkPolicy.yaml
prometheusOperator-networkPolicy.yaml

使用ServiceMonitor添加监控:
以ingress-nginx为例
修改ingress-nginx.yaml的service

apiVersion: v1
kind: Service
metadata:
  annotations:
    prometheus.io/port: '10254' #新增内容
    prometheus.io/scrape: 'true' #新增内容
  labels:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
    app.kubernetes.io/part-of: ingress-nginx
    app.kubernetes.io/version: 1.8.1
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  externalTrafficPolicy: Local
  ipFamilies:
  - IPv4
  ipFamilyPolicy: SingleStack
  ports:
  - appProtocol: http
    name: http
    port: 80
    protocol: TCP
    targetPort: http
  - appProtocol: https
    name: https
    port: 443
    protocol: TCP
    targetPort: https
  - appProtocol: http  #新增内容
    name: prometheus  #新增内容
    port: 10254  #新增内容
    protocol: TCP  #新增内容
    targetPort: 10254  #新增内容
  selector:
    app.kubernetes.io/component: controller
    app.kubernetes.io/instance: ingress-nginx
    app.kubernetes.io/name: ingress-nginx
  type: LoadBalancer
  

# ingress-nginx-monitor.yaml

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: ingress-nginx  # ServiceMonitor名称
  namespace: monitoring # ServiceMonitor所在名称空间
spec:
  endpoints:  # prometheus所采集Metrics地址配置,endpoints为一个数组,可以创建多个,但是每个endpoints包含三个字段interval、path、port
    - interval: 15s # prometheus采集数据的周期,单位为秒
      path: /metrics # prometheus采集数据的路径
      port: prometheus # prometheus采集数据的端口,这里为port的name,主要是通过spec.selector中选择对应的svc,在选中的svc中匹配该端口
  namespaceSelector: # 需要发现svc的范围
    any: true     # 有且仅有一个值true,当该字段被设置时,表示监听所有符合selector所选择的svc
  selector:
    matchLabels:  # 选择svc的标签
      app.kubernetes.io/component: controller
      app.kubernetes.io/instance: ingress-nginx
      app.kubernetes.io/name: ingress-nginx
      app.kubernetes.io/part-of: ingress-nginx
      app.kubernetes.io/version: 1.8.1
      
kubectl apply -f ingress-nginx-monitor.yaml
确认ingress-nginx servicemonitors已添加
kubectl -n monitoring get servicemonitors.monitoring.coreos.com 
NAME                      AGE
alertmanager-main         70m
blackbox-exporter         70m
coredns                   70m
grafana                   70m
ingress-nginx             24m
kube-apiserver            70m
kube-controller-manager   70m
kube-scheduler            70m
kube-state-metrics        70m
kubelet                   70m
node-exporter             70m
prometheus-adapter        70m
prometheus-k8s            70m
prometheus-operator       70m


在这里插入图片描述
监控外部ETCD:
etcd配置文件添加:ETCD_LISTEN_METRICS_URLS=“http://0.0.0.0:2381”
etcd-job.yaml

- job_name: "etcd-server"
  scrape_interval: 30s
  scrape_timeout: 10s
  static_configs:
    - targets: ["172.16.0.157:2381"]
      labels:
        instance: etcdserver

创建secret

kubectl create secret generic etcd-secret --from-file=etcd-job.yaml -n monitoring

kube-prometheus/manifests/prometheus-prometheus.yaml 末尾添加配置:

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  labels:
    app.kubernetes.io/component: prometheus
    app.kubernetes.io/instance: k8s
    app.kubernetes.io/name: prometheus
    app.kubernetes.io/part-of: kube-prometheus
    app.kubernetes.io/version: 2.46.0
  name: k8s
  namespace: monitoring
spec:
  alerting:
    alertmanagers:
    - apiVersion: v2
      name: alertmanager-main
      namespace: monitoring
      port: web
  enableFeatures: []
  externalLabels: {}
  image: quay.io/prometheus/prometheus:v2.46.0
  nodeSelector:
    kubernetes.io/os: linux
  podMetadata:
    labels:
      app.kubernetes.io/component: prometheus
      app.kubernetes.io/instance: k8s
      app.kubernetes.io/name: prometheus
      app.kubernetes.io/part-of: kube-prometheus
      app.kubernetes.io/version: 2.46.0
  podMonitorNamespaceSelector: {}
  podMonitorSelector: {}
  probeNamespaceSelector: {}
  probeSelector: {}
  replicas: 2
  resources:
    requests:
      memory: 400Mi
  ruleNamespaceSelector: {}
  ruleSelector: {}
  securityContext:
    fsGroup: 2000
    runAsNonRoot: true
    runAsUser: 1000
  serviceAccountName: prometheus-k8s
  serviceMonitorNamespaceSelector: {}
  serviceMonitorSelector: {}
  version: 2.46.0
  additionalScrapeConfigs:  #新增
    name: etcd-secret  #新增
    key: etcd-job.yaml  #新增
kubectl apply -f kube-prometheus/manifests/prometheus-prometheus.yaml

grafana导入 3070模板
在这里插入图片描述
http存活检测

kind: Probe
apiVersion: monitoring.coreos.com/v1
metadata:
  name: example-com-website
  namespace: monitoring
spec:
  interval: 60s
  module: http_2xx
  prober:
    url: blackbox-exporter.monitoring.svc.cluster.local:19115
  targets:
    staticConfig:
      static:
      - https://www.baidu.com

在这里插入图片描述
在这里插入图片描述
添加proxy监控,默认为http

---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app.kubernetes.io/name: kube-proxy
    app.kubernetes.io/part-of: kube-prometheus
  name: kube-proxy
  namespace: monitoring
spec:
  endpoints:
  - bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    interval: 30s
    port: http-metrics
    scheme: http
    tlsConfig:
      insecureSkipVerify: true
  jobLabel: app.kubernetes.io/name
  namespaceSelector:
    matchNames:
    - kube-system
  selector:
    matchLabels:
      app.kubernetes.io/name: kube-proxy
---
apiVersion: v1
kind: Endpoints
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-proxy
subsets:
- addresses:
  - ip: 172.16.0.157
    targetRef:
      kind: Node
      name: master1
  - ip: 172.16.0.124
    targetRef:
      kind: Node
      name: node1
  - ip: 172.16.0.46
    targetRef:
      kind: Node
      name: node2
  ports:
  - name: http-metrics
    port: 10249
    protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
  name: kube-proxy
  namespace: kube-system
  labels:
    app.kubernetes.io/name: kube-proxy
spec:
  type: ClusterIP 
  clusterIP: None
  ports:
  - name: http-metrics
    port: 10249
    targetPort: 10249
    protocol: TCP

在这里插入图片描述

Logo

K8S/Kubernetes社区为您提供最前沿的新闻资讯和知识内容

更多推荐