Notes on a CoreDNS Failure

1. Symptoms

In a Kubernetes cluster, the CoreDNS Pods fail to start and stay in the CrashLoopBackOff or OOMKilled state.
Environment:

  • OS: Ubuntu 18.04 LTS
  • Kubernetes version: 1.11.2
  • Deployment method: kubeadm
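
The symptom described above can be spotted by filtering the Pod list. The helper below is a minimal sketch, not part of the original investigation: the function name failing_coredns_pods is hypothetical, and it assumes its input is the output of `kubectl get pods -n kube-system --no-headers`, where the first column is the Pod name and the third is the status.

```shell
#!/bin/sh
# Hypothetical helper: read `kubectl get pods --no-headers` output on stdin
# and print the CoreDNS Pods that are in one of the two failing states
# mentioned in this post.
failing_coredns_pods() {
    awk '$1 ~ /^coredns-/ && ($3 == "CrashLoopBackOff" || $3 == "OOMKilled") { print $1, $3 }'
}
```

A typical call would be `kubectl get pods -n kube-system --no-headers | failing_coredns_pods`; empty output means no CoreDNS Pod is currently in either state.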

2. Root Cause

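The host had no DNS server configured, so /etc/resolv.conf fell back to the default address 127.0.0.53. CoreDNS takes its upstream from that file, and inside the Pod 127.0.0.53 is a loopback address, so CoreDNS ends up forwarding queries back to itself. This loop keeps consuming memory until the container is OOM-killed.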
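
Whether a host is exposed to this loop can be checked by looking for loopback nameservers in its resolver file. The following is a small sketch under that assumption; the function name loopback_nameservers is made up for illustration, and it only inspects plain `nameserver` lines.

```shell
#!/bin/sh
# Hypothetical helper: print every nameserver in a resolv.conf-style file
# that points at a loopback address (127.0.0.0/8 or ::1). Any hit means
# CoreDNS would be handed an upstream that resolves back to itself.
loopback_nameservers() {
    awk '$1 == "nameserver" && ($2 ~ /^127\./ || $2 == "::1") { print $2 }' "$1"
}
```

Running `loopback_nameservers /etc/resolv.conf` on the host prints the offending addresses, one per line, and prints nothing when only real upstream servers are listed.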

3. Solution

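Principle: CoreDNS must not use 127.0.0.53 as its upstream DNS server, because that creates a forwarding loop.
Fix: configure DNS servers on the host (the list must not include 127.0.0.53).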

4. Background for Troubleshooting

  • Configuring DNS on Ubuntu 18.04
[root]# cat /etc/netplan/50-cloud-init.yaml
network:
    ethernets:
        eno1: # NIC name
            addresses: [192.168.12.92/24] # IP address of this server
            gateway4: 192.168.12.1        # default gateway
            nameservers:                  # if no DNS is configured, the default 127.0.0.53 is used
              addresses: [<DNS server IP>]
    version: 2
[root]# netplan apply # apply the configuration
  • The DNS management service on Ubuntu 18.04 LTS: systemd-resolved.service

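After the DNS settings in /etc/netplan/50-cloud-init.yaml are changed and netplan apply is run, systemd-resolved.service is restarted automatically.
The service writes the IP addresses of the configured DNS servers to /run/systemd/resolve/resolv.conf.

Note: /etc/resolv.conf is a symlink to /run/systemd/resolve/stub-resolv.conf.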

root@intellif-0:/run/systemd/resolve# pwd
/run/systemd/resolve
root@intellif-0:/run/systemd/resolve# ll
-rw-r--r--  1 systemd-resolve systemd-resolve 591 Dec 15 15:23 resolv.conf
-rw-r--r--  1 systemd-resolve systemd-resolve 701 Dec 15 15:23 stub-resolv.conf
root@intellif-0:/run/systemd/resolve# 
[root@intellif-0 resolve]# cat resolv.conf 
# No DNS servers known.
root@intellif-0:/run/systemd/resolve# cat stub-resolv.conf 
nameserver 127.0.0.53
root@intellif-0:/run/systemd/resolve# ll /etc/resolv.conf 
lrwxrwxrwx 1 root root 39 Apr 27  2018 /etc/resolv.conf -> ../run/systemd/resolve/stub-resolv.conf
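
The symlink check in the transcript above can be wrapped in a small helper. This is only a sketch: the function names are hypothetical, and it assumes a `readlink` that supports `-f`, as the GNU coreutils version shipped with Ubuntu does.

```shell
#!/bin/sh
# Hypothetical helper: follow a resolv.conf path through any symlinks
# and print the real file it ends up at.
resolv_conf_target() {
    readlink -f "$1"
}

# Hypothetical helper: succeed when the path ultimately points at the
# stub-resolv.conf file, which is the one that lists 127.0.0.53.
uses_systemd_stub() {
    [ "$(basename "$(resolv_conf_target "$1")")" = "stub-resolv.conf" ]
}
```

On the host shown here, `uses_systemd_stub /etc/resolv.conf` would succeed, which is a quick hint that Pods inheriting this file see 127.0.0.53 as their nameserver.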

5. Other Solutions

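If services inside the cluster do not need to reach services outside the cluster through the host's DNS, the problem can also be avoided by modifying the CoreDNS configuration file or by changing the kubelet startup flags.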

  • Modify the CoreDNS configuration file: kubeadm stores the CoreDNS configuration in a ConfigMap
[root@intellif-0 resolve]# kubectl get cm -n kube-system coredns -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        proxy . /etc/resolv.conf # remove this line
        cache 30
        reload
    }
kind: ConfigMap
metadata:
  creationTimestamp: 2018-12-04T08:54:58Z
  name: coredns
  namespace: kube-system
  resourceVersion: "218"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 4726def5-f7a2-11e8-909a-6c92bf527086
[root@intellif-0 resolve]#
  • Change the kubelet startup flags
[root@intellif-0 resolve]# kubelet --help 2>&1 | grep resolv
      --resolv-conf string    Resolver configuration file used as the basis for the container DNS resolution configuration. (default "/etc/resolv.conf") (DEPRECATED: This parameter should be set via the config file specified by the Kubelet's --config flag. See https://kubernetes.io/docs/tasks/administer-cluster/kubelet-config-file/ for more information.)
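
Both options in this section can be sketched as small shell helpers. The function names are hypothetical and the code is an illustration rather than the exact procedure used here. The first drops the `proxy . /etc/resolv.conf` line from a Corefile read on stdin. The second prints the `resolvConf` value from a kubelet configuration file, on the assumption that the setting appears as an unquoted top-level `resolvConf:` key, which is the config-file counterpart of the deprecated --resolv-conf flag.

```shell
#!/bin/sh
# Hypothetical helper: read a Corefile on stdin and write it to stdout
# without the line that forwards queries to /etc/resolv.conf.
drop_resolvconf_proxy() {
    grep -Ev '^[[:space:]]*proxy[[:space:]]+\.[[:space:]]+/etc/resolv\.conf'
}

# Hypothetical helper: print the resolver file a kubelet config file
# points at. Assumes a line of the form "resolvConf: /path".
kubelet_resolv_conf() {
    awk '$1 == "resolvConf:" { print $2 }' "$1"
}
```

For example, `kubectl -n kube-system get configmap coredns -o jsonpath='{.data.Corefile}' | drop_resolvconf_proxy` previews the edited Corefile before the ConfigMap is changed by hand. For the kubelet, the path to pass to kubelet_resolv_conf is whatever file the --config flag names. Pointing resolvConf at /run/systemd/resolve/resolv.conf only helps once that file lists real DNS servers, which was not yet the case in the transcript in section 4.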

6. References

https://github.com/coredns/coredns/issues/1647
