2022 Study Notes 0616 [K8S coredns log errors]
K8S coredns error troubleshooting notes
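When I first set up the K8S master + worker nodes I never paid attention to this problem:
both coredns pods kept logging errors, but the apps scheduled onto the worker ran fine and scaling in and out worked too, so communication seemed fine and I left it alone.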
kubectl logs coredns-66bff467f8-wmzp5 -n=kube-system
CoreDNS-1.6.7
linux/amd64, go1.13.6, da7f65b
[ERROR] plugin/errors: 2 4233189524581335928.4523713224971580094. HINFO: read udp 10.32.0.3:44742->183.60.83.19:53: i/o timeout
[ERROR] plugin/errors: 2 4233189524581335928.4523713224971580094. HINFO: read udp 10.32.0.3:43227->183.60.83.19:53: i/o timeout
[ERROR] plugin/errors: 2 4233189524581335928.4523713224971580094. HINFO: read udp 10.32.0.3:59043->183.60.83.19:53: i/o timeout
[ERROR] plugin/errors: 2 4233189524581335928.4523713224971580094. HINFO: read udp 10.32.0.3:55516->183.60.83.19:53: i/o timeout
[ERROR] plugin/errors: 2 4233189524581335928.4523713224971580094. HINFO: read udp 10.32.0.3:60978->183.60.83.19:53: i/o timeout
[ERROR] plugin/errors: 2 4233189524581335928.4523713224971580094. HINFO: read udp 10.32.0.3:33080->183.60.83.19:53: i/o timeout
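To see at a glance which upstream resolver the timeouts point at, the log can be summarised with a small filter. This is only a sketch: the function name `timeout_upstreams` is made up for illustration, and it assumes the log lines keep the `src->dst: i/o timeout` shape shown above.

```shell
# Summarise which upstream resolvers time out in a CoreDNS log read from stdin.
# Prints "<count> <upstream ip:port>" per distinct upstream, most frequent first.
# (timeout_upstreams is a hypothetical helper name, not part of kubectl or CoreDNS.)
timeout_upstreams() {
  grep 'i/o timeout' \
    | sed -n 's|.*->\([0-9.]*:[0-9]*\): i/o timeout.*|\1|p' \
    | sort | uniq -c | sort -rn \
    | awk '{print $1, $2}'
}

# Usage against the pod from this post (needs cluster access):
#   kubectl logs coredns-66bff467f8-wmzp5 -n kube-system | timeout_upstreams
```

For the log above, every timeout goes to the same upstream, 183.60.83.19:53, which is presumably the resolver the pod inherited through `forward . /etc/resolv.conf`.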
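Recently, though, I deployed a filebeat daemonset on every node to collect nginx logs. The filebeat on the master shipped logs fine, but the filebeat on the worker could not send anything.
Troubleshooting steps:
- Checked the filebeat errors (as above), which narrowed the problem down to DNS.
- Tested DNS from a busybox pod: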
kubectl run busyboxdig -it --image=datica/busybox-dig --restart=Never --rm sh
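- First ping 114.114.114.114; if that fails, the k8s cluster network itself is most likely broken. It worked for me. Next, dig www.baidu.com failed while dig @114.114.114.114 www.baidu.com succeeded, which points to coredns.
- On the worker node, nslookup failed to resolve every domain; on the master node it did not.
- At this point I remembered that the coredns pods above had been faulty all along. I tried to get into one with exec /bin/sh or /bin/bash to look around, but could not (most likely because the coredns image ships without a shell). Its configuration can still be edited through the configmap, though. So: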
kubectl edit cm coredns -n kube-system -o yaml
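In the Corefile I replaced forward . /etc/resolv.conf with the more reliable name server 8.8.8.8, then deleted the two coredns pods so they would restart. Once they came back up the errors were gone and DNS worked.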
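The decision logic behind the ping and dig steps above can be written down as a small function. This is a sketch under assumptions: `diagnose_dns` and its output labels are hypothetical names, and it relies on ping and dig returning a non-zero exit status when nothing answers.

```shell
# Classify a DNS failure from three probe results (0 = success), mirroring the
# steps above. Run the probes inside the test pod and pass their exit statuses:
#   ping -c 3 114.114.114.114 >/dev/null 2>&1; p=$?
#   dig +time=2 +tries=1 www.baidu.com >/dev/null 2>&1; d=$?
#   dig +time=2 +tries=1 @114.114.114.114 www.baidu.com >/dev/null 2>&1; e=$?
#   diagnose_dns "$p" "$d" "$e"
# (diagnose_dns and the labels it prints are illustrative, not a standard tool.)
diagnose_dns() {
  ping_rc=$1      # reachability of an external IP
  default_rc=$2   # lookup through the cluster's default resolver (coredns)
  public_rc=$3    # lookup sent straight to a public resolver
  if [ "$ping_rc" -ne 0 ]; then
    echo "cluster-network"            # pod cannot even reach an external IP
  elif [ "$default_rc" -eq 0 ]; then
    echo "dns-ok"                     # cluster DNS answers, look elsewhere
  elif [ "$public_rc" -eq 0 ]; then
    echo "coredns"                    # only the cluster DNS path is failing
  else
    echo "external-dns-unreachable"   # port 53 to the outside seems blocked
  fi
}
```

In this post's case the three probes gave success, failure, success, which lands in the `coredns` branch.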
root@master1:/etc/kubernetes/manifests# kubectl get cm coredns -n kube-system -o yaml
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . 8.8.8.8
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2022-05-16T10:46:25Z"
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data: {}
    manager: kubeadm
    operation: Update
    time: "2022-05-16T10:46:25Z"
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:data:
        f:Corefile: {}
    manager: kubectl
    operation: Update
    time: "2022-06-16T06:55:05Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "6459223"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: daeb9102-f5bb-4fd1-xxxxxxxxxx
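The same change can also be made without opening an editor. The sketch below is one possible way, not the method used in this post: `set_forward_upstream` is a hypothetical helper that rewrites only the `/etc/resolv.conf` target on the `forward` line and leaves everything else in the manifest untouched.

```shell
# Rewrite the upstream of the Corefile "forward" line in text read from stdin.
# $1 is the new upstream, e.g. 8.8.8.8. Lines other than "forward ..." pass
# through unchanged. (set_forward_upstream is an illustrative helper name.)
set_forward_upstream() {
  upstream=$1
  sed "/^[[:space:]]*forward[[:space:]]/ s|/etc/resolv\.conf|${upstream}|"
}

# Possible usage against a live cluster (assumes the default kubeadm labels):
#   kubectl get cm coredns -n kube-system -o yaml \
#     | set_forward_upstream 8.8.8.8 \
#     | kubectl replace -f -
#   kubectl delete pod -n kube-system -l k8s-app=kube-dns
```

Deleting the pods mirrors what was done by hand above; the deployment recreates them with the updated Corefile.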
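Open question: I am not sure whether the configmap gets reset when coredns restarts on its own. If it does, I would need to add a watcher for it, which should not be this much trouble.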
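One lightweight way to find out is to read the forward target back from the configmap after a restart and compare it with the expected value. The helper below is a sketch; `forward_upstreams` is a hypothetical name, and it assumes the Corefile has a single `forward` line like the one shown above.

```shell
# Print the upstream(s) configured on the Corefile "forward" line read from
# stdin, i.e. everything after "forward <zone>" up to an optional "{".
# (forward_upstreams is an illustrative helper name.)
forward_upstreams() {
  awk '$1 == "forward" {
    out = ""
    for (i = 3; i <= NF && $i != "{"; i++)
      out = out (out == "" ? "" : " ") $i
    print out
  }'
}

# Possible check against a live cluster:
#   current=$(kubectl get cm coredns -n kube-system \
#               -o jsonpath='{.data.Corefile}' | forward_upstreams)
#   [ "$current" = "8.8.8.8" ] || echo "forward target changed: $current"
```

Running the check once after deleting a coredns pod would show whether a restart alone touches the configmap.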