Common k8s issues
(1) network: failed to set bridge addr: "cni0" already has an IP address different from
sudo ip link delete cni0
# it will be recreated automatically afterwards
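Before deleting it, the mismatch can be confirmed by comparing the bridge address with the subnet flannel actually leased (a quick check, assuming flannel is the CNI in use and writes its lease to /run/flannel/subnet.env):
ip addr show cni0
ip addr show flannel.1
cat /run/flannel/subnet.env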
(2) Setting the token expiration time
Modify recommended.yaml
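A minimal sketch, assuming recommended.yaml here is the Kubernetes Dashboard install manifest: the dashboard container accepts a --token-ttl argument (in seconds, 0 disables expiration), which can be added to the args of its Deployment (the image tag and the 12-hour value below are illustrative):
    spec:
      containers:
      - name: kubernetes-dashboard
        image: kubernetesui/dashboard:v2.0.0
        args:
        - --auto-generate-certificates
        - --namespace=kubernetes-dashboard
        - --token-ttl=43200   # assumed value: 12 hours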
(3) k8s initialization workflow
# service-cidr not specified, so the default 10.96.0.0/12 is used
(1) kubeadm init --kubernetes-version=v1.17.4 --pod-network-cidr=2.244.0.0/16 --ignore-preflight-errors=all --apiserver-advertise-address=10.0.0.1 --v=10 --image-repository="registry.aliyuncs.com/google_containers"
k8s network plugin address
Pay close attention to the Network parameter: its value must match the pod network CIDR used above (see the net-conf.json snippet below);
you may find that cni0 and flannel.1 end up on different subnets, in which case just delete cni0 with
ip link delete cni0, and it will be recreated automatically;
# I used kube-flannel.yml here
wget https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml
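In the downloaded kube-flannel.yml, the value to adjust is net-conf.json inside the kube-flannel-cfg ConfigMap; its Network field must equal the --pod-network-cidr passed to kubeadm init above (the file ships with 10.244.0.0/16 by default):
  net-conf.json: |
    {
      "Network": "2.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }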
# change the InternalIP of every node
vim /var/lib/kubelet/kubeadm-flags.env
Add --node-ip=x.x.x.x (the IP of the current node)
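For reference, kubeadm-flags.env holds a single KUBELET_KUBEADM_ARGS line; the exact flags differ by version, so the line below is only a sketch with 10.0.0.2 as a placeholder node IP. Restart the kubelet afterwards:
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.1 --node-ip=10.0.0.2"
systemctl daemon-reload
systemctl restart kubelet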
(4) nslookup kubernetes fails inside a Pod, plus a series of related DNS resolution problems
This happens because, when the master node was initialized, the nameserver in
/etc/resolv.conf was not set to the IP of the kube-dns Service, which is why Pod DNS names cannot be resolved and a series of follow-on problems appear.
Here nameserver 10.96.0.10 is the Service IP of kube-dns.
With this in place, every time we create a Pod its DNS record is registered with the DNS service behind that IP.
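For reference, a Pod that uses the cluster DNS normally ends up with an /etc/resolv.conf along these lines (assuming the default cluster.local domain and the default namespace):
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5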
(5) Troubleshooting flow when a Pod cannot resolve domain names
(1) Apply the dnsutils.yml tool described further below
# get the clusterIP of kube-dns
k get svc -n kube-system
# enter the dnsutils pod
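With the dnsutils Pod from the manifest at the end of this section, that is simply (assuming the image ships /bin/sh):
k exec -it dnsutils -- /bin/sh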
# if the ping succeeds, the kube-dns Service itself is fine
ping 10.96.0.10
# DNS resolution flow: /etc/resolv.conf inside the pod -> kube-dns (10.96.0.10) -> coredns -> the rules set by coredns forward (see the Corefile below)
# check whether the coredns pods can be pinged directly,
# because 10.96.0.10 forwards resolution to the DNS pods
# look them up on the master node
k get pod -n kube-system -o wide | grep coredns
# note the IPs of the two coredns pods
# ping those two IPs from inside dnsutils
ping 2.244.0.22
ping 2.244.0.23
# if the pings succeed, this part is fine
# if a ping fails, the node hosting that coredns pod has a problem
# on the master node, get the IPs of all pods
k get pod -o wide --all-namespaces
# ping one Pod on each node
# finding: only pods on that one node cannot be pinged, while pods on every other node can
# so the network of the node hosting that Pod is broken
# check the logs of the kube-flannel-ds-xxx pod on that node
# an error is thrown: network is down
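For reference, the logs can be pulled as below; with the flannel manifest used here the DaemonSet pods live in kube-system (newer flannel releases use a kube-flannel namespace), and the pod name suffix is of course cluster-specific:
k logs -n kube-system kube-flannel-ds-xxx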
# solution
# reinstall the k8s network plugin kube-flannel
# remove the broken network plugin first
ip link delete flannel.1
k delete -f kube-flannel.yml
k apply -f kube-flannel.yml
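Before re-testing name resolution it is worth confirming the flannel DaemonSet pods are back to Running (again, the namespace may be kube-flannel in newer releases):
k get pod -n kube-system | grep flannel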
# nslookup baidu.com
Edit the forward rule of coredns directly in its ConfigMap
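The ConfigMap lives in the kube-system namespace and can be opened with:
k edit configmap coredns -n kube-system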
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . 114.114.114.114
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2022-01-12T13:30:42Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "175"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: dc86f723-446d-40dc-a8fa-c0759d6436b4
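The Corefile above already includes the reload plugin, so CoreDNS should pick up the new forward target on its own after a short delay; if it does not, the coredns pods can simply be recreated (in kubeadm clusters they carry the k8s-app=kube-dns label):
k delete pod -n kube-system -l k8s-app=kube-dns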
# verify that the k8s DNS works
kubectl get svc -n kube-system
# the kube-dns Service listed there is the k8s nameserver
nslookup baidu.com 10.96.0.10
# DNS lookup tool: dnsutils
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
spec:
  containers:
  - name: dnsutils
    image: nettlefish/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
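To use it (the file name dnsutils.yml matches the reference at the start of this troubleshooting flow):
k apply -f dnsutils.yml
k exec -it dnsutils -- nslookup kubernetes.default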
# busybox
apiVersion: apps/v1
kind: Deployment
metadata:
  name: app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      containers:
      - name: app
        image: busybox  # ships most common Linux commands, mostly used for testing
        args:
        - /bin/sh
        - -c
        - sleep 10; touch /tmp/healthy; sleep 30000
        readinessProbe:  # readiness probe
          exec:
            command:
            - cat
            - /tmp/healthy
          initialDelaySeconds: 10  # first probe runs 10s after startup
          periodSeconds: 5  # after the first probe, probe every 5s
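A quick way to watch the readiness probe take effect (the file name busybox.yml is an assumption):
k apply -f busybox.yml
k get pod -l app=busybox -w   # the pod turns Ready shortly after /tmp/healthy is created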
(6) When deploying the zookeeper cluster, the myid of one node's zk does not match the expected value
This is most likely caused by stale data left on the mounted PV; delete the data on the PV first, then re-deploy the k8s zk cluster.
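How the stale data gets removed depends on the PV backend; for a hostPath/local PV it amounts to emptying the directory that backs the affected volume and re-creating the StatefulSet (the PV path and the manifest file name zk.yml below are placeholders):
# on the node backing the affected PV
rm -rf /path/to/that/pv/*
# then re-create the zk StatefulSet
k delete -f zk.yml
k apply -f zk.yml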
# the manifest below uses my own image
apiVersion: v1
kind: Service
metadata:
  name: zk-hs
  labels:
    app: zk
spec:
  ports:
  - port: 2888
    name: server
  - port: 3888
    name: leader-election
  clusterIP: None
  selector:
    app: zk
---
apiVersion: v1
kind: Service
metadata:
  name: zk-cs
  labels:
    app: zk
spec:
  type: NodePort
  ports:
  - port: 2181
    targetPort: 2181
    nodePort: 30012
    name: client
  selector:
    app: zk
---
apiVersion: policy/v1beta1
kind: PodDisruptionBudget
metadata:
  name: zk-pdb
spec:
  selector:
    matchLabels:
      app: zk
  maxUnavailable: 1
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: zk
spec:
  selector:
    matchLabels:
      app: zk
  serviceName: zk-hs
  replicas: 3
  updateStrategy:
    type: RollingUpdate
  podManagementPolicy: OrderedReady
  template:
    metadata:
      labels:
        app: zk
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: "app"
                operator: In
                values:
                - zk
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: kubernetes-zookeeper
        imagePullPolicy: IfNotPresent
        # image: leolee32/kubernetes-library:kubernetes-zookeeper1.0-3.4.10
        image: "yuanxi2314/kubernetes-zookeeper1.0-3.4.10:v8.0"
        resources:
          requests:
            memory: "1Gi"
            cpu: "0.5"
        ports:
        - containerPort: 2181
          name: client
        - containerPort: 2888
          name: server
        - containerPort: 3888
          name: leader-election
        command:
        - sh
        - -c
        - "start-zookeeper \
          --servers=3 \
          --data_dir=/var/lib/zookeeper/data \
          --data_log_dir=/var/lib/zookeeper/data/log \
          --conf_dir=/opt/zookeeper/conf \
          --client_port=2181 \
          --election_port=3888 \
          --server_port=2888 \
          --tick_time=2000 \
          --init_limit=10 \
          --sync_limit=5 \
          --heap=512M \
          --max_client_cnxns=60 \
          --snap_retain_count=3 \
          --purge_interval=12 \
          --max_session_timeout=40000 \
          --min_session_timeout=4000 \
          --log_level=INFO"
        readinessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command:
            - sh
            - -c
            - "zookeeper-ready 2181"
          initialDelaySeconds: 10
          timeoutSeconds: 5
        volumeMounts:
        - name: datadir
          mountPath: /var/lib/zookeeper
      # securityContext:
      #   runAsUser: 0
      #   fsGroup: 0
  volumeClaimTemplates:
  - metadata:
      name: datadir
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 10Gi
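After the pods come back up, the myid of each replica can be checked against its ordinal (this check is taken from the upstream ZooKeeper StatefulSet tutorial; the expected values are 1, 2, 3 for zk-0, zk-1, zk-2):
for i in 0 1 2; do echo "zk-$i:"; k exec zk-$i -- cat /var/lib/zookeeper/data/myid; done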
(7) k8s DNS related references
Troubleshooting: a Pod in Kubernetes cannot resolve domain names properly
(8) The k8s master node cannot reach port services exposed by other nodes
# adjust the docker iptables rules as shown below
vi /usr/lib/systemd/system/docker.service
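The figure showing the exact change is no longer available; a common form of this fix (an assumption, not taken from the source) is to force the iptables FORWARD chain policy back to ACCEPT after docker starts, by adding one line to the [Service] section of docker.service:
# added under [Service] (assumed fix; Docker >= 1.13 sets the FORWARD policy to DROP)
ExecStartPost=/usr/sbin/iptables -P FORWARD ACCEPT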
systemctl daemon-reload
systemctl restart docker
systemctl status docker
(9) cni config uninitialized KubeletNotReady runtime network not ready:
On inspection, /etc/cni/net.d/10-flannel.conflist was not actually missing, so the guess is that some CNI plugins were left out when kubelet was installed via snap install; install them manually:
sudo mkdir -pv /opt/cni/bin
cd /opt/cni/bin
# https://github.com/containernetworking/plugins/releases/tag/v0.8.6
# find the matching plugin release and install it manually
wget https://github.com/containernetworking/plugins/releases/download/v0.8.6/cni-plugins-linux-amd64-v0.8.6.tgz
tar -zxvf cni-plugins-linux-amd64-v0.8.6.tgz
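The tarball should leave the standard CNI binaries (bridge, flannel, host-local, loopback, portmap, ...) in /opt/cni/bin; a quick check:
ls /opt/cni/bin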
kubectl get node