Quickly set up a k8s cluster with kubeadm (single master node)

I. Pre-deployment planning

Host          OS           IP               docker version   k8s version
k8s-master1   CentOS 7.9   192.168.15.139   20.10.12         1.23.4-0
k8s-node1    CentOS 7.9   192.168.8.135    20.10.12         1.23.4-0
k8s-node2    CentOS 7.9   192.168.8.136    20.10.12         1.23.4-0

II. Main steps

  1. Node preparation (all nodes)
  2. Deploy the container runtime, docker (all nodes)
  3. Install the kubeadm, kubelet and kubectl packages (all nodes)
  4. Initialize the master node (master node)
  5. Join the worker nodes with kubeadm join (all worker nodes)

III. Node preparation

# Set the hostname
hostnamectl set-hostname master1


# Time synchronization
yum install -y chrony
systemctl enable chronyd && systemctl restart chronyd
timedatectl set-ntp true


# Stop and disable the firewall
systemctl stop firewalld
systemctl disable firewalld

# Disable SELinux (sets SELINUX=disabled)
sed -i 's/enforcing/disabled/' /etc/selinux/config
setenforce 0


# Disable the swap partition
swapoff -a
sed -ri 's/.*swap.*/#&/' /etc/fstab
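The sed expression above comments out every fstab line that mentions swap (`#&` re-emits the whole matched line behind a `#`). A quick way to see what it does is to run it against a throwaway copy:

```shell
# Demo of the swap-commenting sed on a sample fstab (a scratch file, not /etc/fstab)
cat > /tmp/fstab.demo <<'EOF'
/dev/mapper/centos-root /    xfs   defaults 0 0
/dev/mapper/centos-swap swap swap  defaults 0 0
EOF
sed -ri 's/.*swap.*/#&/' /tmp/fstab.demo
grep swap /tmp/fstab.demo   # the swap line is now commented out
```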


# Make sure the br_netfilter module is loaded
lsmod | grep br_netfilter
# To load it explicitly, run
sudo modprobe br_netfilter
 
# Allow iptables to see bridged traffic
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system


# Load the ipvs-related kernel modules
# (on kernels >= 4.19 the last module is named nf_conntrack; CentOS 7.9's 3.10 kernel uses nf_conntrack_ipv4)
cat > /etc/sysconfig/modules/ipvs.modules <<EOF
#!/bin/bash
modprobe -- ip_vs
modprobe -- ip_vs_rr
modprobe -- ip_vs_wrr
modprobe -- ip_vs_sh
modprobe -- nf_conntrack_ipv4
EOF

# Make the script executable, run it, and confirm the modules are loaded
chmod 755 /etc/sysconfig/modules/ipvs.modules && bash /etc/sysconfig/modules/ipvs.modules && lsmod | grep -e ip_vs -e nf_conntrack_ipv4

# Install the management tools ipset and ipvsadm
yum install ipset ipvsadm -y

IV. Install docker

# 1. If docker is already installed, remove old versions (k8s and docker versions have a compatibility relationship)
yum remove docker \
                  docker-client \
                  docker-client-latest \
                  docker-common \
                  docker-latest \
                  docker-latest-logrotate \
                  docker-logrotate \
                  docker-engine
# 2. Install prerequisites
# yum-utils provides yum-config-manager; the devicemapper storage driver needs device-mapper-persistent-data and lvm2
sudo yum install -y yum-utils  device-mapper-persistent-data lvm2
 
# 3. Set up the package repository
# yum-config-manager writes the repo file under /etc/yum.repos.d
# (this is the upstream repo; a domestic mirror of docker-ce.repo, e.g. Aliyun's, can be substituted if downloads are slow)
 
sudo yum-config-manager --add-repo \
     https://download.docker.com/linux/centos/docker-ce.repo
 
# 4. List the available versions
yum list docker-ce --showduplicates | sort -r
 
# 5. Install the latest version, or pin a specific one
yum -y install docker-ce docker-ce-cli containerd.io
yum -y install docker-ce-<VERSION_STRING> docker-ce-cli-<VERSION_STRING> containerd.io
 
# 6. Start docker and enable it at boot
systemctl start docker && systemctl enable docker
 
# 7. Check that docker is running
docker version 
 
# 8. Configure docker to use systemd to manage the containers' cgroups
sudo mkdir -p /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF
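A malformed daemon.json will stop docker from starting at all, so it is cheap insurance to syntax-check the file before restarting the daemon. A sketch, run here on a scratch copy (python3's json.tool exits non-zero on invalid JSON):

```shell
# Write the example daemon.json to a scratch path and validate it
cat <<'EOF' > /tmp/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m" },
  "storage-driver": "overlay2"
}
EOF
python3 -m json.tool /tmp/daemon.json >/dev/null && echo "daemon.json is valid JSON"
```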
 
# 9. Restart docker
sudo systemctl daemon-reload
sudo systemctl restart docker

V. Install kubeadm, kubelet and kubectl

# The upstream repo is not reachable from some networks, so add the Aliyun mirror
cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kube*
EOF
 
 
# Install kubelet, kubeadm and kubectl
# --disableexcludes=kubernetes lifts the exclude=kube* filter in the repo file for this transaction
yum install -y kubelet kubeadm kubectl --disableexcludes=kubernetes
systemctl enable kubelet && systemctl start kubelet

At this point systemctl status kubelet shows that kubelet has not started successfully; this is expected.

It starts once kubeadm init succeeds on the master, or after a node joins the cluster.

VI. Initialize the master node

1. Generate the default init configuration

kubeadm config print init-defaults > kubeadm-init.yaml

2. Edit kubeadm-init.yaml

Change advertiseAddress: 1.2.3.4 to this machine's IP address.

Change imageRepository: k8s.gcr.io to imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers (the Aliyun mirror).

Change the node name; if left alone it defaults to 'node'.

If calico will be used as the network plugin, add podSubnet: 192.168.0.0/16 below serviceSubnet: 10.96.0.0/12.

The edited kubeadm-init.yaml:

apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.15.139
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: master1
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.23.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 192.168.0.0/16
scheduler: {}

3. Run the init

kubeadm init --config kubeadm-init.yaml

On success, save the generated join command; it is needed when worker nodes join the cluster:

kubeadm join 192.168.15.139:6443 --token 1812pi.ejiahyyg5978c5oh --discovery-token-ca-cert-hash sha256:06feacafb8dc352f2432e9a121e440840144f1f746bdeb8173274dcb510a7e12
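The sha256 value in --discovery-token-ca-cert-hash is simply the SHA-256 of the cluster CA's DER-encoded public key, so it can always be recomputed from /etc/kubernetes/pki/ca.crt. A sketch, run here against a throwaway self-signed cert since the real CA only exists on the master:

```shell
# Generate a throwaway cert standing in for /etc/kubernetes/pki/ca.crt
CA=/tmp/demo-ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-ca.key -out "$CA" \
  -subj "/CN=kubernetes" -days 1 2>/dev/null

# Hash of the DER-encoded public key -- the same recipe kubeadm uses
HASH=$(openssl x509 -pubkey -in "$CA" \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | awk '{print $NF}')
echo "sha256:$HASH"
```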

If a step was missed and init failed, reset with the command below and then re-run init:

kubeadm reset

4. Run kubectl

# Point kubectl at the cluster config
# To run kubectl as root:
export KUBECONFIG=/etc/kubernetes/admin.conf
 
# To let a non-root user run kubectl, run the following (these lines are also part of the kubeadm init output)
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

At this point the node status is NotReady, because no network plugin has been installed yet.

5. Install a network plugin

Install either calico or flannel, not both.

Install calico:

curl https://docs.projectcalico.org/manifests/calico.yaml -O

kubectl apply -f calico.yaml

Install flannel:

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

kubectl apply -f kube-flannel.yml


After installation the node status becomes Ready:

kubectl get node
NAME       STATUS   ROLES                  AGE    VERSION
master01   Ready    control-plane,master   4h4m   v1.23.3

6. Enable ipvs in kube-proxy

# In the kube-system/kube-proxy ConfigMap, set mode: "ipvs" in config.conf:
kubectl edit cm kube-proxy -n kube-system

# Then restart the kube-proxy pods on every node:
kubectl get pod -n kube-system | grep kube-proxy | awk '{system("kubectl delete pod "$1" -n kube-system")}'

# Check the logs
kubectl logs kube-proxy-2696f -n kube-system
If the log prints "Using ipvs Proxier", ipvs mode is enabled.


VII. Join worker nodes to the cluster

On each worker node, run the kubeadm join command generated by the master's init:

kubeadm join 192.168.15.139:6443 --token 1812pi.ejiahyyg5978c5oh --discovery-token-ca-cert-hash sha256:06feacafb8dc352f2432e9a121e440840144f1f746bdeb8173274dcb510a7e12

If the command was not saved, a new one can be generated with:

kubeadm token create --print-join-command

Deleting a node

Since the node runs workloads, deleting it outright would make those services unavailable. First evict all pods from the node with kubectl drain, then delete it:

kubectl drain nodename --delete-local-data --force --ignore-daemonsets   # the flag is --delete-emptydir-data on newer kubectl
kubectl delete node nodename

-------------------------------------------------------(everything below is optional)--------------------------------------------------------------------

VIII. Install the Dashboard

1. Download the yaml
wget https://raw.githubusercontent.com/kubernetes/dashboard/v2.3.1/aio/deploy/recommended.yaml


2. Change the kubernetes-dashboard Service to NodePort type
vim recommended.yaml

kind: Service
apiVersion: v1
metadata:
  labels:
    k8s-app: kubernetes-dashboard
  name: kubernetes-dashboard
  namespace: kubernetes-dashboard
spec:
  type: NodePort  # added
  ports:
    - port: 443
      targetPort: 8443
      nodePort: 30009  # added
  selector:
    k8s-app: kubernetes-dashboard

3. Deploy
kubectl create -f recommended.yaml

4. Check the resources in the kubernetes-dashboard namespace
[root@master1 kubernetes]# kubectl get pod,svc -n kubernetes-dashboard
NAME                                            READY   STATUS    RESTARTS   AGE
pod/dashboard-metrics-scraper-79459f84f-v5995   1/1     Running   0          60s
pod/kubernetes-dashboard-76dc96b85f-tqctb       1/1     Running   0          60s

NAME                                TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)         AGE
service/dashboard-metrics-scraper   ClusterIP   10.99.123.73    <none>        8000/TCP        60s
service/kubernetes-dashboard        NodePort    10.103.73.202   <none>        443:30009/TCP   60s

If the download from raw.githubusercontent.com fails, add a hosts entry for it:

vim /etc/hosts
 
199.232.96.133   raw.githubusercontent.com

5. Create an access account and get its token
# Create the service account
kubectl create serviceaccount dashboard-admin -n kubernetes-dashboard

# Bind it to the cluster-admin role
kubectl create clusterrolebinding dashboard-admin-rb --clusterrole=cluster-admin --serviceaccount=kubernetes-dashboard:dashboard-admin

# Get the account's token
[root@master1 kubernetes]# kubectl get secrets -n kubernetes-dashboard | grep dashboard-admin
dashboard-admin-token-hmphn        kubernetes.io/service-account-token   3      57s


[root@master1 mnt]# kubectl describe secrets dashboard-admin-token-bssq7 -n kubernetes-dashboard
Name:         dashboard-admin-token-bssq7
Namespace:    kubernetes-dashboard
Labels:       <none>
Annotations:  kubernetes.io/service-account.name: dashboard-admin
              kubernetes.io/service-account.uid: c81ce85c-1903-4fd9-97df-e66ee8cba593

Type:  kubernetes.io/service-account-token

Data
====
token:      eyJhbGciOiJSUzI1NiIsImtpZCI6ImVyV1g5dW5uR2NHVVd3ZkkzcEtST2ViOUIzbXVUSmlPcEVlYkNSZEFOd28ifQ.eyJpc3MiOiJrdWJlcm5ldGVzL3NlcnZpY2VhY2NvdW50Iiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9uYW1lc3BhY2UiOiJrdWJlcm5ldGVzLWRhc2hib2FyZCIsImt1YmVybmV0ZXMuaW8vc2VydmljZWFjY291bnQvc2VjcmV0Lm5hbWUiOiJkYXNoYm9hcmQtYWRtaW4tdG9rZW4tdGo1anMiLCJrdWJlcm5ldGVzLmlvL3NlcnZpY2VhY2NvdW50L3NlcnZpY2UtYWNjb3VudC5uYW1lIjoiZGFzaGJvYXJkLWFkbWluIiwia3ViZXJuZXRlcy5pby9zZXJ2aWNlYWNjb3VudC9zZXJ2aWNlLWFjY291bnQudWlkIjoiMGYwNTFhOWQtOWYxYy00MDdiLTgwZDYtNTVlN2EzYmZkNjM4Iiwic3ViIjoic3lzdGVtOnNlcnZpY2VhY2NvdW50Omt1YmVybmV0ZXMtZGFzaGJvYXJkOmRhc2hib2FyZC1hZG1pbiJ9.Cc-uva1RsQYmJb3bN2BTVmzUyklfzYM4qd9l5caz4XFWtplZT3kNmNELX_N9X8dg7lb-h9pOptIFA1FeuEVU5Q0mMeuV5PVQlZAUs3OUAW4A9R4HQ-f5_4UIXAGCz5hSf55ChwmOxLsSi16orFnfR96YIC-uQvY7VVP_KJB2oIhhraX-Mbzu-LzOSrSIjhhmf3HBTPud9H3GoLZUyNGrG6VNzkG6XUanF2P36aLLolq8V-7IPRezKGnjhF7W3cjPDxj0vzdwVd9IAOhMDeXkXU011GuW04YSRJ4FzMjQVASaB3GVj7c4-tSINCv3Wto9o48PVC6tsloQuoxzwsr_CQ
ca.crt:     1099 bytes
namespace:  20 bytes
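The token above is a JWT: three base64url sections separated by dots, the middle one being the payload. It can be decoded with nothing but base64; a sketch, run here against a tiny hand-built token rather than a real one:

```shell
# Decode the payload (second dot-separated field) of a JWT
payload_of() {
  local p
  p=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')    # base64url -> base64
  while [ $(( ${#p} % 4 )) -ne 0 ]; do p="${p}="; done  # restore stripped padding
  printf '%s' "$p" | base64 -d
}

# Build a fake header.payload.signature token for the demo
TOKEN="$(printf '%s' '{"alg":"RS256"}' | base64 | tr -d '=\n').$(printf '%s' '{"iss":"kubernetes/serviceaccount"}' | base64 | tr -d '=\n').sig"
payload_of "$TOKEN"
```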

6. Access the Dashboard UI in a browser

Open https://<node-ip>:30009 (the NodePort set above) and enter the token from the previous step on the login page.

IX. Install kubectl command auto-completion

yum install -y bash-completion 
source /usr/share/bash-completion/bash_completion
source <(kubectl completion bash)
echo "source <(kubectl completion bash)" >> ~/.bashrc

X. Install helm

The helm/Kubernetes version compatibility table:
https://helm.sh/zh/docs/topics/version_skew/

1. Install the helm client

wget https://get.helm.sh/helm-v3.8.0-linux-amd64.tar.gz
tar xf helm-v3.8.0-linux-amd64.tar.gz
mv linux-amd64/helm /usr/local/bin/helm

2. Add repositories

# Add the Aliyun chart repository
helm repo add aliyun https://kubernetes.oss-cn-hangzhou.aliyuncs.com/charts
# Add the bitnami chart repository
helm repo add bitnami https://charts.bitnami.com/bitnami
# Add ingress-nginx
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx

3. Update repositories

# Refresh the chart repositories
helm repo update
# List the configured chart repositories
helm repo list

4. Remove a repository

# Remove a chart repository
helm repo remove aliyun

5. Basic helm usage

Search for and download a chart:
# Look for memcached in the Aliyun chart repository
helm search repo aliyun | grep memcached

# Show chart information
helm show chart aliyun/memcached

# Download the chart package locally
helm pull aliyun/memcached

# Install the chart (release name first, then <repo>/<chart>)
helm install <release-name> aliyun/memcached -n <namespace>

XI. Install ingress

Ingress provides layer-7 load balancing; install it if you need it.

Option 1: install from the yaml file

1. Fetch deploy.yaml from github

wget https://raw.githubusercontent.com/kubernetes/ingress-nginx/main/deploy/static/provider/baremetal/1.19/deploy.yaml
# The images in deploy.yaml cannot be pulled from inside China; replace them with mirrored ones
vi deploy.yaml
# Three image references need changing; replace the image values as follows:
k8s.gcr.io/ingress-nginx/controller:v1.1.1 (first occurrence)
anjia0532/google-containers.ingress-nginx.controller:v1.1.1

k8s.gcr.io/ingress-nginx/kube-webhook-certgen:v1.1.1 (second and third occurrences)
anjia0532/google-containers.ingress-nginx.kube-webhook-certgen:v1.1.1 

# Deploy
kubectl apply -f deploy.yaml
# Check the result
kubectl get pod,svc -n ingress-nginx

# Finally, remember to open the ports exposed by the svc in the firewall

Option 2: install with helm

1. Search for the chart
[root@master1 ~]# helm search repo bitnami |grep ingress
bitnami/contour                                 7.3.11          1.20.1          Contour is an open source Kubernetes ingress co...
bitnami/nginx-ingress-controller                9.1.9           1.1.1           NGINX 

2. Pull the ingress chart; here the nginx-ingress-controller chart from the bitnami repository is used
helm pull bitnami/nginx-ingress-controller

3. Install the chart (create the ingress-nginx namespace in k8s beforehand)
helm install nginx-ingress-controller bitnami/nginx-ingress-controller -n ingress-nginx

4. Check the result
kubectl get pod,svc -n ingress-nginx

XII. Install metrics-server

1. Download the yaml

wget https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

2. Edit the yaml

Change the image to bitnami/metrics-server and add the --kubelet-insecure-tls flag:
spec:
  containers:
  - args:
    .........
    .........
    - --kubelet-insecure-tls
    image: bitnami/metrics-server

3. Deploy

kubectl apply -f components.yaml

4. Check

kubectl top node
kubectl top pod

XIII. Deploy a storageclass

Prerequisite: an NFS server (see the earlier NFS deployment notes)

# Create the shared directory
mkdir /nfs/k8s && chmod 777 /nfs/k8s
# Edit the export configuration
vim /etc/exports
/nfs/k8s *(rw,async,no_subtree_check)
# Reload the exports
exportfs -r
# List the NFS exports available on the network
showmount -e 192.168.15.139

1. Grant permissions with RBAC

Create a serviceAccount for nfs-client-provisioner and bind the required permissions to it (rbac.yaml):

apiVersion: v1
kind: ServiceAccount
metadata:
  name: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: nfs-client-provisioner-runner
rules:
  - apiGroups: [""]
    resources: ["persistentvolumes"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["persistentvolumeclaims"]
    verbs: ["get", "list", "watch", "update","create"]
  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["events"]
    verbs: ["create", "update", "patch"]
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: run-nfs-client-provisioner
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: storgeclass
roleRef:
  kind: ClusterRole
  name: nfs-client-provisioner-runner
  apiGroup: rbac.authorization.k8s.io
---
kind: Role
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
rules:
  - apiGroups: [""]
    resources: ["endpoints"]
    verbs: ["get", "list", "watch", "create", "update", "patch"]
---
kind: RoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  name: leader-locking-nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
subjects:
  - kind: ServiceAccount
    name: nfs-client-provisioner
    # replace with namespace where provisioner is deployed
    namespace: storgeclass
roleRef:
  kind: Role
  name: leader-locking-nfs-client-provisioner
  apiGroup: rbac.authorization.k8s.io

kubectl apply -f rbac.yaml

2. Create the Deployment (nfs-client-provisioner.yaml)

Remember to change the NFS server IP and the shared directory:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: storgeclass
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: quay.io/external_storage/nfs-client-provisioner:latest
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: fuseim.pri/ifs
            - name: NFS_SERVER
              value: 192.168.15.139
            - name: NFS_PATH
              value: /nfs/k8s
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.15.139
            path: /nfs/k8s

kubectl apply -f nfs-client-provisioner.yaml

3. Create the default storage class (storageclass.yaml)

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-nfs-storage
  annotations:
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: fuseim.pri/ifs # or choose another name, but it must match the deployment's env PROVISIONER_NAME
parameters:
  archiveOnDelete: "false"

kubectl apply -f storageclass.yaml

4. Verify

[root@master1 storgeclass]# kubectl get pod -n storgeclass
NAME                                      READY   STATUS    RESTARTS   AGE
nfs-client-provisioner-57b584586b-v4xz7   1/1     Running   0          4h1m


[root@master1 storgeclass]# kubectl get sc
NAME                            PROVISIONER      RECLAIMPOLICY   VOLUMEBINDINGMODE   ALLOWVOLUMEEXPANSION   AGE
managed-nfs-storage (default)   fuseim.pri/ifs   Delete          Immediate           false                  178m


XIV. Deploy kubesphere

A storageclass must be deployed before kubesphere (see the storageclass section above).

1. Deploy

kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.2.1/kubesphere-installer.yaml
   
kubectl apply -f https://github.com/kubesphere/ks-installer/releases/download/v3.2.1/cluster-configuration.yaml

2. Edit cluster-configuration.yaml

Change localhost in endpointIps to the actual etcd address:

  endpointIps: 192.168.15.139  # etcd cluster EndpointIps. It can be a bunch of IPs here.
  port: 2379              # etcd port.

Enabling the pluggable components is covered in the official docs: https://kubesphere.com.cn/docs/pluggable-components/devops/

3. Check the installation log

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l app=ks-install -o jsonpath='{.items[0].metadata.name}') -f

The installer log ends with a success banner once installation completes.

4. Access kubesphere

Console: http://192.168.15.139:30880
Account: admin
Password: P@88w0rd

Appendix:

1. The certificates kubeadm generates for the Kubernetes cluster's external-facing services all live under /etc/kubernetes/pki on the master node.

For example, streaming operations such as fetching container logs with kubectl go from kube-apiserver to the kubelet, and that connection must also be secured. For this step kubeadm generates apiserver-kubelet-client.crt, with apiserver-kubelet-client.key as the corresponding private key.

The full list of certificate files is below; files ending in .key are private keys:

apiserver.crt
apiserver-etcd-client.crt
apiserver-etcd-client.key   
apiserver.key
apiserver-kubelet-client.crt  # used by kube-apiserver when calling the kubelet (e.g. kubectl logs)
apiserver-kubelet-client.key  # private key for apiserver-kubelet-client.crt
ca.crt   # the root certificate
ca.key   # private key for ca.crt
etcd     # this is a directory
front-proxy-ca.crt
front-proxy-ca.key
front-proxy-client.crt
front-proxy-client.key
sa.key
sa.pub
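Any of these certificates can be inspected with openssl to see its subject and expiry. A sketch, run here against a freshly generated throwaway cert (on a real master, point it at a file under /etc/kubernetes/pki instead):

```shell
# Stand-in for one of the pki certs
CRT=/tmp/demo-apiserver.crt
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo-apiserver.key \
  -out "$CRT" -subj "/CN=kube-apiserver" -days 365 2>/dev/null

# Print who the cert identifies and when it expires
openssl x509 -noout -subject -enddate -in "$CRT"
```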

kubeadm also generates config files under /etc/kubernetes/; each records the master's server address, listening port, certificate directory and so on, so that the kubelet, kubectl and the scheduler can load their respective .conf files and use that information to establish a secure connection to the API server.

It additionally generates a standalone kubeconfig file named admin.conf for administrative use. For the rest of the process, see the official documentation.

# The complete list of config files
[root@master01 kubernetes]# ll /etc/kubernetes/|grep conf|awk '{print $NF}'
admin.conf
controller-manager.conf
kubelet.conf
scheduler.conf

1. Deleting a namespace reports "The resource may continue to run on the cluster indefinitely"

kubesphere-monitoring-federated   Terminating   7h28m
When even a forced delete fails:
 kubectl delete ns kubesphere-monitoring-federated --force --grace-period=0
edit the namespace:
 kubectl edit ns kubesphere-monitoring-federated
and remove the finalizer entries
  finalizers:
  - finalizers.kubesphere.io/namespaces
after which it deletes normally.

2. The storageclass fails to dynamically provision PVs

Background: after installing kubesphere, the prometheus pods never come up

kubectl get pod -n kubesphere-monitoring-system
.......
prometheus-k8s-0                                   0/2     Pending   0          3h47m
prometheus-k8s-1                                   0/2     Pending   0          3h47m
..........

The pod events show the error

pod has unbound immediate PersistentVolumeClaims

The PVCs stay Pending

kubectl get pvc  -n kubesphere-monitoring-system
.........
prometheus-k8s-db-prometheus-k8s-0   Pending              managed-nfs-storage   16h
prometheus-k8s-db-prometheus-k8s-1   Pending              managed-nfs-storage   16h
.........

No PV has been created automatically either

kubectl get pv -n kubesphere-monitoring-system

The nfs-client-provisioner log shows the error

kubectl logs -n storgeclass nfs-client-provisioner-57b584586b-v4xz7
.........
 unexpected error getting claim reference: selfLink was empty, can't make reference
.........

Cause:

selfLink was populated in k8s before v1.20 and dropped from v1.20 on (the feature gate was removed entirely in v1.24); restore it by adding a flag in /etc/kubernetes/manifests/kube-apiserver.yaml:
add - --feature-gates=RemoveSelfLink=false

vim /etc/kubernetes/manifests/kube-apiserver.yaml
spec:
  containers:
  - command:
    ...........
    - kube-apiserver
    - --feature-gates=RemoveSelfLink=false # added
    ..............
# kube-apiserver is a static pod: the kubelet watches /etc/kubernetes/manifests and restarts it automatically after the file is saved

With that, the problem is solved.

3. Fixing the monitoring anomaly:

kubectl get pod -n kubesphere-monitoring-system
prometheus-k8s-0                                   0/2     Pending   0          3h47m
prometheus-k8s-1                                   0/2     Pending   0          3h47m
The prometheus pods never get scheduled
kubectl describe pod prometheus-k8s-0 -n kubesphere-monitoring-system
Warning  FailedScheduling  94s (x237 over 3h57m)  default-scheduler  0/3 nodes are available: 3 pod has unbound immediate PersistentVolumeClaims.

This is caused by a missing certificate secret

# Where the monitoring certificates live
ps -ef | grep kube-apiserver
...........
 --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key 
...........

The fix

kubectl -n kubesphere-monitoring-system create secret generic kube-etcd-client-certs  --from-file=etcd-client-ca.crt=/etc/kubernetes/pki/etcd/ca.crt  --from-file=etcd-client.crt=/etc/kubernetes/pki/apiserver-etcd-client.crt  --from-file=etcd-client.key=/etc/kubernetes/pki/apiserver-etcd-client.key

4. After installing the calico network plugin, inter-node traffic fails with "no route to host"

Background: after installing calico, the master cannot ping the other nodes (though it can still reach the internet), and one calico pod keeps failing, its log reporting calico/node is not ready: BIRD is not ready: BGP not established with 192.168.8.xxx,192.168.8.xxx

[root@master1 ~]# ping 192.168.8.131
connect: No route to host
[root@master1 ~]# ping 192.168.8.132
connect: No route to host

The nodes flap between Ready and NotReady:

[root@master1 ~]# kubectl get nodes
NAME      STATUS     ROLES                  AGE    VERSION
master1   Ready    control-plane,master      22d   v1.23.4
node1     NotReady    <none>                 22d   v1.23.4
node2     NotReady    <none>                 22d   v1.23.4

The root cause

The Pod CIDR conflicts with the node IPs: calico's Pod CIDR (--pod-network-cidr) defaults to 192.168.0.0/16, so when the cluster nodes' addresses also sit in 192.168.0.0/16, the ranges inevitably collide.

When the Pod subnet overlaps the host network, node-to-node and pod-to-pod traffic breaks because of routing conflicts. Check the network configuration carefully and make sure the Pod CIDR does not overlap any VLAN or VPC range. If it does, specify a non-overlapping IP range in the CNI plugin or the kubelet's pod-cidr parameter.
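The overlap check is plain bitmask arithmetic: a node IP conflicts with the Pod CIDR when both map to the same network under the CIDR's mask. A sketch in bash (the helper names are illustrative, not part of any k8s tooling):

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer
ip_to_int() { local IFS=.; set -- $1; echo $(( ($1<<24) | ($2<<16) | ($3<<8) | $4 )); }

# Succeed (exit 0) when the IP falls inside the CIDR
in_cidr() {
  local ip=$1 net=${2%/*} bits=${2#*/}
  local mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$ip") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
}

in_cidr 192.168.8.135 192.168.0.0/16 && echo "node IP conflicts with the Pod CIDR"
in_cidr 192.168.8.135 172.16.0.0/16  || echo "no conflict after switching to 172.16.0.0/16"
```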

Solution

Reconfigure calico's Pod CIDR:

 vim calico.yaml
..............
- name: CALICO_IPV4POOL_CIDR
  #value: "192.168.0.0/16"
  value: "172.16.0.0/16"
..............
kubectl delete -f calico.yaml
kubectl apply -f calico.yaml

After this, all the calico pods start successfully and the nodes can ping each other; problem solved.