Node 1: node1  192.168.88.21
Node 2: node2  192.168.88.22
Node 3: node3  192.168.88.23

Docker: version 20.10.9   (must not be newer than the 20.x series)
kubectl: v1.23.0
KubeSphere: v3.3.0
The major versions of Kubernetes and Docker should be kept in step; for example, Kubernetes v1.18 pairs with Docker v18.x.


####################################################################### Kernel upgrade
Upgrade the kernel from 3.10.0-1160.108.1.el7.x86_64 to at least version 4.x.

rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
rpm -Uvh https://www.elrepo.org/elrepo-release-7.el7.elrepo.noarch.rpm
yum --enablerepo=elrepo-kernel install kernel-ml-devel kernel-ml-headers kernel-ml -y
grub2-set-default 0
grub2-mkconfig -o /boot/grub2/grub.cfg
reboot
uname -sr

# After the upgrade, check the kernel again; it now meets the requirement
[root@k8s-a-node10 ~]# uname -sr
Linux 6.2.9-1.el7.elrepo.x86_64


# Check that k8s is still healthy; mine came back with no issues.
kubectl get nodes
kubectl get pod -n kube-system

/usr/sbin/modprobe rbd 
echo "/usr/sbin/modprobe rbd " >> /etc/rc.local
chmod -R 755 /etc/rc.d/rc.local

#######################################################################

Reference: https://zhuanlan.zhihu.com/p/627310856

Docker has a systemd version requirement; on CentOS 7, systemd appears to be version 219, and the environment where I hit this problem was also on 219.
Restarting the services can help, but do not restart kubelet first: restarting kubelet first leaves all pods stuck in Pending!
 systemctl restart docker
 systemctl restart kubelet

#############

Set the hostname (using node1 as an example):

Set the DNS name servers in /etc/resolv.conf (Aliyun public DNS):
nameserver 223.5.5.5
nameserver 223.6.6.6

hostnamectl set-hostname  node1  # node1 is a name of your choosing
Or edit the /etc/hostname file and write node1 into it (same for all the other worker nodes):

vim /etc/hostname
After editing, /etc/hostname contains:

node1
Synchronize time on all nodes:

# start the chronyd time-sync service
systemctl start chronyd
systemctl enable chronyd
date
Disable SELinux and firewalld on all nodes:

systemctl stop firewalld
systemctl disable firewalld

sed -i 's/enforcing/disabled/' /etc/selinux/config # takes effect after a reboot
Disable the swap partition on all nodes:

# disable swap temporarily
swapoff -a

# disable swap permanently
vi /etc/fstab 
# comment out the line below
# /dev/mapper/centos-swap swap
# a reboot is required afterwards for this to take effect
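To comment out the swap entry non-interactively instead of editing /etc/fstab by hand, a sed one-liner works; the sketch below runs against a scratch copy (the file path and the sample fstab lines are illustrative, not taken from a real host):

```shell
# scratch copy standing in for /etc/fstab (illustrative content)
cat > /tmp/fstab.test << 'EOF'
/dev/mapper/centos-root /     xfs  defaults 0 0
/dev/mapper/centos-swap swap  swap defaults 0 0
EOF

# comment out any line whose mount point is "swap" (a .bak backup is kept)
sed -i.bak '/\sswap\s/s/^/#/' /tmp/fstab.test
grep swap /tmp/fstab.test
```

On a real host, point the same sed at /etc/fstab; the backup file lets you diff what changed before rebooting.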
Enable bridge filtering and IP forwarding on all nodes:

cat > /etc/sysctl.d/kubernetes.conf << EOF
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

# then apply the settings
sysctl --system
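A quick way to confirm the values took effect is to read them back from /proc/sys; note that the two bridge keys only exist once the br_netfilter kernel module is loaded:

```shell
# ip_forward is always present on Linux
cat /proc/sys/net/ipv4/ip_forward

# the bridge keys appear only after: modprobe br_netfilter
for k in bridge/bridge-nf-call-iptables bridge/bridge-nf-call-ip6tables; do
  [ -f "/proc/sys/net/$k" ] && cat "/proc/sys/net/$k" || echo "$k not present (load br_netfilter)"
done
```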

##########################################################################################

Install docker-ce online

yum install -y yum-utils \
           device-mapper-persistent-data \
           lvm2 
       ###--skip-broken

# configure the Docker package mirror
yum-config-manager \
    --add-repo \
    https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
    
sed -i 's/download.docker.com/mirrors.aliyun.com\/docker-ce/g' /etc/yum.repos.d/docker-ce.repo

yum makecache fast
 
# mind the Docker/k8s version pairing; see https://blog.csdn.net/qq_42910468/article/details/126037954
#kubelet-1.23.0     # the Docker version must not be newer than 20.x
#docker-ce-20.10.9  
#yum list docker-ce --showduplicates | sort -r

rpm -e docker-buildx-plugin-0:0.12.1-1.el7.x86_64

yum install -y bash-completion  nfs-utils
yum install -y docker-ce-20.10.9    docker-ce-cli-20.10.9   docker-compose-plugin-2.20.2  
 
###docker-ce 3:25.0.3-1.el7 from docker-ce-stable is too new; not recommended


##########################################################################################

Note that Docker's cgroup driver needs to be configured (set to systemd):

mkdir -p /etc/docker/
sudo tee /etc/docker/daemon.json <<-'EOF'
{
  "registry-mirrors": ["https://82m9ar63.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m"
  },
  "storage-driver": "overlay2"
}
EOF


 systemctl daemon-reload
 systemctl restart docker
 systemctl enable docker.service

##########################################################################################
Switch the Kubernetes package repository to a domestic mirror on all nodes:

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
# enable this repository
enabled=1
# verify GPG signatures of packages
gpgcheck=0
# verify the GPG signature of the repo metadata
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg

EOF
 

#######################################################
Install the specified versions of kubeadm, kubelet and kubectl on all nodes (I chose 1.23.0):

yum install -y kubelet-1.23.0 kubeadm-1.23.0 kubectl-1.23.0

# enable kubelet at boot (your choice)
systemctl enable kubelet

 
##########################################################################################
1.2 *Change kubelet's container data path (skip this if you don't need it)
vim /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
The config file after the change:

[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --root-dir=/mnt/sdb_new/kubelet/ --kubeconfig=/etc/kubernetes/kubelet.conf"
Apply the change:

systemctl daemon-reload
systemctl restart docker

systemctl restart kubelet

systemctl enable docker
systemctl enable kubelet

 
##########################################################################################
Override the Kubernetes image registry (run the initialization command on the master node only)

1. First override kubeadm's image registry: the default registry is outside China and unreachable, so replace it with a domestic mirror. List the images the cluster needs during setup:

kubeadm config images list
kubeadm config images list  --image-repository registry.aliyuncs.com/google_containers
 
Initialize with the Aliyun registry:

kubeadm init \
  --apiserver-advertise-address=192.168.88.21 \
  --image-repository registry.aliyuncs.com/google_containers \
  --kubernetes-version v1.23.0 \
  --service-cidr=10.96.0.0/12 \
  --pod-network-cidr=10.244.0.0/16 \
  --ignore-preflight-errors=all

# --apiserver-advertise-address  # cluster advertise address (the master machine's IP; a 10 GbE network here)
# --image-repository             # the default registry k8s.gcr.io is unreachable from China, so use the Aliyun mirror
# --kubernetes-version           # K8s version, matching what was installed above
# --service-cidr                 # cluster-internal virtual network and unified Service entry point; the value above can be used as-is
# --pod-network-cidr             # Pod network; must match the CNI component's YAML deployed below; the value above can be used as-is
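The same flags can also be kept in a kubeadm configuration file and applied with `kubeadm init --config kubeadm.yaml`; a sketch of the equivalent v1beta3 config for the flags above (the file name kubeadm.yaml is an arbitrary choice):

```yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.88.21
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.23.0
imageRepository: registry.aliyuncs.com/google_containers
networking:
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
```

Keeping the settings in a file makes re-running the init reproducible and easier to diff later.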


kubeadm config images list

======================================== Your Kubernetes control-plane has initialized successfully! ========================================

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

#Copy and save this join command (printed after a successful k8s init; Flannel must be configured before worker nodes can join).
#Worker nodes run this command later to join the master:

kubeadm join 192.168.88.21:6443 --token 5ftb6m.79xz124nx3n4u69v \
    --discovery-token-ca-cert-hash sha256:bab814a71242fec19f3f693038be05656698c3c6d4054657b52ca7d8e3b9138f 
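If the join command gets lost, the --discovery-token-ca-cert-hash value can be recomputed from the cluster CA at /etc/kubernetes/pki/ca.crt using the standard kubeadm recipe. The sketch below runs the same pipeline against a throwaway self-signed cert so it works on any machine with openssl; on a real master, point it at /etc/kubernetes/pki/ca.crt instead:

```shell
# throwaway cert standing in for /etc/kubernetes/pki/ca.crt
openssl req -x509 -newkey rsa:2048 -nodes -days 1 -subj "/CN=test-ca" \
  -keyout /tmp/ca.key -out /tmp/ca.crt 2>/dev/null

# SHA-256 of the DER-encoded public key, the format kubeadm expects
hash=$(openssl x509 -pubkey -in /tmp/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | awk '{print $NF}')
echo "sha256:${hash}"
```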

### For the root account, configure the following:
Install this first:
yum install bash-completion

vi  /root/.bash_profile
Add the following lines:

# kubeconfig variable for the superuser
export KUBECONFIG=/etc/kubernetes/admin.conf
# set an alias
alias k=kubectl
# enable kubectl command completion
source <(kubectl completion bash)

[root@node1 home]# source /root/.bash_profile


#################################################  Set up the cluster Pod network (deploy on the master node)   #################################################   

Download kube-flannel.yml:

[root@node1 home]# wget https://raw.githubusercontent.com/flannel-io/flannel/master/Documentation/kube-flannel.yml

The file content is as follows:

########################
vi kube-flannel.yml

---
kind: Namespace
apiVersion: v1
metadata:
  name: kube-flannel
  labels:
    k8s-app: flannel
    pod-security.kubernetes.io/enforce: privileged
---
kind: ClusterRole
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
rules:
- apiGroups:
  - ""
  resources:
  - pods
  verbs:
  - get
- apiGroups:
  - ""
  resources:
  - nodes
  verbs:
  - get
  - list
  - watch
- apiGroups:
  - ""
  resources:
  - nodes/status
  verbs:
  - patch
- apiGroups:
  - networking.k8s.io
  resources:
  - clustercidrs
  verbs:
  - list
  - watch
---
kind: ClusterRoleBinding
apiVersion: rbac.authorization.k8s.io/v1
metadata:
  labels:
    k8s-app: flannel
  name: flannel
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: flannel
subjects:
- kind: ServiceAccount
  name: flannel
  namespace: kube-flannel
---
apiVersion: v1
kind: ServiceAccount
metadata:
  labels:
    k8s-app: flannel
  name: flannel
  namespace: kube-flannel
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: kube-flannel-cfg
  namespace: kube-flannel
  labels:
    tier: node
    k8s-app: flannel
    app: flannel
data:
  cni-conf.json: |
    {
      "name": "cbr0",
      "cniVersion": "0.3.1",
      "plugins": [
        {
          "type": "flannel",
          "delegate": {
            "hairpinMode": true,
            "isDefaultGateway": true
          }
        },
        {
          "type": "portmap",
          "capabilities": {
            "portMappings": true
          }
        }
      ]
    }
  net-conf.json: |
    {
      "Network": "10.244.0.0/16",
      "Backend": {
        "Type": "vxlan"
      }
    }
---
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: kube-flannel-ds
  namespace: kube-flannel
  labels:
    tier: node
    app: flannel
    k8s-app: flannel
spec:
  selector:
    matchLabels:
      app: flannel
  template:
    metadata:
      labels:
        tier: node
        app: flannel
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: kubernetes.io/os
                operator: In
                values:
                - linux
      hostNetwork: true
      priorityClassName: system-node-critical
      tolerations:
      - operator: Exists
        effect: NoSchedule
      serviceAccountName: flannel
      initContainers:
      - name: install-cni-plugin
        image: docker.io/flannel/flannel-cni-plugin:v1.4.0-flannel1
        command:
        - cp
        args:
        - -f
        - /flannel
        - /opt/cni/bin/flannel
        volumeMounts:
        - name: cni-plugin
          mountPath: /opt/cni/bin
      - name: install-cni
        image: docker.io/flannel/flannel:v0.24.2
        command:
        - cp
        args:
        - -f
        - /etc/kube-flannel/cni-conf.json
        - /etc/cni/net.d/10-flannel.conflist
        volumeMounts:
        - name: cni
          mountPath: /etc/cni/net.d
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
      containers:
      - name: kube-flannel
        image: docker.io/flannel/flannel:v0.24.2
        command:
        - /opt/bin/flanneld
        args:
        - --ip-masq
        - --kube-subnet-mgr
        resources:
          requests:
            cpu: "100m"
            memory: "50Mi"
        securityContext:
          privileged: false
          capabilities:
            add: ["NET_ADMIN", "NET_RAW"]
        env:
        - name: POD_NAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: EVENT_QUEUE_DEPTH
          value: "5000"
        volumeMounts:
        - name: run
          mountPath: /run/flannel
        - name: flannel-cfg
          mountPath: /etc/kube-flannel/
        - name: xtables-lock
          mountPath: /run/xtables.lock
      volumes:
      - name: run
        hostPath:
          path: /run/flannel
      - name: cni-plugin
        hostPath:
          path: /opt/cni/bin
      - name: cni
        hostPath:
          path: /etc/cni/net.d
      - name: flannel-cfg
        configMap:
          name: kube-flannel-cfg
      - name: xtables-lock
        hostPath:
          path: /run/xtables.lock
          type: FileOrCreate

#############################

Then edit the config file: find the section below and make sure Network matches the subnet passed to kubeadm init (--pod-network-cidr):

net-conf.json: |
  {
    "Network": "10.244.0.0/16",
    "Backend": {
      "Type": "vxlan"
    }
  }

After editing, install the component (if the install gets stuck pulling images, try pulling them manually with docker first):

[root@node1 home]# kubectl apply -f kube-flannel.yml

Check the Flannel pod status (it must be Running; if kube-flannel will not start, run kubectl describe pod kube-flannel-ds-f5jn6 -n kube-flannel to see why,
then search the web for a fix):

[root@node1 home]# # every container must be Running
[root@node1 home]# kubectl get pod --all-namespaces
NAMESPACE      NAME                                 READY   STATUS    RESTARTS   AGE
kube-flannel   kube-flannel-ds-f5jn6                1/1     Running   0          8m21s
kube-system    coredns-6d8c4cb4d-ctqw5              1/1     Running   0          42m
kube-system    coredns-6d8c4cb4d-n52fq              1/1     Running   0          42m
kube-system    etcd-k8s-master                      1/1     Running   0          42m
kube-system    kube-apiserver-k8s-master            1/1     Running   0          42m
kube-system    kube-controller-manager-k8s-master   1/1     Running   0          42m
kube-system    kube-proxy-swpkz                     1/1     Running   0          42m
kube-system    kube-scheduler-k8s-master            1/1     Running   0          42m
Check communication status:

[root@node1 home]# kubectl get pod -n kube-system
NAME                                 READY   STATUS    RESTARTS   AGE
coredns-6d8c4cb4d-ctqw5              1/1     Running   0          52m
coredns-6d8c4cb4d-n52fq              1/1     Running   0          52m
etcd-k8s-master                      1/1     Running   0          53m
kube-apiserver-k8s-master            1/1     Running   0          53m
kube-controller-manager-k8s-master   1/1     Running   0          53m
kube-proxy-swpkz                     1/1     Running   0          52m
kube-scheduler-k8s-master            1/1     Running   0          53m
 
[root@node1 home]# # check the control-plane component status
[root@node1 home]# kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE                         ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true","reason":""}
[root@node1 home]# kubectl get node
NAME         STATUS   ROLES                  AGE   VERSION
node1        Ready    control-plane,master   52m   v1.23.0
Check node status (only the master node exists so far; no workers have been added yet):

[root@node1 home]# kubectl get node
NAME         STATUS   ROLES                  AGE   VERSION
node1        Ready    control-plane,master   53m   v1.23.0
At this point, the K8s master server is fully deployed!

#################################################    1.3.4 Join worker nodes to the cluster (run on the worker nodes)  #################################################   


Initialization generates a join command to run on each worker node. The token below is only an example; use your actual one, e.g.:

[root@node2 home]# kubeadm join 192.168.88.21:6443 --token 5ftb6m.79xz124nx3n4u69v \
    --discovery-token-ca-cert-hash sha256:bab814a71242fec19f3f693038be05656698c3c6d4054657b52ca7d8e3b9138f 

[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
The default join token is valid for 24 hours; once expired it can no longer be used, and a new join token must be created, on the master node, with the following command:

[root@node1 home]# kubeadm token create --print-join-command
After joining, check the cluster's node status again on the master (all nodes must be Ready):

[root@node1 home]# kubectl get nodes
NAME         STATUS     ROLES                  AGE     VERSION
node1        Ready      control-plane,master   63m     v1.23.0
node2        Ready      <none>                 3m57s   v1.23.0
node3        Ready      <none>                 29s     v1.23.0

If every node's STATUS is Ready, all worker nodes have joined successfully!

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
## To run the kubectl get nodes query on a worker node:
Join the cluster with the following command:
kubeadm join 192.168.88.115:6443 --token gzay1h.1u0n8ugcs9adk1f0 \
    --discovery-token-ca-cert-hash sha256:94406ea0dba5d588f37c9ba9ffc8a3585f8526f37763ab0be002e129b9f9022b 

kubectl get nodes


On the master node, push /etc/kubernetes/admin.conf to the other nodes
1. Running kubectl on any non-master node fails with:
[root@server-88-22 ~]# kubectl get nodes
The connection to the server localhost:8080 was refused - did you specify the right host or port?

On the master, run this once per worker node:

scp /etc/kubernetes/admin.conf user@host:/etc/kubernetes/admin.conf
 
user is the login user on the worker node
host is the worker node's IP
Then, on each worker, run:

echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> ~/.bash_profile
source ~/.bash_profile
 
Run kubectl get nodes on a worker node to check the node status:

[root@node1 ~]# kubectl get nodes
NAME    STATUS     ROLES                  AGE   VERSION
node1   NotReady   control-plane,master   27m   v1.23.0
node2   NotReady   <none>                 10m   v1.23.0


#################################################    Delete a worker node  #################################################   

1.3.5 Delete a worker node (operate on the master node)
# kubectl drain <node name> --delete-local-data --force --ignore-daemonsets
# where <node name> is the node name reported by kubectl get nodes in the k8s cluster
# as an example, delete the node3 worker here
[root@node1 home]# kubectl drain node3 --delete-local-data --force --ignore-daemonsets
[root@node1 home]# kubectl delete node node3
Then, on the removed worker, reset k8s (the reset deletes some config files); here this is done on node3:

[root@node3 home]# # reset k8s on the worker node
[root@node3 home]# kubeadm reset
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W0425 01:59:40.412616   15604 removeetcdmember.go:80] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/var/lib/kubelet /var/lib/dockershim /var/run/kubernetes /var/lib/cni]

The reset process does not clean CNI configuration. To do so, you must remove /etc/cni/net.d

The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.

If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.

The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
Then, on the removed worker, manually delete the k8s config files, the Flannel network config, and the Flannel network interfaces:

[root@node3 home]# rm -rf /etc/cni/net.d/
[root@node3 home]# rm -rf /root/.kube/config
[root@node3 home]# # delete the cni network interface
[root@node3 home]# ifconfig cni0 down
[root@node3 home]# ip link delete cni0
[root@node3 home]# ifconfig flannel.1 down
[root@node3 home]# ip link delete flannel.1

###############
Command notes:
Commonly used k8s commands:

# list all nodes in the current cluster
kubectl get node
# show detailed info for a node (rarely needed)
kubectl describe node node1

# list all pods
kubectl get pod --all-namespaces
# show detailed pod info
kubectl get pods -o wide --all-namespaces

# list all created services
kubectl get service

# list all deployments
kubectl get deploy

# restart a pod (this deletes the original pod, then creates a new one to achieve the restart)
# restart when a yaml file exists
kubectl replace --force -f xxx.yaml
# restart without a yaml file
kubectl get pod <POD_NAME> -n <NAMESPACE> -o yaml | kubectl replace --force -f -

# show detailed pod info
kubectl describe pod nfs-client-provisioner-65c77c7bf9-54rdp -n default

# create Pod resources from a yaml file; apply creates or updates a Kubernetes object and takes optional flags such as --force, --validate and --record for more precise, controlled updates
kubectl apply -f pod.yaml

# kubectl create -f  suits first-time creation of resource objects;
# it creates a Kubernetes object and returns an error if the resource already exists (delete the old object first, then recreate it); if the object does not exist, it is created automatically
 

# delete the Pod defined in pod.yaml
kubectl delete -f pod.yaml

# view a container's logs
kubectl logs <pod-name>
# follow the logs in real time
kubectl logs -f <pod-name>
# if the pod has only one container, -c can be omitted
kubectl logs <pod-name> -c <container_name>
# merged logs of all pods labeled app=frontend
kubectl logs -l app=frontend

# get a TTY to a container in a pod via bash, effectively logging into the container
# kubectl exec -it <pod-name> -c <container-name> -- bash
e.g.:
kubectl exec -it redis-master-cln81 -- bash

# list the endpoints
kubectl get endpoints

# list existing tokens
kubeadm token list

#################################################      Install dynamic storage  #################################################   


Original article: https://blog.csdn.net/m0_51510236/article/details/132641343
kubesphere/ks-installer:v3.3.0
kubectl  v1.23.0
docker:  20.10.9

Your Kubernetes version must be one of: v1.20.x, v1.21.x, *v1.22.x, *v1.23.x, *v1.24.x, *v1.25.x, *v1.26.x. On the starred versions, some edge-node features may be unavailable, so if you need edge nodes, installing v1.21.x is recommended.
Make sure your machines meet the minimum hardware requirements: CPU > 1 core, memory > 2 GB.
Before installing, a default storage class must be configured in the Kubernetes cluster (covered in this article).
 
I already have a Kubernetes cluster prepared, as shown:
[root@server-88-21 ~]# kubectl  get nodes -o wide
NAME           STATUS   ROLES                  AGE   VERSION   INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                 CONTAINER-RUNTIME
server-88-21   Ready    control-plane,master   26h   v1.23.0   192.168.88.21   <none>        CentOS Linux 7 (Core)   3.10.0-1160.108.1.el7.x86_64   docker://20.10.9
server-88-22   Ready    <none>                 26h   v1.23.0   192.168.88.22   <none>        CentOS Linux 7 (Core)   3.10.0-1160.108.1.el7.x86_64   docker://20.10.9


####NFS dynamic provisioning
First you need an NFS server; for convenience, I will let my master server k8s-master double as the NFS server.

##Set up NFS

Install the nfs-utils package on the NFS server (mine is the same machine as the master) and on every k8s node (master and workers alike) with:

yum install -y nfs-utils

# create the exported directory (it must match the path in /etc/exports and in deployment.yaml below)
mkdir -p /data/k8s
# append the directory to /etc/exports so NFS exposes it to the LAN
cat >> /etc/exports << EOF
/data/k8s *(rw,sync,no_root_squash)
EOF
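For reference, the fields of that /etc/exports line are sketched below; in production, consider restricting the client scope to your LAN rather than using *, as in this illustrative variant:

```
# <exported path> <allowed clients>(options)
#   rw              clients may read and write
#   sync            commit writes to disk before replying
#   no_root_squash  do not map client root to nobody (many k8s provisioners need this)
/data/k8s 192.168.88.0/24(rw,sync,no_root_squash)
```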
# enable and start the NFS service
systemctl enable  nfs
systemctl start  nfs

Check that the export is visible (test from the other nodes as well):
showmount -e {NFS server address}


###Download the dynamic-provisioning driver
Kubernetes does not ship its own NFS dynamic-provisioning driver, so a third-party one is needed. The Kubernetes docs recommend two third-party drivers to choose from.
I find the NFS subdir driver the more convenient of the two, so it is what we use for dynamic provisioning here. Its release can be fetched from the project page:
wget https://github.com/kubernetes-sigs/nfs-subdir-external-provisioner/archive/refs/tags/nfs-subdir-external-provisioner-4.0.18.tar.gz

cd nfs-subdir-external-provisioner-nfs-subdir-external-provisioner-4.0.18/deploy/

The directory contains several yamls; some of them need changes:
# this image is hosted on a Google registry and cannot be pulled from inside China
# image: registry.k8s.io/sig-storage/nfs-subdir-external-provisioner:v4.0.2
# use this copy instead, which I pulled from the Google registry and pushed to Aliyun
image: registry.cn-shenzhen.aliyuncs.com/xiaohh-docker/nfs-subdir-external-provisioner:v4.0.2

 
###deployment.yaml: remember to change the image address, the NFS server IP and the path
[root@server-88-21 deploy]# cat deployment.yaml 
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nfs-client-provisioner
  labels:
    app: nfs-client-provisioner
  # replace with namespace where provisioner is deployed
  namespace: nfs-provisioner
spec:
  replicas: 1
  strategy:
    type: Recreate
  selector:
    matchLabels:
      app: nfs-client-provisioner
  template:
    metadata:
      labels:
        app: nfs-client-provisioner
    spec:
      serviceAccountName: nfs-client-provisioner
      containers:
        - name: nfs-client-provisioner
          image: registry.cn-shenzhen.aliyuncs.com/xiaohh-docker/nfs-subdir-external-provisioner:v4.0.2
          volumeMounts:
            - name: nfs-client-root
              mountPath: /persistentvolumes
          env:
            - name: PROVISIONER_NAME
              value: k8s-sigs.io/nfs-subdir-external-provisioner
            - name: NFS_SERVER
              value: 192.168.88.21
            - name: NFS_PATH
              value: /data/k8s
      volumes:
        - name: nfs-client-root
          nfs:
            server: 192.168.88.21
            path: /data/k8s

#If you only plan to install the dynamic-provisioning storage class, the default-storage configuration can be skipped; if NFS is the primary storage, it must be configured as the default
## make nfs-client the default storage class; see these lines in the config file below:
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"

[root@server-88-21 deploy]# more class.yaml 
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    storageclass.beta.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner # or choose another name, must match deployment's env PROVISIONER_NAME'
parameters:
  archiveOnDelete: "false"
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Retain
mountOptions:
  - hard
  - nointr
  - nosuid
  - rsize=512
  - wsize=512
  - timeo=600
  - retrans=3

Running the short script below shows that quite a few resources are still placed in the default namespace:

yamls=$(grep -rl 'namespace: default' ./)
for yaml in ${yamls}; do
  echo ${yaml}
  cat ${yaml} | grep 'namespace: default'
done
 


We can create a dedicated namespace for this driver, which also makes later management easier; I decided to create one named nfs-provisioner, directly by command rather than via a yaml file:

kubectl create namespace nfs-provisioner

After running it, the namespace has been created:
[root@server-88-21 deploy]# kubectl  get namespace
NAME                              STATUS   AGE
default                           Active   27h
kube-flannel                      Active   26h
kube-node-lease                   Active   27h
kube-public                       Active   27h
kube-system                       Active   27h
kubesphere-controls-system        Active   21h
kubesphere-monitoring-federated   Active   21h
kubesphere-monitoring-system      Active   21h
kubesphere-system                 Active   110m
nfs-provisioner                   Active   21h

Quite a few files carry this namespace setting, so change them all with a single line:

sed -i 's/namespace: default/namespace: nfs-provisioner/g' `grep -rl 'namespace: default' ./`
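The one-liner above rewrites every matching file in place; the sketch below exercises the same pattern on a scratch directory so its effect is easy to see (the file names are illustrative stand-ins for the provisioner manifests):

```shell
# scratch files standing in for the provisioner manifests
mkdir -p /tmp/nfs-demo && cd /tmp/nfs-demo
printf 'metadata:\n  namespace: default\n' > rbac.yaml
printf 'metadata:\n  namespace: default\n' > deployment.yaml

# same command as above: rewrite every file that mentions the default namespace
sed -i 's/namespace: default/namespace: nfs-provisioner/g' `grep -rl 'namespace: default' ./`

grep -r 'namespace:' .
```

After the run, no file under the directory still references namespace: default.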


#####Install the dynamic provisioner
All the yaml manifests are now edited, so we can install directly. Installation is very simple, a single command:

kubectl apply -k .

Check whether the deployment finished with the command below (STATUS should be Running):

kubectl get all -o wide -n nfs-provisioner
[root@server-88-21 deploy]# kubectl get all -o wide -n nfs-provisioner

NAME                                         READY   STATUS    RESTARTS      AGE    IP            NODE           NOMINATED NODE   READINESS GATES
pod/nfs-client-provisioner-94bcc8884-kstqp   1/1     Running   1 (95m ago)   124m   10.244.1.50   server-88-22   <none>           <none>

NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE    CONTAINERS               IMAGES                                                                                   SELECTOR
deployment.apps/nfs-client-provisioner   1/1     1            1           124m   nfs-client-provisioner   registry.cn-shenzhen.aliyuncs.com/xiaohh-docker/nfs-subdir-external-provisioner:v4.0.2   app=nfs-client-provisioner

NAME                                               DESIRED   CURRENT   READY   AGE    CONTAINERS               IMAGES                                                                                   SELECTOR
replicaset.apps/nfs-client-provisioner-94bcc8884   1         1         1       124m   nfs-client-provisioner   registry.cn-shenzhen.aliyuncs.com/xiaohh-docker/nfs-subdir-external-provisioner:v4.0.2   app=nfs-client-provisioner,pod-template-hash=94bcc8884

##List the name of the installed dynamic storage class (the NAME must be marked default, otherwise KubeSphere cannot find this sc):
[root@server-88-21 deploy]# kubectl get storageclass    # or: kubectl get sc

NAME                   PROVISIONER                                   RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
nfs-client (default)   k8s-sigs.io/nfs-subdir-external-provisioner   Retain          WaitForFirstConsumer   false                  125m


##Remember that the storage NAME is: nfs-client
NFS dynamic provisioning is now installed.


 


### ### ### ### ### ### ### ### ###  Install KubeSphere  ### ### ### ### ### ### ### ### 
 
Download KubeSphere's yaml manifest files
This installs the latest KubeSphere, v3.4.0; download the manifest files (two in total) with the following command (in practice the image pulled is image: kubesphere/ks-installer:v3.3.0):
wget \
https://github.com/kubesphere/ks-installer/releases/download/v3.4.0/kubesphere-installer.yaml \
https://github.com/kubesphere/ks-installer/releases/download/v3.4.0/cluster-configuration.yaml

What the two files do:

kubesphere-installer.yaml: the KubeSphere installer
cluster-configuration.yaml: the KubeSphere cluster configuration file

#####################################################################################################
## Set storageClass in the cluster configuration file to: nfs-client
 vi  cluster-configuration.yaml
On (around) line 11, change it to:        storageClass: "nfs-client" 
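This can also be done non-interactively. In ks-installer v3.4.0 the field defaults to an empty string, so a targeted sed works; the sketch below runs against a scratch copy, and the assumption that the field reads storageClass: "" should be checked against your actual file first:

```shell
# scratch copy standing in for cluster-configuration.yaml (illustrative content)
printf '  common:\n    core:\n    storageClass: ""\n' > /tmp/cluster-configuration.yaml

# point the installer at the nfs-client storage class
sed -i 's/storageClass: ""/storageClass: "nfs-client"/' /tmp/cluster-configuration.yaml
grep storageClass /tmp/cluster-configuration.yaml
```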

To enable DevOps, set it to true on lines 78-79:
  78      devops:                  # (CPU: 0.47 Core, Memory: 8.6 G) Provide an out-of-the-box CI/CD system based on Jenkins, and automated workflow tools including Source-to-Image & Binary-to-Image.
  79        enabled: true             # Enable or disable the KubeSphere DevOps System.


####Install KubeSphere
First create the resources in kubesphere-installer.yaml:

kubectl apply -f kubesphere-installer.yaml   (this file needs no changes; use it as-is)

Then check whether the resource was created (before the cluster configuration file is applied, only ks-installer-c9655d997-5f4h4 shows a Running status):

## use the describe command to inspect a container's details
kubectl describe pod  notification-manager-deployment-7dd45b5b7d-mqrl7   -n kubesphere-monitoring-system


## this pod can be deleted (its controller will recreate it)
kubectl  delete pod ks-controller-manager-6d6b54464d-jkdjm -n kubesphere-system

kubectl  delete  pod  notification-manager-deployment-7dd45b5b7d-mqrl7  -n  kubesphere-monitoring-system

# kubectl get pod -o wide -n kubesphere-system
NAME                                     READY   STATUS    RESTARTS       AGE    IP            NODE           NOMINATED NODE   READINESS GATES
ks-apiserver-66cd784f8f-c9lgk            1/1     Running   0              122m   10.244.0.14   server-88-21   <none>           <none>
ks-console-5c5676fb55-jfcdd              1/1     Running   0              122m   10.244.0.13   server-88-21   <none>           <none>
ks-controller-manager-6d6b54464d-mrb59   1/1     Running   0              122m   10.244.0.15   server-88-21   <none>           <none>
ks-installer-c9655d997-5f4h4             1/1     Running   1 (107m ago)   125m   10.244.1.44   server-88-22   <none>           <none>

##Next, apply the cluster-configuration.yaml file:
#kubectl apply -f cluster-configuration.yaml

Although it is only one resource, a great deal of work happens behind it.
Check KubeSphere's installation log with the following command:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f
Wait a few minutes; once installation succeeds, the log output ends with:
 **************************************************
Waiting for all tasks to be completed ...
task network status is successful  (1/4)
task openpitrix status is successful  (2/4)
task multicluster status is successful  (3/4)
task monitoring status is successful  (4/4)
**************************************************
Collecting installation results ...
#####################################################
###              Welcome to KubeSphere!           ###
#####################################################

Console: http://192.168.88.21:30880
Account: admin
Password: P@88w0rd

NOTES:
  1. After you log into the console, please check the
     monitoring status of service components in
     "Cluster Management". If any service is not
     ready, please wait patiently until all components 
     are up and running.
  2. Please change the default password after login.

#####################################################
https://kubesphere.io             2024-02-21 11:51:46
#####################################################

##Check all pods with this command; every status should be Running 

[root@server-88-21 ~]# kubectl  get pod -A  -o wide
NAMESPACE                      NAME                                               READY   STATUS    RESTARTS       AGE
++++++++++++++ k8s network NAMESPACE entries
kube-flannel                   kube-flannel-ds-nxgg7                              1/1     Running   3 (113m ago)   27h
kube-flannel                   kube-flannel-ds-rxmkj                              1/1     Running   2 (20h ago)    27h

++++++++++++++ k8s core NAMESPACE entries
kube-system                    coredns-6d8c4cb4d-6kj9t                            1/1     Running   1 (20h ago)    21h
kube-system                    coredns-6d8c4cb4d-wtdkh                            1/1     Running   1 (20h ago)    21h
kube-system                    etcd-server-88-21                                  1/1     Running   2 (20h ago)    27h
kube-system                    kube-apiserver-server-88-21                        1/1     Running   2 (20h ago)    27h
kube-system                    kube-controller-manager-server-88-21               1/1     Running   2 (20h ago)    27h
kube-system                    kube-proxy-hwh2c                                   1/1     Running   2 (20h ago)    27h
kube-system                    kube-proxy-pm6sp                                   1/1     Running   3 (113m ago)   27h
kube-system                    kube-scheduler-server-88-21                        1/1     Running   2 (20h ago)    27h
kube-system                    snapshot-controller-0                              1/1     Running   1 (113m ago)   20h

++++++++++++++ KubeSphere namespaces
kubesphere-controls-system     default-http-backend-696d6bf54f-5hxf2              1/1     Running   1 (113m ago)   20h
kubesphere-controls-system     kubectl-admin-b49cf5585-n6hzd                      1/1     Running   1 (113m ago)   124m
kubesphere-monitoring-system   alertmanager-main-0                                2/2     Running   2 (113m ago)   20h
kubesphere-monitoring-system   kube-state-metrics-645c64569c-2tflp                3/3     Running   6 (113m ago)   20h
kubesphere-monitoring-system   node-exporter-cmlfk                                2/2     Running   4 (20h ago)    21h
kubesphere-monitoring-system   node-exporter-rzhts                                2/2     Running   5 (113m ago)   21h
kubesphere-monitoring-system   notification-manager-deployment-7dd45b5b7d-fdt28   2/2     Running   2 (113m ago)   20h
kubesphere-monitoring-system   notification-manager-operator-8598775b-8vnbw       2/2     Running   2 (113m ago)   20h
kubesphere-monitoring-system   prometheus-k8s-0                                   2/2     Running   2 (113m ago)   125m
kubesphere-monitoring-system   prometheus-operator-57c78bd7fb-68qnq               2/2     Running   2 (113m ago)   20h
kubesphere-system              ks-apiserver-66cd784f8f-c9lgk                      1/1     Running   0              128m
kubesphere-system              ks-console-5c5676fb55-jfcdd                        1/1     Running   0              128m
kubesphere-system              ks-controller-manager-6d6b54464d-mrb59             1/1     Running   0              128m
kubesphere-system              ks-installer-c9655d997-5f4h4                       1/1     Running   1 (113m ago)   131m

+++++++++++++++ NFS dynamic-provisioning namespace
nfs-provisioner                nfs-client-provisioner-94bcc8884-kstqp             1/1     Running   1 (113m ago)   142m

## Check: list all StorageClasses
[root@server-88-21 ~]# kubectl  get sc -A
NAME                   PROVISIONER                                   RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
nfs-client (default)   k8s-sigs.io/nfs-subdir-external-provisioner   Retain          WaitForFirstConsumer   false                  143m
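The `(default)` marker in the output above identifies the default StorageClass, which KubeSphere needs. A minimal sketch of extracting it; the here-string below is a captured sample, not live cluster output (on a real cluster you would pipe `kubectl get sc` directly):

```shell
# Sample `kubectl get sc` output (hypothetical capture of the listing above).
sc_output='NAME                   PROVISIONER                                   RECLAIMPOLICY
nfs-client (default)   k8s-sigs.io/nfs-subdir-external-provisioner   Retain'

# Print the name of whichever StorageClass carries the (default) marker.
default_sc=$(printf '%s\n' "$sc_output" | awk '/\(default\)/ {print $1}')
echo "$default_sc"
```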

## List namespaces
[root@server-88-21 k8s]# kubectl  get namespace
NAME                              STATUS   AGE
default                           Active   28h
kube-flannel                      Active   28h
kube-node-lease                   Active   28h
kube-public                       Active   28h
kube-system                       Active   28h
kubesphere-controls-system        Active   22h
kubesphere-monitoring-federated   Active   22h
kubesphere-monitoring-system      Active   22h
kubesphere-system                 Active   3h10m
nfs-provisioner                   Active   23h
test-project                      Active   34s

 
### Console address: http://192.168.88.22:30880/dashboard   (the page may fail to load in Chrome; 360 Browser works)
Default username/password: admin/P@88w0rd; password later changed to Whlxhc__2020

#################################################   Enable the DevOps component   #################################################
##
In the KubeSphere console, go to Cluster Management --> CRDs --> ClusterConfiguration --> ks-installer --> Edit YAML, and change `enabled: false` to `true` under `devops`; Kubernetes will then install the DevOps components automatically:
  devops:
    enabled: true
    jenkinsJavaOpts_MaxRAM: 2g
    jenkinsJavaOpts_Xms: 1200m
    jenkinsJavaOpts_Xmx: 1600m
    jenkinsMemoryLim: 2Gi
    jenkinsMemoryReq: 1500Mi
    jenkinsVolumeSize: 8Gi

### Run the following command to tail the KubeSphere installer logs:

kubectl logs -n kubesphere-system $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f

localhost                  : ok=26   changed=15   unreachable=0    failed=0    skipped=12   rescued=0    ignored=0   
Start installing monitoring
Start installing multicluster
Start installing openpitrix
Start installing network
Start installing devops    # DevOps installation begins
**************************************************
Waiting for all tasks to be completed ...
task openpitrix status is successful  (1/5)
task multicluster status is successful  (2/5)
task network status is successful  (3/5)
task monitoring status is successful  (4/5)


### [root@server-88-22 ~]# kubectl get pod -A    # check pod status across all namespaces

Helm version   Supported Kubernetes versions

3.8.x          1.23.x - 1.20.x
3.7.x          1.22.x - 1.19.x
3.6.x          1.21.x - 1.18.x
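To make the compatibility mapping above concrete, here is an illustrative shell helper; the function name `supported_k8s` is made up for this sketch and simply mirrors the table:

```shell
# supported_k8s: hypothetical lookup mirroring the Helm/Kubernetes table above.
supported_k8s() {
  case "$1" in
    3.8) echo "1.23.x - 1.20.x" ;;
    3.7) echo "1.22.x - 1.19.x" ;;
    3.6) echo "1.21.x - 1.18.x" ;;
    *)   echo "unknown" ;;
  esac
}

supported_k8s 3.8
```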


### Troubleshooting:

# 1. Mark a StorageClass as the default
 kubectl patch sc local -p '{"metadata": {"annotations": {"storageclass.beta.kubernetes.io/is-default-class": "true"}}}'
Here `local` is the StorageClass name in my cluster.

Alternatively, add the annotation when creating the StorageClass:
  metadata:
    annotations:
      storageclass.beta.kubernetes.io/is-default-class: "true"
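Note that `storageclass.beta.kubernetes.io/is-default-class` is the legacy beta key; on current Kubernetes releases the GA annotation is `storageclass.kubernetes.io/is-default-class`. A sketch of a full StorageClass manifest carrying the GA key (the name and provisioner here match the cluster above; adjust for yours):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: nfs-client
  annotations:
    # GA form of the default-class annotation
    storageclass.kubernetes.io/is-default-class: "true"
provisioner: k8s-sigs.io/nfs-subdir-external-provisioner
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
```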
 

## Error
Error from server (InternalError): Internal error occurred: failed calling webhook "users.iam.kubes
Installing KubeSphere 3.4.1 on Kubernetes 1.26, after repeated install/uninstall cycles, fails with:
failed: [localhost] (item={'ns': 'kubesphere-system', 'kind': 'users.iam.kubesphere.io', 'resource': 'admin', 'release': 'ks-core'}) => {"ansible_loop_var": "item", "changed": true, "cmd": "/usr/local/bin/kubectl -n kubesphere-system annotate --overwrite users.iam.kubesphere.io admin meta.helm.sh/release-name=ks-core && /usr/local/bin/kubectl -n kubesphere-system annotate --overwrite users.iam.kubesphere.io admin meta.helm.sh/release-namespace=kubesphere-system && /usr/local/bin/kubectl -n kubesphere-system label --overwrite users.iam.kubesphere.io admin app.kubernetes.io/managed-by=Helm\n", "delta": "0:00:00.440257", "end": "2023-12-21 13:46:30.328877", "failed_when_result": true, "item": {"kind": "users.iam.kubesphere.io", "ns": "kubesphere-system", "release": "ks-core", "resource": "admin"}, "msg": "non-zero return code", "rc": 1, "start": "2023-12-21 13:46:29.888620", "stderr": "Error from server (InternalError): Internal error occurred: failed calling webhook \"users.iam.kubesphere.io\": failed to call webhook: Post \"https://ks-controller-manager.kubesphere-system.svc:443/validate-email-iam-kubesphere-io-v1alpha2?timeout=30s\": service \"ks-controller-manager\" not found", "stderr_lines": ["Error from server (InternalError): Internal error occurred: failed calling webhook \"users.iam.kubesphere.io\": failed to call webhook: Post \"https://ks-controller-manager.kubesphere-system.svc:443/validate-email-iam-kubesphere-io-v1alpha2?timeout=30s\": service \"ks-controller-manager\" not found"], "stdout": "", "stdout_lines": []}

#### Delete the validating webhook configurations referenced in the error
kubectl get validatingwebhookconfigurations

NAME                                          WEBHOOKS   AGE
cluster.kubesphere.io                         1          5m17s
network.kubesphere.io                         1          5m17s
resourcesquotas.quota.kubesphere.io           1          5m17s
rulegroups.alerting.kubesphere.io             3          5m17s
storageclass-accessor.storage.kubesphere.io   1          5m17s
users.iam.kubesphere.io                       1          5m17s

###
kubectl delete validatingwebhookconfigurations $(kubectl get validatingwebhookconfigurations --no-headers | awk '{print $1}')
### Then re-run the KubeSphere 3.x installation
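The `--no-headers` flag matters: without it, the first field of the header row (`NAME`) would be passed to `kubectl delete` as a resource name. A self-contained illustration on a captured sample of the output (no live cluster needed; `NR>1` in awk is equivalent to skipping the header):

```shell
# Hypothetical captured output of `kubectl get validatingwebhookconfigurations`.
webhook_list='NAME                                          WEBHOOKS   AGE
cluster.kubesphere.io                         1          5m17s
users.iam.kubesphere.io                       1          5m17s'

# Naive extraction keeps the header word NAME, which kubectl delete would reject.
naive=$(printf '%s\n' "$webhook_list" | awk '{print $1}')

# Skipping the first record (NR>1) yields only real resource names,
# matching what `kubectl get --no-headers` would return.
safe=$(printf '%s\n' "$webhook_list" | awk 'NR>1 {print $1}')
echo "$safe"
```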
 
 
## ks-apiserver stuck in ContainerCreating
kubesphere-system              ks-apiserver                   0/1     ContainerCreating


Docker has a minimum systemd version requirement; on CentOS 7 systemd is around version 219, which was also the version in my failing environment.
Restart the services in the order below. Do not restart kubelet first: restarting kubelet first leaves all pods stuck in Pending!
 systemctl restart docker
 systemctl restart kubelet

#### Namespace rook-ceph hangs in Terminating and cannot be deleted; to force-delete it:
First start `kubectl proxy`,
then open a second SSH session and run:
 

 kubectl get ns rook-ceph -o json > rook-ceph.yaml

Edit rook-ceph.yaml and delete the three-line finalizers block:

"finalizers": [
   "kubernetes"
]

curl -k -H "Content-Type: application/json" -X PUT --data-binary @rook-ceph.yaml http://127.0.0.1:8001/api/v1/namespaces/rook-ceph/finalize
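The hand-edit step can also be sketched mechanically: drop the `"kubernetes"` entry from the finalizers array in the exported file before PUT-ing it back. A minimal illustration on a hypothetical trimmed-down namespace export (real `kubectl get ns -o json` output has many more fields):

```shell
# Hypothetical minimal namespace export; a real one is much larger.
cat > rook-ceph.yaml <<'EOF'
{
  "apiVersion": "v1",
  "kind": "Namespace",
  "metadata": { "name": "rook-ceph" },
  "spec": {
    "finalizers": [
      "kubernetes"
    ]
  }
}
EOF

# Blank out the finalizer entry, leaving an empty (still valid JSON) array.
# On real output, jq '.spec.finalizers=[]' is more robust than sed.
sed -i 's/"kubernetes"//' rook-ceph.yaml
```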
