K8S 1.28.2 Cluster Setup (containerd Runtime)
Deploying Kubernetes 1.28.2 with containerd as the container runtime.
Node Planning
Hostname | IP Address | Role
---|---|---
k8s-master | 192.168.1.1 | control-plane / data-plane / NFS
k8s-node1 | 192.168.1.2 | data-plane
k8s-node2 | 192.168.1.3 | data-plane
Based on the node plan above, perform the following steps on every node.
1. Flush the firewall rules
Kubernetes implements port mapping and forwarding through iptables, so flush all existing iptables rules before installing Kubernetes. This ensures that leftover rules cannot interfere with the installation.
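The original does not list the exact commands, so the following is a minimal sketch of what flushing the rules typically looks like on a CentOS node; whether you stop firewalld or only flush iptables depends on your environment:
systemctl stop firewalld && systemctl disable firewalld   # only if firewalld is in use
iptables -F            # flush all rules in the filter table
iptables -X            # delete user-defined chains
iptables -t nat -F     # flush the nat table as well
iptables -t nat -X
iptables -P FORWARD ACCEPT   # make sure forwarded traffic is not dropped by default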
2. Disable swap
Temporary: swapoff -a
Permanent: sed -i '/^[^#]/ s/\(\s\+swap\s\)/#&/' /etc/fstab
Check the swap status:
free -g
# Output
# When the total of the Swap row is 0, all swap has been disabled
total used free shared buff/cache available
Mem: 15 8 0 0 6 6
Swap: 0 0 0
3. Load kernel modules
The main purpose of this step is to switch the kube-proxy mode from iptables to ipvs. If you do not need ipvs, this step can be skipped; the kube-proxy mode switch itself is shown together with the kubeadm configuration sample further below.
3.1. Write the module configuration file
cat > /etc/modules-load.d/modules.conf << EOF
br_netfilter
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
EOF
3.2. Load the modules
for line in $(cat /etc/modules-load.d/modules.conf); do
    modprobe $line
done
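You can confirm that the modules are present with lsmod:
lsmod | grep -E 'ip_vs|nf_conntrack'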
4. Adjust kernel parameters
4.1. Create the configuration file
- net.ipv4.ip_forward: enable forwarding of IPv4 packets
- vm.swappiness: tell the kernel to avoid using swap
- net.bridge.bridge-nf-call-iptables: pass bridged traffic to iptables for filtering
- net.bridge.bridge-nf-call-ip6tables: pass bridged traffic to ip6tables for filtering
cat > /etc/sysctl.d/k8s.conf << EOF
# Kubernetes
net.ipv4.ip_forward = 1
vm.swappiness = 0
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
EOF
4.2. Apply the configuration
sysctl -p /etc/sysctl.d/k8s.conf
5. Configure time synchronization
5.1. Install the Chrony service
yum install chrony -y
5.2. Configure the Chrony service
# The server list in /etc/chrony.conf is environment-specific and is not shown in the original; a hedged example follows below
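A minimal sketch of /etc/chrony.conf, assuming the public NTP server ntp.aliyun.com is reachable from your nodes; substitute whatever NTP source your environment actually uses:
cat > /etc/chrony.conf << EOF
server ntp.aliyun.com iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
logdir /var/log/chrony
EOF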
5.3. Enable the Chrony service and start it on boot
systemctl enable chrony --now
5.4. Check the Chrony service status
chronyc sources -v
# Output
# Only a Chrony server whose state is ^* is actually being used for time synchronization
210 Number of sources = 1
.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| / '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
|| .- xxxx [ yyyy ] +/- zzzz
|| Reachability register (octal) -. | xxxx = adjusted offset,
|| Log2(Polling interval) --. | | yyyy = measured offset,
|| \ | | zzzz = estimated error.
|| | | \
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* redis1 4 6 377 18 +1539us[+1549us] +/- 63ms
- Create the service directory (later steps store Kubernetes configuration under this path)
mkdir -p /data/service/kubernetes
Install the containerd service
Docker has contributed containerd to the community as a standalone project, and Kubernetes only needs containerd for container management. So if Docker is already installed on the node, containerd is already present and this step can be skipped; if not, installing containerd by itself is sufficient.
- Configure the containerd yum repository
yum-config-manager --add-repo https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sed -i 's+download.docker.com+mirrors.aliyun.com/docker-ce+' /etc/yum.repos.d/docker-ce.repo
- Refresh the yum cache
yum clean all && yum makecache fast
- Install the containerd service
yum install containerd.io -y
- Generate the default containerd configuration file
mkdir -p /etc/containerd
containerd config default > /etc/containerd/config.toml
- Back up the containerd configuration file
cp /etc/containerd/config.toml /etc/containerd/config.toml.orig
- Modify the containerd configuration file
# Point the sandbox (pause) image at the Aliyun registry so that it can be pulled
# Line 62: change to the following
sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
# Use systemd as the container cgroup driver, which is more stable when node resources are tight
# Line 126: change to the following
SystemdCgroup = true
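If you prefer not to edit the file by hand, the same two changes can be applied with sed; the line numbers above refer to the default config generated by containerd 1.6.x, so matching on the keys is more robust (a sketch, verify the result before restarting containerd):
sed -i 's#sandbox_image = ".*"#sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"#' /etc/containerd/config.toml
sed -i 's#SystemdCgroup = false#SystemdCgroup = true#' /etc/containerd/config.toml
grep -nE 'sandbox_image|SystemdCgroup' /etc/containerd/config.toml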
- Enable the containerd service and start it on boot
systemctl enable containerd --now
- Verify that the containerd service is running
ctr version
# Output
# Any output indicates the service is running
Client:
Version: 1.6.25
Revision: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
Go version: go1.20.10
Server:
Version: 1.6.25
Revision: d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
UUID: 0218e80c-c56f-4b65-98bb-48a44120978c
Install the Kubernetes tools
- Configure the Kubernetes yum repository
cat > /etc/yum.repos.d/kubernetes.repo << EOF
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
- Refresh the yum cache
yum clean all && yum makecache fast
- Install the Kubernetes tools
yum install kubelet kubeadm kubectl -y
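To make sure the installed packages match the 1.28.2 cluster described here, the versions can also be pinned explicitly, and the kubelet should be enabled so that kubeadm can manage it (a sketch; adjust the version string to what the repository actually provides):
yum install -y kubelet-1.28.2 kubeadm-1.28.2 kubectl-1.28.2
systemctl enable kubelet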
Deploy the Kubernetes Master
Based on the node plan above, the Master is deployed on the k8s-master node.
1. Generate the default kubeadm configuration file
kubeadm config print init-defaults > /data/service/kubernetes/kubeadm.yaml
2. Back up the kubeadm configuration file
cp /data/service/kubernetes/kubeadm.yaml /data/service/kubernetes/kubeadm.yaml.orig
3. Modify the kubeadm configuration file
# This IP is the Master node's IP address
# Line 12: change to the following
advertiseAddress: 192.168.1.1
# Set the containerd connection socket
# Line 15: change to the following
criSocket: unix:///var/run/containerd/containerd.sock
# This name is the Master node's hostname
# Line 17: change to the following
name: k8s-master
# Point the image repository at the Aliyun registry so that the control-plane images can be pulled
# Line 30: change to the following
imageRepository: registry.aliyuncs.com/google_containers
# Specify the Pod IP CIDR
# After line 35, add the following
podSubnet: 10.244.0.0/16
Sample configuration file (this sample reflects the author's actual environment, where the Master is redis2 at 10.10.10.22):
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 10.10.10.22
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: redis2
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.28.0
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
  podSubnet: 10.244.0.0/16
scheduler: {}
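If you loaded the ipvs kernel modules earlier in order to switch kube-proxy from iptables to ipvs, the switch itself is made by appending a KubeProxyConfiguration document to the same kubeadm.yaml before running kubeadm init; a minimal sketch:
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: ipvs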
4. Pull the required images
kubeadm config images pull --config /data/service/kubernetes/kubeadm.yaml
5. Initialize the Kubernetes Master
kubeadm init --config /data/service/kubernetes/kubeadm.yaml
# Output
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 10.10.10.22:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:cad3fa778559b724dff47bb1ad427bd39d97dd76e934b9467507a2eb990a50c7
6. Configure kubectl access to the cluster
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
Deploy the Kubernetes network plugin
After the Kubernetes Master is deployed, checking the cluster status reveals two problems:
- kubectl get nodes shows the master stuck in NotReady status
- kubectl get pods -n kube-system shows the coredns Pods stuck in a not-ready state
Both problems are caused by the missing network plugin. Kubernetes supports several network plugins; this document uses one of the most common, Calico. Based on the node plan above, perform the following steps on the k8s-master node.
# 1. Install the Calico network plugin
# Download the yaml file
curl https://breezey-public.oss-cn-zhangjiakou.aliyuncs.com/cka/calico.yaml -o /data/service/kubernetes/calico.yaml
# Deploy the Calico network plugin
kubectl apply -f /data/service/kubernetes/calico.yaml
# 2. Check the Pod status
kubectl get pods -n kube-system -o wide
# Output
# coredns is now in Running status
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-78496c69f6-bcqf9 1/1 Running 0 74d 10.244.214.65 redis2 <none> <none>
calico-node-452ps 1/1 Running 0 74d 10.10.10.132 hadoop2 <none> <none>
calico-node-4gqff 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
calico-node-lv4sg 1/1 Running 0 74d 10.10.10.23 redis3 <none> <none>
calico-node-wc2hw 1/1 Running 0 74d 10.10.10.133 hadoop3 <none> <none>
coredns-66f779496c-bb72k 1/1 Running 0 74d 10.244.214.66 redis2 <none> <none>
coredns-66f779496c-pqfw7 1/1 Running 0 74d 10.244.214.67 redis2 <none> <none>
etcd-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-apiserver-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-controller-manager-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-proxy-8882t 1/1 Running 0 74d 10.10.10.132 hadoop2 <none> <none>
kube-proxy-8v5vq 1/1 Running 0 74d 10.10.10.23 redis3 <none> <none>
kube-proxy-f4wf2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-proxy-vst9n 1/1 Running 0 74d 10.10.10.133 hadoop3 <none> <none>
kube-scheduler-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
3. Check the node status
kubectl get nodes
# Output
# The Master is now in Ready status
NAME STATUS ROLES AGE VERSION
redis2 Ready control-plane 11d v1.28.2
Add Kubernetes Workers
1. Generate a token
The token is the only credential used for authentication between the Master and a Worker.
Each token is valid for 24 hours; once it expires it can no longer be used, and a new one must be generated with the command below.
Based on the node plan above, run the following on the k8s-master node:
kubeadm token create --print-join-command
# Output
kubeadm join 10.10.10.22:6443 --token w9g299.f9dymq7iza6h97s1 --discovery-token-ca-cert-hash sha256:5ec5a1e20cb9282f763c8aadb640e32f5a6e542df2ab7383125bd3334ab97521
2. Add the Kubernetes Workers
Based on the node plan above, run the following on the redis3, hadoop2 and hadoop3 nodes (the Worker nodes in the author's environment):
kubeadm join 10.10.10.22:6443 --token w9g299.f9dymq7iza6h97s1 --discovery-token-ca-cert-hash sha256:5ec5a1e20cb9282f763c8aadb640e32f5a6e542df2ab7383125bd3334ab97521
3. Check the result
kubectl get nodes
# Output
# The newly added Workers are now in Ready status
NAME STATUS ROLES AGE VERSION
hadoop2 Ready <none> 17d v1.28.2
hadoop3 Ready <none> 17d v1.28.2
hazelcast1 NotReady <none> 17d v1.28.2
redis2 Ready control-plane 17d v1.28.2
redis3 Ready <none> 17d v1.28.2
kubectl get pods -n kube-system -o wide
# Output
# calico-node and kube-proxy Pods are now running on the newly added Workers
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
calico-kube-controllers-78496c69f6-bcqf9 1/1 Running 0 74d 10.244.214.65 redis2 <none> <none>
calico-node-452ps 1/1 Running 0 74d 10.10.10.132 hadoop2 <none> <none>
calico-node-4gqff 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
calico-node-lv4sg 1/1 Running 0 74d 10.10.10.23 redis3 <none> <none>
calico-node-wc2hw 1/1 Running 0 74d 10.10.10.133 hadoop3 <none> <none>
coredns-66f779496c-bb72k 1/1 Running 0 74d 10.244.214.66 redis2 <none> <none>
coredns-66f779496c-pqfw7 1/1 Running 0 74d 10.244.214.67 redis2 <none> <none>
etcd-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-apiserver-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-controller-manager-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-proxy-8882t 1/1 Running 0 74d 10.10.10.132 hadoop2 <none> <none>
kube-proxy-8v5vq 1/1 Running 0 74d 10.10.10.23 redis3 <none> <none>
kube-proxy-f4wf2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
kube-proxy-vst9n 1/1 Running 0 74d 10.10.10.133 hadoop3 <none> <none>
kube-scheduler-redis2 1/1 Running 0 74d 10.10.10.22 redis2 <none> <none>
- Granting Kubernetes Workers the ability to manage the cluster
By default, only the Kubernetes Master can manage the Kubernetes cluster. Running kubectl on a Kubernetes Worker node produces the following error:
E0123 09:42:49.858315 6523 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0123 09:42:49.859436 6523 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0123 09:42:49.860516 6523 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0123 09:42:49.861618 6523 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0123 09:42:49.862442 6523 memcache.go:265] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?
The error shows that kubectl connects to localhost:8080 by default rather than to the node running the Kubernetes Master, which is why the connection fails. For a Kubernetes Worker to manage the cluster as well, the cluster configuration file must be placed on the Worker node so that it can reach the Kubernetes Master. The steps are as follows.
Create the configuration directory on the Kubernetes Worker node:
# kubectl looks for its configuration under this path by default, so it must exist before the file is copied
mkdir -p /root/.kube
Copy the configuration file from the Kubernetes Master into the Worker's configuration directory (run on the Master):
scp /root/.kube/config root@redis3:/root/.kube/
The Kubernetes Worker node can now manage the cluster:
kubectl get namespaces
# Output
NAME STATUS AGE
default Active 46d
ingress-nginx Active 10d
kube-node-lease Active 46d
kube-public Active 46d
kube-system Active 46d
test Active 34d
As the above shows, any node that has kubectl installed, can reach the Kubernetes Master over the network, and holds the cluster configuration file (/root/.kube/config) is able to manage the Kubernetes cluster. In production, this means a dedicated node can serve as a cluster-management node, and it can even manage multiple clusters. In practice, however, administrator credentials are not handed directly to end users, so this should be combined with permission management (RBAC); see the permission-management documentation for details.
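As a rough illustration of that permission management, the sketch below defines a read-only ClusterRole and binds it to a hypothetical user named viewer; all names here are placeholders, and Kubernetes' built-in aggregated roles (such as view) can be used instead:
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: readonly-viewer            # hypothetical role name
rules:
- apiGroups: ["", "apps"]
  resources: ["pods", "services", "deployments"]
  verbs: ["get", "list", "watch"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: readonly-viewer-binding
subjects:
- kind: User
  name: viewer                     # hypothetical user
  apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: readonly-viewer
  apiGroup: rbac.authorization.k8s.io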
Deploy the Ingress controller Ingress-Nginx
In real production environments, Services (a built-in Kubernetes feature) are generally used for in-cluster proxying, while Ingresses expose services outside the cluster. If external access is not needed, this step can be skipped.
Based on the node plan above, perform the following steps on the k8s-master node.
# Download the yaml file
wget https://breezey-public.oss-cn-zhangjiakou.aliyuncs.com/cka/ingress-nginx-v1.9.3.yaml -O /data/service/kubernetes/ingress-nginx.yaml
# Adjust the configuration file (see the sketch after this command block)
# Use the host's IP address as the Pod's IP address
# After line 139, add the following
hostNetwork: true
# Deploy the Ingress-Nginx controller
kubectl apply -f /data/service/kubernetes/ingress-nginx.yaml
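For orientation, hostNetwork sits under spec.template.spec of the ingress-nginx-controller Deployment; the snippet below is a trimmed sketch of that region (surrounding fields abridged), not a manifest to apply on its own:
kind: Deployment
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
spec:
  template:
    spec:
      hostNetwork: true        # added so the controller binds to the node's own IP
      containers:
      - name: controller
        # ... remaining fields unchanged ...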
Check the Ingress-Nginx status
Check the Namespace status
kubectl get ns | grep ingress-nginx
# Output
# The ingress-nginx Namespace is created automatically
ingress-nginx Active 108s
Check the Pod status
kubectl get pods -n ingress-nginx -o wide | grep "ingress-nginx-controller"
# Output
# The controller Pod uses the host's IP address
ingress-nginx-controller-769b6777ff-nnkf4 1/1 Running 0 4m40s 10.10.10.23 redis3 <none> <none>
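External traffic is then routed through the controller by creating an Ingress resource per application; a minimal sketch, assuming a hypothetical Service named web-svc listening on port 80 and a hypothetical host name web.example.com:
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress              # hypothetical name
  namespace: default
spec:
  ingressClassName: nginx
  rules:
  - host: web.example.com        # hypothetical host
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-svc        # hypothetical Service
            port:
              number: 80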
Notes
1. kubeadm init hangs, with the following error:
[root@k8s-master docker]# kubeadm init --apiserver-advertise-address=192.168.202.6 --control-plane-endpoint=k8s-master --image-repository registry.cn-hangzhou.aliyuncs.com/lfy_k8s_images --kubernetes-version v1.28.2 --service-cidr=10.96.0.0/16 --pod-network-cidr=172.16.0.0/16
[init] Using Kubernetes version: v1.28.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Using existing ca certificate authority
[certs] Using existing apiserver certificate and key on disk
[certs] Using existing apiserver-kubelet-client certificate and key on disk
[certs] Using existing front-proxy-ca certificate authority
[certs] Using existing front-proxy-client certificate and key on disk
[certs] Using existing etcd/ca certificate authority
[certs] Using existing etcd/server certificate and key on disk
[certs] Using existing etcd/peer certificate and key on disk
[certs] Using existing etcd/healthcheck-client certificate and key on disk
[certs] Using existing apiserver-etcd-client certificate and key on disk
[certs] Using the existing "sa" key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
W1110 12:49:47.077127 73798 kubeconfig.go:246] a kubeconfig file "/etc/kubernetes/admin.conf" exists already but has an unexpected API Server URL: expected: https://k8s-master:6443, got: https://cluster-endpoint:6443
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
W1110 12:49:47.391437 73798 kubeconfig.go:246] a kubeconfig file "/etc/kubernetes/kubelet.conf" exists already but has an unexpected API Server URL: expected: https://k8s-master:6443, got: https://cluster-endpoint:6443
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/kubelet.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/controller-manager.conf"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/scheduler.conf"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Starting the kubelet
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
Unfortunately, an error has occurred:
timed out waiting for the condition
This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)
If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
- 'systemctl status kubelet'
- 'journalctl -xeu kubelet'
Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all Kubernetes containers running in docker:
- 'docker ps -a | grep kube | grep -v pause'
Once you have found the failing container, you can inspect its logs with:
- 'docker logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
Solution (check whether the containerd configuration is correct):
crictl config runtime-endpoint /run/containerd/containerd.sock
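A few additional checks that may help narrow this down (standard containerd/crictl/systemd commands; the kubelet log command is the same one suggested in the error output above):
systemctl status containerd
crictl --runtime-endpoint unix:///run/containerd/containerd.sock info
journalctl -xeu kubelet | tail -n 50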
2. k8s "tls: failed to verify certificate" error:
Check whether the certificates have expired (renew them if they have): kubeadm certs check-expiration
If the certificates have not expired, regenerate the local kubeconfig:
rm -rf $HOME/.kube
mkdir -p $HOME/.kube
cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
chown $(id -u):$(id -g) $HOME/.kube/config
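If the check does show expired certificates, kubeadm can renew them; a sketch of the usual sequence (after renewal, the control-plane static Pods must be restarted to pick up the new certificates, for example by briefly moving their manifests out of /etc/kubernetes/manifests/ and back):
kubeadm certs renew all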