Deploying a k8s 1.28.2 Cluster in an Intranet Environment
1. Environment Preparation
OS: Ubuntu 20.04.3
k8s version: 1.28.2
containerd version: 1.6.25
Three virtual machines:
master:192.168.100.55
node1: 192.168.100.66
node2: 192.168.100.77
1.1 VM Initial Setup
Step 1: Configure the IP address
The downloaded Ubuntu image may not include the ifconfig command, so use the ip command to set the VM's address (prerequisite: the VM can reach the host).
ip addr add 192.168.100.55/20 dev <NIC name> # configure the interface address
ip route add default via 192.168.100.1 dev <NIC name> # configure the default gateway
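Note: addresses set with the ip command do not survive a reboot. On Ubuntu 20.04 they can be persisted with netplan; a minimal sketch, in which the file name 01-k8s-static.yaml and the interface name ens33 are placeholders to adapt:
# /etc/netplan/01-k8s-static.yaml
network:
  version: 2
  ethernets:
    ens33:                           # replace with your NIC name
      addresses: [192.168.100.55/20]
      gateway4: 192.168.100.1
Apply it with netplan apply.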
Step 2: Configure the apt mirror
First, add the mirror host to /etc/hosts: vim /etc/hosts
Append the following entry:
<intranet mirror ip> aliyun.com # aliyun used here as an example
Next, change the package sources. On the intranet, edit the source addresses in /etc/apt/sources.list; with internet access, see https://blog.csdn.net/xiangxianghehe/article/details/122856771
After configuring, run apt-get update.
step3 配置k8s镜像源
sudo echo "deb <镜像源地址> main" > /etc/apt/sources.list.d/kubernetes.list
apt-get update
如果update
失败需要配置keyserver-ubuntu
再重新update
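For example, if apt-get update prints a NO_PUBKEY error and the machine can reach a keyserver, the missing key can usually be imported like this (the key ID is a placeholder; use the one from the error message):
apt-key adv --keyserver keyserver.ubuntu.com --recv-keys <KEY_ID>
apt-get update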
1.2 Disable the Firewall
root@master:~# systemctl disable ufw
Synchronizing state of ufw.service with SysV service script with /lib/systemd/systemd-sysv-install.
Executing: /lib/systemd/systemd-sysv-install disable ufw
1.3 Configure Time Synchronization
root@master:~# apt install -y ntpdate
root@master:~# ntpdate time1.aliyun.com # substitute your intranet NTP server address
1.4 Disable Swap
root@master:~# swapoff -a
root@master:~# vim /etc/fstab # comment out the swap entry; this particular VM had no swap mount entry at all
root@master:~# free -m
total used free shared buff/cache available
Mem: 3913 1161 263 3 2488 2385
Swap: 0 0 0
On why swap is disabled: in a compute cluster (note what "compute cluster" means here: one that mainly runs short-lived computational jobs, which allocate a lot of memory, burn a lot of CPU, produce a result, and exit, rather than long-running services such as MySQL), we usually want an OOM to kill the process outright, report the failure to the operator or the job submitter, and fail over by restarting the process on another node. We do not want swap to keep the process limping along, hanging the node, dragging down cluster performance, and leaving the operator with no error report. Worse, on some clusters swap sits on spinning-disk arrays; heavy swapping there is effectively a dead machine. You cannot even log in as root, let alone kill the offending process, and the usual outcome is a hard reboot.
Disabling swap is also required by the k8s configuration; leaving it on makes initialization fail with an error.
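If /etc/fstab does contain a swap entry, it can also be commented out non-interactively instead of editing by hand; a small sketch (back up the file first):
cp /etc/fstab /etc/fstab.bak
sed -ri 's/^([^#].*\sswap\s.*)$/#\1/' /etc/fstab   # comment out any active swap line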
1.5 Ubuntu System Configuration Changes
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack
modprobe nf_conntrack_ipv4 # merged into nf_conntrack on kernels >= 4.19 (Ubuntu 20.04 ships 5.4), so this may fail and can be skipped
modprobe br_netfilter
modprobe overlay
cat > /etc/modules-load.d/k8s-modules.conf <<EOF
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
nf_conntrack_ipv4
br_netfilter
overlay
EOF
cat <<EOF > /etc/sysctl.d/kubernetes.conf
# enable packet forwarding (required for vxlan)
net.ipv4.ip_forward=1
# have iptables process bridged traffic
net.bridge.bridge-nf-call-iptables=1
net.bridge.bridge-nf-call-ip6tables=1
net.bridge.bridge-nf-call-arptables=1
# keep tcp_tw_recycle off; it conflicts with NAT and breaks connectivity
# (this knob was removed in kernel 4.12, so the line may error on Ubuntu 20.04 and can be dropped)
net.ipv4.tcp_tw_recycle=0
# do not reuse TIME-WAIT sockets for new TCP connections
net.ipv4.tcp_tw_reuse=0
# upper limit of the socket listen() backlog
net.core.somaxconn=32768
# maximum tracked connections; the default is nf_conntrack_buckets * 4
net.netfilter.nf_conntrack_max=1000000
# avoid using swap; it is only used when the system is about to OOM
vm.swappiness=0
# maximum number of memory map areas a process may have
vm.max_map_count=655360
# maximum number of file handles the kernel can allocate
fs.file-max=6553600
# keepalive settings for long-lived connections
net.ipv4.tcp_keepalive_time=600
net.ipv4.tcp_keepalive_intvl=30
net.ipv4.tcp_keepalive_probes=10
EOF
sysctl -p /etc/sysctl.d/kubernetes.conf
ufw disable
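To confirm the modules and the key sysctls actually took effect, a quick check:
lsmod | grep -E 'ip_vs|nf_conntrack|br_netfilter|overlay'   # loaded modules
sysctl net.ipv4.ip_forward net.bridge.bridge-nf-call-iptables   # should both print 1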
2. Configure containerd
2.1 Prerequisite Installation
Step 1: Install the necessary system tools
sudo apt-get update
sudo apt-get -y install apt-transport-https ca-certificates curl software-properties-common
Step 2: Install the GPG key
mkdir -p /etc/apt/keyrings
curl -fsSL https://mirrors.aliyun.com/docker-ce/linux/ubuntu/gpg | sudo apt-key add -
Step 3: Add the repository
sudo add-apt-repository "deb [arch=amd64] https://mirrors.aliyun.com/docker-ce/linux/ubuntu $(lsb_release -cs) stable"
Step 4: Update and install containerd
sudo apt-get -y update
sudo apt-get -y install containerd.io
Step 5: Check the containerd version
root@master:/home# containerd --version
containerd containerd.io 1.6.25 d8f198a4ed8892c764191ef7b3b06d8a2eeb5c7f
Step 6: Generate the containerd config
root@master:~# containerd config default | sudo tee /etc/containerd/config.toml # if this errors, mkdir /etc/containerd first and rerun
2.2 Modify the containerd Configuration
vim /etc/containerd/config.toml
sandbox_image = "k8s.gcr.io/pause:3.9" # the sandbox image must be pullable; here the k8s.gcr.io name is kept and the Aliyun pause image is re-tagged to match in section 4.3 (setting it directly to registry.aliyuncs.com/google_containers/pause:3.9 also works)
systemd_cgroup = true # change the default false to true, otherwise you may see warnings about the cgroup controller (it must match kubelet's cgroup driver)
runtime_type = "io.containerd.runtime.v1.linux" # without this change, image pulls may fail later
# reload systemd unit files
systemctl daemon-reload
# enable and restart the containerd service
systemctl enable --now containerd && systemctl restart containerd
Check containerd status:
root@master:/home# systemctl status containerd
● containerd.service - containerd container runtime
Loaded: loaded (/lib/systemd/system/containerd.service; enabled; vendor preset: enabled)
Active: active (running) since Wed 2023-12-06 02:19:46 EST; 19h ago
Docs: https://containerd.io
Main PID: 32499 (containerd)
Tasks: 125
Memory: 121.9M
CPU: 14min 9.409s
CGroup: /system.slice/containerd.service
├─32499 /usr/bin/containerd
├─34173 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 3af88ea90654ada11aa917024580843784cb95aefd633404358575c22ed4d518 -address /run/>
├─34205 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 0ab295c65faa9fcc66b1c0bea85950b26fceb941c347819b9a19f56ca15f0cad -address /run/>
├─34237 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id afb371525a1f9555d595394cfa2bffde592c65a57725b602e06ce8fc15b0c826 -address /run/>
├─34263 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 24adcfdfd2a194fabca3552cff2232c7618cab3e9c603e50ffd386b245ea4713 -address /run/>
├─34568 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id b69a1dc421ff046aca6c96f6dff15ecd74f4f52859916f7689c72a34334815ea -address /run/>
├─38459 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 5b30d61ed93385cbb45ddcc6ffd07717daaff503e84d7de873f5999906972e78 -address /run/>
├─38509 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 6579c8e87a41c99f41b5e9012ee58f172497a658c6dd68aef0f270b9d6022302 -address /run/>
└─46491 /usr/bin/containerd-shim-runc-v2 -namespace k8s.io -id 999461451999fd3061e46eae3e279b1f5e631779312bfc27b70ba4aae17c1cef -address /run/>
Dec 06 04:48:57 master containerd[32499]: time="2023-12-06T04:48:57.563811417-05:00" level=info msg="shim disconnected" id=d000434f48430019b2f60cd0a9b9fd74>
Dec 06 04:48:57 master containerd[32499]: time="2023-12-06T04:48:57.563877892-05:00" level=warning msg="cleaning up after shim disconnected" id=d000434f484>
Dec 06 04:48:57 master containerd[32499]: time="2023-12-06T04:48:57.563889339-05:00" level=info msg="cleaning up dead shim"
Dec 06 04:48:57 master containerd[32499]: time="2023-12-06T04:48:57.575365944-05:00" level=warning msg="cleanup warnings time=\"2023-12-06T04:48:57-05:00\">
Dec 06 04:48:58 master containerd[32499]: time="2023-12-06T04:48:58.359166719-05:00" level=info msg="RemoveContainer for \"3081eb56e570bdaaa812baa43601906e>
Dec 06 04:48:59 master containerd[32499]: time="2023-12-06T04:48:59.259638215-05:00" level=info msg="RemoveContainer for \"3081eb56e570bdaaa812baa43601906e>
Dec 06 04:49:15 master containerd[32499]: time="2023-12-06T04:49:15.467269020-05:00" level=info msg="CreateContainer within sandbox \"afb371525a1f9555d5953>
Dec 06 04:49:18 master containerd[32499]: time="2023-12-06T04:49:18.356434471-05:00" level=info msg="CreateContainer within sandbox \"afb371525a1f9555d5953>
Dec 06 04:49:18 master containerd[32499]: time="2023-12-06T04:49:18.357125524-05:00" level=info msg="StartContainer for \"6668a22a2f3b799c2869c29d50cb08b2d>
Dec 06 04:49:18 master containerd[32499]: time="2023-12-06T04:49:18.848875292-05:00" level=info msg="StartContainer for \"6668a22a2f3b799c2869c29d50cb08b2d>
Check the containerd version via ctr:
root@master:/home# ctr --version
ctr containerd.io 1.6.25
With the configuration above, containerd is installed successfully.
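As a quick sanity check that containerd can pull from the mirror, you can try the pause image that is needed later (this assumes the Aliyun mirror, or your intranet equivalent, is reachable):
ctr -n k8s.io images pull registry.aliyuncs.com/google_containers/pause:3.9
ctr -n k8s.io images ls | grep pause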
3. Install kubeadm, kubelet, kubectl
3.1 Installation
The k8s package repository was already configured above, so only the install remains:
apt-get install -y kubelet kubeadm kubectl
# hold the packages so they cannot be automatically installed, upgraded, or removed
apt-mark hold kubelet kubeadm kubectl
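It is worth confirming that all three components report the expected version. If the mirror carries several versions, a specific one can be pinned at install time, e.g. apt-get install -y kubelet=1.28.2-00 kubeadm=1.28.2-00 kubectl=1.28.2-00 (the exact version string depends on the mirror).
kubeadm version -o short
kubelet --version
kubectl version --client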
3.2 Modify the Configuration
After installation, crictl still tries the old docker endpoints by default, but k8s dropped docker (dockershim) in 1.24, so the endpoint must be switched to containerd; otherwise kubeadm init will fail. Steps:
root@master:~# vim /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
image-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 10 # do not set the timeout too short; it is raised to 10 seconds here
debug: false
pull-image-on-create: false
disable-pull-on-run: false
root@master:~# systemctl daemon-reload && systemctl restart containerd
root@master:~# crictl images
IMAGE TAG IMAGE ID SIZE
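If crictl is wired up correctly, crictl version should now report containerd as the runtime rather than failing to dial the dockershim socket:
crictl version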
4. Initialize k8s
4.1 Generate the k8s Config File
kubeadm config print init-defaults --component-configs KubeletConfiguration > kubeadm_init.yaml
The generated config file is shown below.
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: 192.168.100.55
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
  imagePullPolicy: IfNotPresent
  name: master
  taints: null
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.28.2
networking:
  dnsDomain: cluster.local
  serviceSubnet: 10.96.0.0/12
scheduler: {}
---
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 0s
    enabled: true
  x509:
    clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 0s
    cacheUnauthorizedTTL: 0s
cgroupDriver: systemd
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
containerRuntimeEndpoint: ""
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
logging:
  flushFrequency: 0
  options:
    json:
      infoBufferSize: "0"
  verbosity: 0
memorySwap: {}
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
resolvConf: /run/systemd/resolve/resolv.conf
rotateCertificates: true
runtimeRequestTimeout: 0s
shutdownGracePeriod: 0s
shutdownGracePeriodCriticalPods: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
Fields that need to be modified:
| Field | Default | New value |
| --- | --- | --- |
| cgroupDriver | systemd | containerd is already configured to use systemd, so keep the default |
| kubernetesVersion | 1.28.0 | 1.28.2 |
| imageRepository | k8s.gcr.io | registry.aliyuncs.com/google_containers |
| advertiseAddress | 1.2.3.4 | 192.168.100.55 (the master's IP) |
| nodeRegistration.name | node | the master's hostname |
4.2 Pull the Images in Advance
kubeadm config images pull --config kubeadm_init.yaml
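In an intranet it also helps to list the required images first, so that any image which fails to pull can be mirrored by hand:
kubeadm config images list --config kubeadm_init.yaml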
4.3 Tag the pause Image (so the sandbox_image name set in section 2.2 resolves to the mirrored image)
ctr -n k8s.io i tag registry.aliyuncs.com/google_containers/pause:3.9 k8s.gcr.io/pause:3.9
Restart containerd:
systemctl restart containerd
4.4 Initialize k8s
root@master:/home# kubeadm init --config kubeadm_init.yaml
[init] Using Kubernetes version: v1.28.2
······
[bootstrap-token] Creating the "cluster-info" ConfigMap in the "kube-public" namespace
[kubelet-finalize] Updating "/etc/kubernetes/kubelet.conf" to point to a rotatable kubelet client certificate and key
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy
Your Kubernetes control-plane has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Alternatively, if you are the root user, you can run:
export KUBECONFIG=/etc/kubernetes/admin.conf
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join 192.168.100.55:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:014dca58bef3df1ba0a8e75dac1ea6598487f28eec691782c5a78b8c117519b2
Following the prompt above, run:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
export KUBECONFIG=/etc/kubernetes/admin.conf
5. Configure the Worker Nodes
You can clone the VM image above to create the nodes. Each node then needs to join the master with kubeadm; run:
kubeadm join 192.168.100.55:6443 --token abcdef.0123456789abcdef \
--discovery-token-ca-cert-hash sha256:014dca58bef3df1ba0a8e75dac1ea6598487f28eec691782c5a78b8c117519b2
root@master:/home/node1# kubeadm join 192.168.100.55:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:014dca58bef3df1ba0a8e75dac1ea6598487f28eec691782c5a78b8c117519b2
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
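Note that the bootstrap token has a 24h TTL (the ttl field in kubeadm_init.yaml). If it has expired by the time a node joins, generate a fresh join command on the master:
kubeadm token create --print-join-command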
At this point the basic k8s architecture is installed; the network plugin and its images still need to be set up.
6. Configure the k8s Network
Now check whether the nodes are up:
root@master:/home# kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-86966648-5jc4x 0/1 Pending 0 61s
coredns-86966648-96sqz 0/1 Pending 0 61s
etcd-master 1/1 Running 1 73s
kube-apiserver-master 1/1 Running 1 75s
kube-controller-manager-master 1/1 Running 1 75s
kube-proxy-9nvdc 1/1 Running 0 61s
kube-scheduler-master 1/1 Running 1 70s
root@master:/home# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master NotReady control-plane 2m29s v1.28.2
node1 NotReady <none> 4s v1.28.2
Both master and node1 show NotReady because no network plugin has been configured yet. Next, set up the Calico network.
6.1 Download calico-3.26.4
Download from: https://github.com/projectcalico/calico/releases
After downloading, upload the archive to the servers and extract it, then import the images into containerd:
ctr -n k8s.io images import calico-cni.tar
ctr -n k8s.io images import calico-kube-controllers.tar
ctr -n k8s.io images import calico-node.tar
Note: the images above must be imported on the worker nodes as well.
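You can confirm the import succeeded on each machine with:
ctr -n k8s.io images ls | grep calico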
You also need to edit the calico.yaml file under release-v3.26.4/manifests. Around line 4800, uncomment the two lines below and replace 10.96.0.0/12 with the pod CIDR used by your cluster (it must not overlap the service subnet).
# chosen from this range. Changing this value after installation will have
# no effect. This should fall within `--cluster-cidr`.
# - name: CALICO_IPV4POOL_CIDR
# value: "10.96.0.0/12"
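After editing, the two lines should look like the following (the CIDR here is only a placeholder; substitute your cluster's pod CIDR):
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"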
6.2 Deploy
kubectl apply -f calico.yaml
Once applied, you should see something like:
root@master:/home# kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-kube-controllers-7c968b5878-8sbkl 1/1 Running 2 (17h ago) 18h
kube-system calico-node-68m72 1/1 Running 0 18h
kube-system calico-node-vn95j 1/1 Running 0 18h
kube-system coredns-86966648-5jc4x 1/1 Running 0 20h
kube-system coredns-86966648-96sqz 1/1 Running 0 20h
kube-system etcd-master 1/1 Running 1 20h
kube-system kube-apiserver-master 1/1 Running 2 (17h ago) 20h
kube-system kube-controller-manager-master 1/1 Running 3 (17h ago) 20h
kube-system kube-proxy-9nvdc 1/1 Running 0 20h
kube-system kube-proxy-9xjz6 1/1 Running 0 20h
kube-system kube-scheduler-master 1/1 Running 2 (18h ago) 20h
root@master:/home# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready control-plane 20h v1.28.2
node1 Ready <none> 20h v1.28.2
At this point the k8s base architecture and networking are both configured. What remains is deploying workloads (GitLab is recommended for that), which depends on the specific business; a fairly complete application deployment walkthrough: https://cloud.tencent.com/developer/article/1821616
7. Troubleshooting
7.1 Problem 1:
crictl images
WARN[0000] image connect using default endpoints: [unix:///var/run/dockershim.sock unix:///run/containerd/containerd.sock unix:///run/crio/crio.sock unix:///var/run/cri-dockerd.sock]. As the default settings are now deprecated, you should set the endpoint instead.
E0722 23:05:31.059137   34283 remote_image.go:119] "ListImages with filter from image service failed" err="rpc error: code = Unavailable desc = connection error: desc = \"transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory\"" filter="&ImageFilter{Image:&ImageSpec{Image:,Annotations:map[string]string{},},}"
FATA[0000] listing images: rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial unix /var/run/dockershim.sock: connect: no such file or directory"
Solution:
The cause is that /var/run/dockershim.sock simply does not exist: crictl still defaults to the docker socket. The endpoint had not been set to unix:///run/containerd/containerd.sock in the step below; once configured as follows, the error goes away.
root@master:~# vim /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
image-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 10 # do not set the timeout too short; it is raised to 10 seconds here
debug: false
pull-image-on-create: false
disable-pull-on-run: false
7.2 Problem 2:
"Error getting node" err="node \"master\" not found"
Solution: the host's hostname and the name under nodeRegistration do not match; make them identical. If they already match and the error persists, the k8s and containerd versions may be incompatible.
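For example, to make the hostname match the name field in kubeadm_init.yaml:
hostnamectl set-hostname master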
7.3 Problem 3:
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
Solution: swap was probably not fully disabled; run swapoff -a.
The fixes above may not apply in every case; if you have questions, feel free to contact the author.