Use Vagrant to quickly set up a three-VM cluster with one master and two worker nodes.



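I. Preparation

  1. VirtualBox installer
  2. Vagrant installer
  3. CentOS box file
  4. Vagrantfile: defines the virtual machines
  5. centos-init.sh: VM environment initialization script
  6. docker-init.sh: Docker initialization script
  7. k8s-init.sh: k8s initialization script
  8. master-images.sh: pulls the images needed on the master server
  9. kube-flannel.yml: network add-on
  10. kubeadm-config.yml: k8s initialization config file (for reference)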
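The kubeadm-config.yml from item 10 is not reproduced in this post. For orientation, here is a hypothetical sketch of what such a file could contain for this cluster, written by a shell function so the values are easy to change. Every value is an assumption spelled out in the comments; the advertise IP is k8s-node1's address from the Vagrantfile in section II, and the cgroup driver mirrors the fix described in 6.2.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of a kubeadm-config.yml for this cluster (not the author's file).
# Assumed values, adjust before use:
#   advertiseAddress 192.168.56.101 -> k8s-node1's private_network IP from the Vagrantfile
#   podSubnet 10.244.0.0/16         -> the network in the stock kube-flannel.yml
#   serviceSubnet 10.96.0.0/12      -> kubeadm's default
#   cgroupDriver cgroupfs           -> matches containerd's default config (SystemdCgroup = false)
write_kubeadm_config() {
  local out="${1:-kubeadm-config.yml}" ip="${2:-192.168.56.101}"
  cat > "$out" <<EOF
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
localAPIEndpoint:
  advertiseAddress: ${ip}
  bindPort: 6443
nodeRegistration:
  criSocket: unix:///var/run/containerd/containerd.sock
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.26.0
controlPlaneEndpoint: k8s-node1:6443
imageRepository: registry.cn-hangzhou.aliyuncs.com/google_containers
networking:
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
cgroupDriver: cgroupfs
EOF
}

# Usage on k8s-node1:
#   write_kubeadm_config kubeadm-config.yml 192.168.56.101
#   kubeadm init --config kubeadm-config.yml
```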

II. Create three CentOS 7 servers with Vagrant

Download the centos7.0-x86_64.box box file first.

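  1. Initialize centos7
    # Add the local box (the path must not contain spaces)
    vagrant box add centos7 D:/MallProject/app/VirtualBox-VMs/centos7.0-x86_64.box
    # List local boxes
    vagrant box list
    # Change to the working directory and open a command window there
    cd D:\WorkFiles\k8s\vagrant
    # Initialize from the local centos7 box (this creates a Vagrantfile in the current directory)
    vagrant init centos7
  2. Edit the generated Vagrantfile (replace its entire contents; pay attention to the IPs)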
    # -*- mode: ruby -*-
    # vi: set ft=ruby :
    Vagrant.configure("2") do |config|
      config.vm.box_check_update = false
      config.vm.provider 'virtualbox' do |vb|
        vb.customize [ "guestproperty", "set", :id, "/VirtualBox/GuestAdd/VBoxService/--timesync-set-threshold", 1000 ]
      end
      config.vm.synced_folder ".", "/vagrant", type: "rsync"
      $num_instances = 3
      # $etcd_cluster = "k8s-node1="
      (1..$num_instances).each do |i|
        config.vm.define "k8s-node#{i}" do |node|
          node.vm.box = "centos7"
          node.vm.hostname = "k8s-node#{i}"
          # k8s-node1..3 get 192.168.56.101..103
          ip = "192.168.56.#{i+100}"
          node.vm.network "private_network", ip: ip
          node.vm.provider "virtualbox" do |vb|
            vb.memory = "2048"
            vb.cpus = 2
            vb.name = "k8s-node#{i}"
          end
          # node.vm.provision "shell", path: "install.sh", args: [i, ip, $etcd_cluster]
        end
      end
    end
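  3. Start the VMs (when it finishes, the three VMs are visible in VirtualBox)
    # Run this in the directory that contains the Vagrantfile
    vagrant up
  4. Log in to each of the three VMs and enable SSH password login, so they can be reached with Xshell later
    # Log in from a cmd window (repeat for k8s-node2 and k8s-node3)
    vagrant ssh k8s-node1
    # Change the root password (enter the new password twice when prompted)
    sudo passwd root
    # Switch to root (if the password was not changed, it defaults to vagrant)
    su root
    # Edit the SSH daemon config
    vi /etc/ssh/sshd_config
    # Change PasswordAuthentication no to yes
    PasswordAuthentication yes
    # Restart the service
    service sshd restart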
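Because the same sshd_config change has to be made on all three VMs, the vi edit can be replaced with sed. A minimal sketch, assuming GNU sed (the CentOS 7 default); `enable_password_auth` is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn on SSH password login without opening vi.
# Rewrites "PasswordAuthentication no" (or a commented-out "#PasswordAuthentication ...")
# to "PasswordAuthentication yes" and keeps a .bak copy. Assumes GNU sed.
enable_password_auth() {
  local cfg="${1:-/etc/ssh/sshd_config}"
  cp -p "$cfg" "$cfg.bak"
  sed -i 's/^#\{0,1\}PasswordAuthentication[[:space:]].*/PasswordAuthentication yes/' "$cfg"
}

# Usage on each VM, as root:
#   enable_password_auth /etc/ssh/sshd_config && service sshd restart
```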



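III. Initialize the CentOS environment

  1. Download centos-init.sh and upload it to /opt on all three servers
  2. Run the script to initialize the environment
    # Convert the script to Unix line endings, then save
    vi /opt/centos-init.sh
    :set fileformat=unix
    :wq
    # Make it executable
    chmod +x /opt/centos-init.sh
    # Run the script
    sh /opt/centos-init.sh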
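`:set fileformat=unix` is needed because a script saved on Windows ends each line with CRLF, and the stray carriage returns make bash fail with errors such as `$'\r': command not found`. The same conversion can be done without opening vi, for this script and for the other scripts in this guide. A sketch assuming GNU sed; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: non-interactive equivalent of ":set fileformat=unix" + ":wq".
# Strips the carriage return left at the end of each line (CRLF -> LF),
# then marks the file executable. Assumes GNU sed.
to_unix_script() {
  local f
  for f in "$@"; do
    sed -i 's/\r$//' "$f"
    chmod +x "$f"
  done
}

# Usage:
#   to_unix_script /opt/centos-init.sh /opt/docker-init.sh /opt/k8s-init.sh
#   sh /opt/centos-init.sh
```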



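IV. Initialize Docker

  1. Download docker-init.sh and upload it to /opt on all three servers
  2. Run the script to initialize the environment
    # Convert the script to Unix line endings, then save
    vi /opt/docker-init.sh
    :set fileformat=unix
    :wq
    # Make it executable
    chmod +x /opt/docker-init.sh
    # Run the script
    sh /opt/docker-init.sh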



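V. Initialize k8s

  1. Download k8s-init.sh and upload it to /opt on all three servers
  2. Run the script to initialize the environment
    # Convert the script to Unix line endings, then save
    vi /opt/k8s-init.sh
    :set fileformat=unix
    :wq
    # Make it executable
    chmod +x /opt/k8s-init.sh
    # Run the script
    sh /opt/k8s-init.sh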
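The contents of k8s-init.sh are not shown here. Whatever it does, kubeadm's preflight checks expect IP forwarding and bridge netfilter to be enabled and swap to be off, so a quick check after running it can save a failed `kubeadm init`. A hypothetical sketch; the function names and the `PROC_SYS` override are illustrative:

```shell
#!/usr/bin/env bash
# Hypothetical post-init check of settings kubeadm's preflight expects on every node.
# PROC_SYS defaults to the real /proc/sys and can be pointed elsewhere for testing.
check_sysctl() {
  local key="$1" want="$2" got
  local path="${PROC_SYS:-/proc/sys}/$(printf '%s' "$key" | tr . /)"
  if ! got="$(cat "$path" 2>/dev/null)"; then
    echo "MISSING $key ($path)"
    return 1
  fi
  if [ "$got" = "$want" ]; then
    echo "OK      $key = $got"
  else
    echo "BAD     $key = $got (want $want)"
    return 1
  fi
}

check_node() {
  local rc=0
  check_sysctl net.ipv4.ip_forward 1 || rc=1
  # Only present once the br_netfilter kernel module is loaded
  check_sysctl net.bridge.bridge-nf-call-iptables 1 || rc=1
  # /proc/swaps has a header line; any further line is an active swap device
  if [ "$(awk 'NR > 1' /proc/swaps 2>/dev/null | wc -l)" -gt 0 ]; then
    echo "BAD     swap is enabled (try: swapoff -a)"
    rc=1
  fi
  return "$rc"
}

# Usage on each node: check_node && echo "ready for kubeadm"
```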

VI. Initialize the k8s master node


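  1. Download master-images.sh and upload it to the master server, i.e. k8s-node1

  2. Run the script to pull the required images

    # Convert the script to Unix line endings, then save
    vi /opt/master-images.sh
    :set fileformat=unix
    :wq
    # Make it executable
    chmod +x /opt/master-images.sh
    # Optionally, first check which component versions are required
    kubeadm config images list --kubernetes-version v1.26.0
    # Run the script
    sh /opt/master-images.sh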
  3. Initialize k8s

    # Option 1: initialize with command-line flags
    # Note: take the apiserver-advertise-address value from eth0 in the output of ip addr
    # From v1.26 on, --image-repository is no longer passed to the CRI runtime for pulling the pause image,
    # so /etc/containerd/config.toml has to be edited by hand; see issue 6.1
    kubeadm init \
     --apiserver-advertise-address= \
     --control-plane-endpoint=k8s-node1 \
     --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers \
     --kubernetes-version v1.26.0 \
     --service-cidr=
    # Option 2: initialize from a config file; use one option or the other (edit the generated file first)
    kubeadm config print init-defaults > kubeadm-config.yml
    # Initialize the cluster
    kubeadm init --config kubeadm-config.yml
    # If initialization succeeds, run the commands it prints
    mkdir -p $HOME/.kube
    sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
    sudo chown $(id -u):$(id -g) $HOME/.kube/config
    # Be sure to copy and save the join command it prints:
    # kubeadm join --token np71a6.197zbcz5l6yur1ji --discovery-token-ca-cert-hash sha256:82e0eeadf242ce9fbcb0d3681116d3f845bf8a77deeb0a02e38e57c17112cb35


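    # Reset: if initialization fails, run this before initializing again
    kubeadm reset
    # Check kubelet status
    systemctl status kubelet
    # View the error log
    journalctl -xeu kubelet
  4. Install the flannel network add-on
    Download kube-flannel.yml and upload it to the master server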

    kubectl apply -f kube-flannel.yml
    # Uninstall (if installation fails, uninstall before reinstalling)
    kubectl delete -f kube-flannel.yml
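master-images.sh itself is not reproduced in this post. kubeadm can pre-pull the images on its own with `kubeadm config images pull --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --kubernetes-version v1.26.0`. As an alternative, here is a hypothetical sketch of what a script like master-images.sh could do instead: take kubeadm's official image list and pull each image from the same Aliyun mirror used with `--image-repository`. It assumes containerd as the runtime and `crictl` on the PATH (the kubeadm packages normally pull it in as cri-tools).

```shell
#!/usr/bin/env bash
# Hypothetical master-images.sh-style sketch (not the author's script).
# Pulls every image kubeadm needs from the Aliyun mirror used with --image-repository.
MIRROR="registry.cn-hangzhou.aliyuncs.com/google_containers"
K8S_VERSION="v1.26.0"

# Map an upstream name to the mirror, which is flat:
# registry.k8s.io/coredns/coredns:v1.9.3 -> $MIRROR/coredns:v1.9.3
mirror_image() {
  printf '%s/%s\n' "$MIRROR" "${1##*/}"
}

pull_all() {
  local img
  for img in $(kubeadm config images list --kubernetes-version "$K8S_VERSION"); do
    echo "pulling $(mirror_image "$img")"
    # crictl pulls through the CRI, so the image lands where kubelet/kubeadm look for it
    crictl pull "$(mirror_image "$img")" || return 1
  done
}

# Pull only when asked explicitly, e.g.: bash master-images.sh pull
if [ "${1:-}" = "pull" ]; then
  pull_all
fi
```

Because the pulled images carry the mirror's name, kubeadm only reuses them when `kubeadm init` is given the same `--image-repository`, as in option 1 above.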



6.1 Issue: containerd error
[init] Using Kubernetes version: v1.26.0
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-03-13T16:54:24+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher


# Back up the existing config
mv /etc/containerd/config.toml /etc/containerd/config.toml.back
# Regenerate the default config
containerd config default > /etc/containerd/config.toml

vi /etc/containerd/config.toml

# 1. Find [plugins."io.containerd.grpc.v1.cri".registry.mirrors] and append 4 lines below it (not sure this is required)
    endpoint = ["https://dockerhub.mirrors.nwafu.edu.cn"]
    endpoint = ["https://registry.aliyuncs.com/k8sxio"]
# 2. Find [plugins."io.containerd.grpc.v1.cri"] and set the pause image address (required)
  sandbox_image = "registry.aliyuncs.com/k8sxio/pause:3.6"
# Restart the service
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
systemctl status containerd
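The sandbox_image edit can also be scripted, which helps when the same change has to be repeated on the worker nodes (see 7.6). A minimal sketch, assuming GNU sed and a config.toml regenerated with `containerd config default`; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set sandbox_image in containerd's config.toml without opening vi.
# Assumes GNU sed and a file regenerated by "containerd config default",
# which already contains a sandbox_image line under [plugins."io.containerd.grpc.v1.cri"].
set_sandbox_image() {
  local cfg="${1:-/etc/containerd/config.toml}"
  local image="${2:-registry.aliyuncs.com/k8sxio/pause:3.6}"
  # Keep the existing indentation and key, replace only the quoted value
  sed -i "s#^\([[:space:]]*sandbox_image[[:space:]]*=[[:space:]]*\).*#\1\"${image}\"#" "$cfg"
  grep -n 'sandbox_image' "$cfg"   # print the resulting line for a quick visual check
}

# Usage (as root):
#   set_sandbox_image /etc/containerd/config.toml registry.aliyuncs.com/k8sxio/pause:3.6
#   systemctl restart containerd
```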
6.2 Issue: cgroup driver
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.



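# Set the kubelet cgroup driver to cgroupfs
sudo mkdir -p /etc/systemd/system/kubelet.service.d
# This file replaces the packaged drop-in of the same name, so it keeps the ExecStart lines; quoting 'EOF' stops the shell from expanding $KUBELET_*
sudo tee /etc/systemd/system/kubelet.service.d/10-kubeadm.conf <<'EOF'
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --cgroup-driver=cgroupfs"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
EOF

systemctl daemon-reload
systemctl restart kubelet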


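# Check Docker's cgroup driver
docker info | grep 'Cgroup'

# Set Docker's cgroup driver to systemd; daemon.json must be valid JSON,
# so merge this key into any existing object (no trailing comma after the last key)
vi /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}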

systemctl daemon-reload
systemctl restart docker
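Before changing either side, it can help to see which drivers are actually configured. Since kubelet here talks to containerd over its CRI socket (the endpoint shown in 6.1), kubelet's driver needs to agree with containerd's. A hypothetical diagnostic, assuming the default file locations; it only reads files, so a `--cgroup-driver` flag on the kubelet command line (as in the 10-kubeadm.conf change above) is not taken into account:

```shell
#!/usr/bin/env bash
# Hypothetical diagnostic: compare kubelet's configured cgroup driver with containerd's.
# Reads files only; a --cgroup-driver flag on the kubelet command line overrides
# config.yaml and is not detected here.
kubelet_driver() {
  local d
  d="$(awk '$1 == "cgroupDriver:" {print $2}' "${1:-/var/lib/kubelet/config.yaml}")"
  echo "${d:-cgroupfs}"   # kubelet's built-in default when the key is absent
}

containerd_driver() {
  # runc option SystemdCgroup = true means systemd; false or absent means cgroupfs
  if grep -Eq '^[[:space:]]*SystemdCgroup[[:space:]]*=[[:space:]]*true' "${1:-/etc/containerd/config.toml}"; then
    echo systemd
  else
    echo cgroupfs
  fi
}

check_cgroup_drivers() {
  local k c
  k="$(kubelet_driver "$1")"
  c="$(containerd_driver "$2")"
  echo "kubelet=$k containerd=$c"
  [ "$k" = "$c" ]   # exit status 0 only when they match
}

# Usage: check_cgroup_drivers /var/lib/kubelet/config.yaml /etc/containerd/config.toml
```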

VII. Join the worker nodes to the master


  1. Run on each worker node
    # This command is printed on the master after kubeadm init succeeds
    kubeadm join --token np71a6.197zbcz5l6yur1ji \
     --discovery-token-ca-cert-hash sha256:82e0eeadf242ce9fbcb0d3681116d3f845bf8a77deeb0a02e38e57c17112cb35
  2. After a few minutes, check on the master
    kubectl get nodes
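Rather than re-running `kubectl get nodes` by hand, the wait can be scripted. A minimal sketch (function names are hypothetical) that treats the cluster as ready once every row's STATUS column reads `Ready`:

```shell
#!/usr/bin/env bash
# Hypothetical helpers: wait until every node reports Ready.
# all_nodes_ready reads "kubectl get nodes" output on stdin and fails if any row is not Ready.
all_nodes_ready() {
  awk 'NR == 1 && $1 == "NAME" { next }               # skip the header row
       { seen = 1; if ($2 != "Ready") { print "not ready: " $1; bad = 1 } }
       END { exit (bad || !seen) }'                   # also fail on empty output
}

# Poll every 10 s for up to 10 minutes (run on the master)
wait_for_nodes() {
  local i
  for i in $(seq 1 60); do
    if kubectl get nodes 2>/dev/null | all_nodes_ready >/dev/null; then
      kubectl get nodes
      return 0
    fi
    sleep 10
  done
  kubectl get nodes
  return 1
}
```

The two `kubectl get nodes` outputs shown in 7.6 would fail and pass this check, respectively.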


7.1 Expired token
[preflight] Running pre-flight checks
	[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.2. Latest validated version: 18.09
error execution phase preflight: couldn't validate the identity of the API Server: abort connecting to API servers after timeout of 5m0s


# On the master, create a new token and print the full join command
kubeadm token create --ttl 24h --print-join-command

# List existing tokens
kubeadm token list
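`--print-join-command` already prints the complete command. If only the `--discovery-token-ca-cert-hash` value is needed (for example, to pair it with a token from `kubeadm token list`), the Kubernetes documentation derives it from the CA certificate with an openssl pipeline. Here it is wrapped in a hypothetical function:

```shell
#!/usr/bin/env bash
# Recompute the --discovery-token-ca-cert-hash value from the cluster CA (run on the master).
# The openssl pipeline follows the kubeadm documentation; the function name is hypothetical.
ca_cert_hash() {
  local ca="${1:-/etc/kubernetes/pki/ca.crt}"
  openssl x509 -pubkey -noout -in "$ca" \
    | openssl rsa -pubin -outform der 2>/dev/null \
    | openssl dgst -sha256 -hex \
    | sed 's/^.* //'
}

# Usage: pair a fresh token with the hash (k8s-node1:6443 = the control-plane endpoint used above)
#   TOKEN=$(kubeadm token create --ttl 24h)
#   echo "kubeadm join k8s-node1:6443 --token $TOKEN --discovery-token-ca-cert-hash sha256:$(ca_cert_hash)"
```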
7.2 containerd service error (this does not occur if the 6.1 fix has already been applied)
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
        [ERROR CRI]: container runtime is not running: output: time="2023-03-15T16:04:50+08:00" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint \"unix:///var/run/containerd/containerd.sock\": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1


# Back up the existing config
mv /etc/containerd/config.toml /etc/containerd/config.toml.back
# Regenerate the default config
containerd config default > /etc/containerd/config.toml
# Restart the service
systemctl daemon-reload
systemctl enable containerd
systemctl restart containerd
systemctl status containerd
7.3 connection refused
[preflight] Running pre-flight checks
error execution phase preflight: couldn't validate the identity of the API Server: Get "": dial tcp connect: connection refused


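# 1. Reset: kubeadm reset
# 2. Change the IP passed at k8s init time from the eth0 address (eth0=) to the eth1 address (eth1=)
# If you initialized from a config file, change the IP in kubeadm-config.yml instead
kubeadm init --apiserver-advertise-address= ...
# 3. In kube-flannel.yml, specify --iface=eth1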
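Step 3 can be done with sed. Under Vagrant with VirtualBox, eth0 is typically the NAT interface and has the same address on every VM, which is why flannel needs to be pointed at the host-only eth1. A minimal sketch, assuming GNU sed and a stock kube-flannel.yml whose flanneld container args include `- --kube-subnet-mgr`; the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: add "--iface=<nic>" to flanneld's args in kube-flannel.yml,
# right after the "- --kube-subnet-mgr" line and with the same indentation.
# Assumes GNU sed; does nothing if an --iface argument is already present.
add_flannel_iface() {
  local f="${1:-kube-flannel.yml}" nic="${2:-eth1}"
  grep -q -- '- --iface=' "$f" && return 0
  sed -i "s/^\([[:space:]]*\)- --kube-subnet-mgr[[:space:]]*\$/&\n\1- --iface=${nic}/" "$f"
}

# Usage on the master:
#   add_flannel_iface kube-flannel.yml eth1 && kubectl apply -f kube-flannel.yml
```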
7.4 Expired certificates (not encountered by the author; consult other references)
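Since this case was not hit here, there is no tested fix to report. To find out whether certificates are the problem, kubeadm provides `kubeadm certs check-expiration` (and `kubeadm certs renew all` to renew them), and a single certificate can also be checked with openssl. A hypothetical helper for the openssl check:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a certificate is still valid N days from now.
# "openssl x509 -checkend SECONDS" exits 0 when the cert will not expire within SECONDS.
cert_valid_for_days() {
  local cert="$1" days="${2:-0}"
  openssl x509 -noout -checkend $(( days * 86400 )) -in "$cert" >/dev/null
}

# Usage on the master:
#   kubeadm certs check-expiration
#   for c in /etc/kubernetes/pki/*.crt; do
#     cert_valid_for_days "$c" 30 || echo "expires within 30 days: $c"
#   done
```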
7.5 Join succeeds, but kubectl fails on worker nodes
[root@k8s-node2 ~]# kubectl get nodes
E0317 17:10:08.837852    8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.838065    8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.839914    8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.842028    8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
E0317 17:10:08.843524    8750 memcache.go:238] couldn't get current server API group list: Get "http://localhost:8080/api?timeout=32s": dial tcp [::1]:8080: connect: connection refused
The connection to the server localhost:8080 was refused - did you specify the right host or port?


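# 1. Copy /etc/kubernetes/admin.conf from the master node to the same path on the worker node
# 2. Set the environment variable
echo "export KUBECONFIG=/etc/kubernetes/admin.conf" >> /etc/profile
# 3. Reload the environment
source /etc/profile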
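Step 2 appends to /etc/profile every time it runs. A small guard keeps the line from piling up when the setup is repeated. The helper name is hypothetical, and the scp line assumes the root SSH password login enabled in section II:

```shell
#!/usr/bin/env bash
# Hypothetical helper: append a line to a file only if it is not already there,
# so re-running the setup does not stack duplicate "export KUBECONFIG=..." lines.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Usage on a worker node, as root:
#   scp root@k8s-node1:/etc/kubernetes/admin.conf /etc/kubernetes/admin.conf
#   append_once 'export KUBECONFIG=/etc/kubernetes/admin.conf' /etc/profile
#   source /etc/profile
```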
7.6 Worker nodes NotReady
[root@k8s-node1 opt]# kubectl get nodes
NAME        STATUS     ROLES           AGE   VERSION
k8s-node1   Ready      control-plane   79m   v1.26.0
k8s-node2   NotReady   <none>          63m   v1.26.0
k8s-node3   NotReady   <none>          62m   v1.26.0

[root@k8s-node2 opt]# journalctl -f -u kubelet.service
rpc error: code = Unknown desc = failed to get sandbox image \"registry.k8s.io/pause:3.6\": failed to pull image \"registry.k8s.io/pause:3.6\
Mar 17 17:44:17 k8s-node2 kubelet[8546]: E0317 17:44:17.090922    8546 kubelet.go:2475] "Container runtime network not ready" networkReady="NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized"


# 1. As in issue 6.1, edit /etc/containerd/config.toml on each worker node to set the pause image address
# 2. Restart containerd
systemctl restart containerd
# 3. Restart kubelet
systemctl restart kubelet

[root@k8s-node3 ~]# kubectl get nodes
NAME        STATUS   ROLES           AGE   VERSION
k8s-node1   Ready    control-plane   92m   v1.26.0
k8s-node2   Ready    <none>          75m   v1.26.0
k8s-node3   Ready    <none>          75m   v1.26.0

