1. Install Docker

Reference: "Installing Docker on Ubuntu 18.04, and modifying the Docker configuration" (CSDN blog)

2. Install k8s

Install the three main k8s components: kubelet, kubeadm, and kubectl.

kubelet runs the core node services, kubeadm is the tool for quickly bootstrapping a cluster, and kubectl is the k8s command-line client.
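As a quick sanity check after the installation below, a small helper like this can report whether the three binaries are on the PATH (`check_installed` is our own illustrative helper, not a k8s command):

```shell
# Sketch: verify that the k8s binaries are installed and on the PATH
check_installed() {
  for bin in "$@"; do
    if command -v "$bin" >/dev/null 2>&1; then
      echo "$bin: found"
    else
      echo "$bin: MISSING"
    fi
  done
}

check_installed kubelet kubeadm kubectl
```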

Run the following commands.

First switch to the root user:

sudo -s


curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet=1.19.2-00 kubeadm=1.19.2-00 kubectl=1.19.2-00 kubernetes-cni  # install a specific version of the k8s components

systemctl enable kubelet && systemctl start kubelet  # start kubelet now and at every boot
 

3. Set up the master node

  • Initialize the master node
  • Deploy the flannel network
  • Configure the kubectl tool


kubeadm init --image-repository registry.cn-hangzhou.aliyuncs.com/google_containers --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=192.168.56.11

Note: change --apiserver-advertise-address to your own master host's address.

--pod-network-cidr=10.244.0.0/16 can be used as-is. It is the Pod network range, and flannel's default configuration expects exactly this CIDR, so keep the two consistent.
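For intuition, a /16 prefix on 10.244.0.0 simply means every address of the form 10.244.x.x belongs to the Pod network. A minimal check (`in_pod_cidr` is our own illustrative helper, not a k8s command):

```shell
# Sketch: 10.244.0.0/16 matches every 10.244.x.x address
in_pod_cidr() {
  case "$1" in
    10.244.*) return 0 ;;
    *)        return 1 ;;
  esac
}

in_pod_cidr 10.244.1.5 && echo "10.244.1.5 is inside the pod CIDR"
```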

Problems you may run into at this step:

Problem 1:

[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
    (the two lines above repeat several more times)

Unfortunately, an error has occurred:
    timed out waiting for the condition

This error is likely caused by:
    - The kubelet is not running
    - The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

If you are on a systemd-powered system, you can try to troubleshoot the error with the following commands:
    - 'systemctl status kubelet'
    - 'journalctl -xeu kubelet'

Additionally, a control plane component may have crashed or exited when started by the container runtime.
To troubleshoot, list all containers using your preferred container runtimes CLI.
Here is one example how you may list all running Kubernetes containers by using crictl:
    - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock ps -a | grep kube | grep -v pause'
    Once you have found the failing container, you can inspect its logs with:
    - 'crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock logs CONTAINERID'
error execution phase wait-control-plane: couldn't initialize a Kubernetes cluster
To see the stack trace of this error execute with --v=5 or higher
 

Solution:

systemctl status kubelet   # check kubelet's status; the output below shows it failed to start
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: activating (auto-restart) (Result: exit-code) since Mon 2024-08-19 11:05:38 CST; 2s ago
     Docs: https://kubernetes.io/docs/home/
  Process: 19254 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS (code=exited, status=1/
 Main PID: 19254 (code=exited, status=1/FAILURE)
 

This may be caused by the swap partition still being enabled. Run:

sudo swapoff -a   # turn swap off for the current boot

To disable swap permanently, edit /etc/fstab with vim and comment out the swap line.
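The fstab edit can also be done non-interactively with sed. The sketch below runs against a temporary copy so it is safe to try; point FSTAB at /etc/fstab (with sudo) to apply it for real:

```shell
# Sketch: comment out swap entries in an fstab file; demonstrated on a temp copy
FSTAB=$(mktemp)
cat > "$FSTAB" <<'EOF'
UUID=abcd-1234 / ext4 errors=remount-ro 0 1
/swapfile none swap sw 0 0
EOF

# prefix '#' to any uncommented line that has a 'swap' filesystem-type field
sed -i '/^[^#].*[[:space:]]swap[[:space:]]/s/^/#/' "$FSTAB"
cat "$FSTAB"
```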

sudo systemctl restart kubelet   # restart kubelet

systemctl status kubelet         # check whether it started successfully

If it started successfully, the output looks like:

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Mon 2024-08-19 14:33:10 CST; 11s ago
     Docs: https://kubernetes.io/docs/home/
 Main PID: 7825 (kubelet)
    Tasks: 12 (limit: 4915)
   CGroup: /system.slice/kubelet.service
           └─7825 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/l

Then run the initialization command again; problem solved.

Problem 2: failed to pull image registry.k8s.io/kube-apiserver:v1.26.3

Analysis: the default image registry cannot be reached.

Solution:

--image-repository=registry.aliyuncs.com/google_containers  # add this flag to the kubeadm init command

Problem 3:

[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory “/etc/kubernetes/manifests”. This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.

Unfortunately, an error has occurred:
timed out waiting for the condition

This error is likely caused by:
- The kubelet is not running
- The kubelet is unhealthy due to a misconfiguration of the node in some way (required cgroups disabled)

Analysis: kubelet is not running properly.

Solution:

# kubelet failing to start, or running unhealthily, can have many causes

systemctl status kubelet   # check kubelet's status

journalctl -xeu kubelet    # read the kubelet logs to find the cause

Error 1: Failed to create sandbox for pod: pulling the registry.k8s.io/pause:3.9 image failed

Solution:

### generate containerd's default configuration file
containerd config default > /etc/containerd/config.toml
### find which line the default sandbox image setting is on
cat /etc/containerd/config.toml | grep -n "sandbox_image"
### open the file in vim, locate sandbox_image, and change the image to k8simage/pause:3.6
vim /etc/containerd/config.toml
sandbox_image = "k8simage/pause:3.6"
### restart the containerd service
systemctl daemon-reload
systemctl restart containerd.service
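The vim edit above can also be scripted with sed. The sketch below works on a sample snippet so it is safe to run anywhere; point CFG at /etc/containerd/config.toml on a real node:

```shell
# Sketch: rewrite sandbox_image without opening vim; shown on a sample file
CFG=$(mktemp)
cat > "$CFG" <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"
EOF

# replace whatever image is configured with k8simage/pause:3.6
sed -i 's#sandbox_image = ".*"#sandbox_image = "k8simage/pause:3.6"#' "$CFG"
grep sandbox_image "$CFG"
```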

For any other errors, you can run kubeadm reset to reset the node and then re-run the initialization.

The following output means initialization succeeded:

Your Kubernetes control-plane has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

Alternatively, if you are the root user, you can run:

  export KUBECONFIG=/etc/kubernetes/admin.conf

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

Then you can join any number of worker nodes by running the following on each as root:

kubeadm join 172.30.25.123:6443 --token 6e1c4k.is7szuoovdrrpiwp \
    --discovery-token-ca-cert-hash sha256:57827b31129f49b8cbd3ec9dd98d8e57b0d7bbc183f042adbeb812dddae1c353 
 

Then follow the instructions from that output:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run the following commands to verify:

# list the nodes that have joined
kubectl get nodes

# check the cluster component status
kubectl get cs
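On a larger cluster it helps to count how many nodes are Ready rather than eyeballing the table. A small sketch using sample output (`count_ready` is our own helper; on a real cluster, pipe `kubectl get nodes` into it):

```shell
# Sketch: count nodes whose STATUS column is Ready
count_ready() {
  awk 'NR > 1 && $2 == "Ready" { n++ } END { print n + 0 }'
}

sample='NAME        STATUS     ROLES    AGE   VERSION
master      Ready      master   10m   v1.19.2
k8s-node1   NotReady   <none>   1m    v1.19.2'

printf '%s\n' "$sample" | count_ready   # prints 1
```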

If kubectl get nodes fails with:

Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")

Analysis: the .kube directory created earlier was not removed after running kubeadm reset.

Solution: remove the stale directory and recreate it from the new admin.conf, as in the init output above.

rm -rf ~/.kube/
mkdir -p ~/.kube
sudo cp /etc/kubernetes/admin.conf ~/.kube/config
sudo chown $(id -u):$(id -g) ~/.kube/config
sudo systemctl restart kubelet

4. Join worker nodes to the master's cluster

Find the kubeadm join command in the successful init output from step 3 and run it on each worker node. Note: your command will differ from the example below!

kubeadm join 172.30.25.123:6443 --token 6e1c4k.is7szuoovdrrpiwp \
    --discovery-token-ca-cert-hash sha256:57827b31129f49b8cbd3ec9dd98d8e57b0d7bbc183f042adbeb812dddae1c353 

This step may fail with the following error:

> --discovery-token-ca-cert-hash sha256:ff6e33c41ad0f3785e3d51f197c65dc881fc33415c96225c92677b4ea159ae
[preflight] Running pre-flight checks
error execution phase preflight: [preflight] Some fatal errors occurred:
    [ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
    [ERROR Port-10250]: Port 10250 is in use
    [ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
 

Analysis:

From the error messages, two files already exist and conflict. (Note: some posts online say to simply delete them; don't, because deleting them makes loading the kubelet config file fail.) In addition, port 10250 is already in use.

Solution:

Run the following commands:

sudo mv /etc/kubernetes/kubelet.conf /etc/kubernetes/kubelet.conf.bak   # back up the conflicting file

sudo mv /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.crt.bak       # back up the conflicting file

sudo kubeadm init phase kubelet-start

sudo systemctl restart kubelet   # restart kubelet

Then run the join command again:

kubeadm join 172.30.25.123:6443 --token 6e1c4k.is7szuoovdrrpiwp \
    --discovery-token-ca-cert-hash sha256:57827b31129f49b8cbd3ec9dd98d8e57b0d7bbc183f042adbeb812dddae1c353 

Once the node has joined, run kubectl get nodes on the master. If the newly joined node does not show up:

Solution:

On the node, check and change its hostname, then rejoin the cluster:

hostname                                      # show the current hostname
hostnamectl --static set-hostname k8s-node1   # change the hostname to k8s-node1
kubeadm reset                                 # clean up the environment
kubeadm join 192.168.92.100:6443 --token q13il4.r3zmvd4fc6kaqavi  --discovery-token-ca-cert-hash sha256:d740b336ca71ca6b7c1ff8ea4cc92db7848e6342db9fefbb3a2682b3f9753708   # rejoin
# if the token has expired, run kubeadm token create --print-join-command on the master and copy the new command
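Before pasting a copied token, it can be worth checking its shape: kubeadm bootstrap tokens have the form [a-z0-9]{6}.[a-z0-9]{16}, and a truncated paste is a common source of join failures. A small sketch (`valid_token` is our own helper, not a kubeadm command):

```shell
# Sketch: check that a kubeadm bootstrap token has the expected format
valid_token() {
  printf '%s' "$1" | grep -Eq '^[a-z0-9]{6}\.[a-z0-9]{16}$'
}

valid_token "q13il4.r3zmvd4fc6kaqavi" && echo "token format OK"
```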

That's it: the k8s installation is complete!

Summary: some people set up a k8s environment without a hitch; others run into all sorts of problems. When you hit one, don't panic: read the error message carefully, check the runtime logs to find the cause, and fix the issues one by one. Learning k8s is a long road, and setting up the environment is only the first step. Let's keep going!
