Kubernetes latest version (containerd instead of Docker): cluster init speedrun
k8s cluster deployment speedrun on CentOS 7.9
Environment setup
Preface
Cluster deployment speedrun
Buy ECS instances on Huawei Cloud
CentOS 7.9
master: 2 vCPU + 4 GB RAM
worker: 4 vCPU + 8 GB RAM
Put master and worker in the same VPC (VPC - Virtual Private Cloud, basically a LAN in the cloud)
master internal IP: 192.168.0.121
worker internal IP: 192.168.0.122
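Optionally, map the two internal IPs to hostnames on both nodes so later commands (like the `scp calico.yaml ecs-master:/root/` below) can use names instead of IPs. A sketch - `ecs-master` / `ecs-worker` are assumed names, match whatever your ECS instances are actually called:

```shell
# Append name resolution for the two nodes to /etc/hosts on BOTH machines.
# ecs-master / ecs-worker are assumptions - use your own instance names.
cat <<'EOF' | sudo tee -a /etc/hosts
192.168.0.121 ecs-master
192.168.0.122 ecs-worker
EOF
```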
PRE_SETUP
Disable the firewall
systemctl stop firewalld # stop the service
systemctl disable firewalld # disable the service
firewall-cmd --state # check firewall status
Disable SELinux
# Put SELinux into permissive mode for the current boot
sudo setenforce 0
# Change SELINUX=enforcing to SELINUX=permissive
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# Confirm the change; cat should now show SELINUX=permissive
cat /etc/selinux/config
# Check status
sestatus
Disable the swap partition
# sudo vi /etc/fstab
# Comment out the swap line
sed -i '/swap/s/^/#/' /etc/fstab
cat /etc/fstab
swapoff -a
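To confirm swap is actually off:

```shell
# The swap totals should all read 0 after swapoff -a
free -h | grep -i swap
# Prints nothing when no swap device is active
swapon --show
```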
INSTALL CONTAINERD
sudo yum -y update
sudo yum install -y yum-utils
# Note: this uses a mirror inside China; if you're overseas and not behind the GFW, find the repo in the installation section of the official Docker docs
sudo yum-config-manager \
--add-repo \
http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
sudo yum install -y containerd.io
systemctl enable containerd
systemctl start containerd
systemctl status containerd
CONFIG CONTAINERD
# `containerd config default` writes a full default config (redirecting `config dump` into the same file it reads is racy - the shell truncates the file first)
containerd config default > /etc/containerd/config.toml
# This sed swaps the source because the upstream sandbox image can't be pulled from inside China - skip it overseas
sed -i 's|sandbox_image.*$|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' /etc/containerd/config.toml
systemctl restart containerd
systemctl status containerd
containerd config dump | grep -i disabled_plugins
containerd config dump | grep -i sandbox_image
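One more containerd setting worth checking here: the kubeadm config further down sets `cgroupDriver: systemd` for the kubelet, and the Kubernetes docs recommend that containerd's runc runtime use the systemd cgroup driver as well, or pods can misbehave. A sketch - the `SystemdCgroup` key lives under the runc options section of config.toml:

```shell
# Flip SystemdCgroup to true so containerd's cgroup driver matches the kubelet's (systemd)
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
containerd config dump | grep SystemdCgroup
```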
INSTALL K8S
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
# Overseas, just use the upstream repo; this cat<<EOF command is in the k8s installation docs
# baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
# gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
# https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF
sudo yum install -y kubelet-1.27.2 kubeadm-1.27.2 kubectl-1.27.2 --disableexcludes=kubernetes
# --disableexcludes=kubernetes ignores the exclude= line in the kubernetes repo above so these three packages can actually be installed
# sudo yum remove -y kubelet-1.27.2 kubeadm-1.27.2 kubectl-1.27.2 --disableexcludes=kubernetes
# sudo yum install -y kubelet-1.26.5 kubeadm-1.26.5 kubectl-1.26.5 --disableexcludes=kubernetes
sudo systemctl enable --now kubelet
# kubelet is expected to be failing at this point; it won't run properly until the network setup and kubeadm init are done
sleep 5
sudo systemctl status kubelet
Network setup
# Kernel network configuration
cat << EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
# Manually reload all sysctl config files
sudo sysctl --system
- On Huawei Cloud a few more things need configuring (on a plain VM you probably/possibly/perhaps don't)
# in case of "file doesn't exist" or "No such file or directory"
modprobe br_netfilter
echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/ipv4/ip_forward
# then run the echo commands above again
cat << EOF | sudo tee /etc/sysctl.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sysctl -p
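A quick sanity check that the module and sysctls actually took effect:

```shell
# The module should be loaded and both sysctls should print 1
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward
```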
kubeadm init - master only (skip this part on the worker)
file example
Note: remember to change the API server address here - the advertise-address - to match the master's internal IP. Mine is 192.168.0.121, set manually when buying the ECS.
This is where kubernetes' containerd socket, image repo, API server address, and cgroupDriver all get configured.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.2
networking:
  podSubnet: "192.168.0.0/16"
apiServer:
  extraArgs:
    advertise-address: "192.168.0.121"
imageRepository: "registry.aliyuncs.com/google_containers"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: KubeadmConfig
clusterConfiguration:
  criSocket: /run/containerd/containerd.sock
command version
Editing the file directly in vim works too, but I made a cat version so it can be copy-pasted as-is - no opening vim, editing, saving, and quitting dance. Just command+C / command+V (yes, my local machine is a Mac, hence not control+C / control+V).
export KUBECONFIG=/etc/kubernetes/admin.conf
cat << EOF | sudo tee -a ~/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF
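If you administer the cluster as a non-root user, the standard alternative (it's what kubeadm init itself prints at the end) is to copy the admin config instead of exporting KUBECONFIG:

```shell
# Standard post-init steps for regular users, as printed by kubeadm init
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
```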
cat << EOF | sudo tee ~/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.2
networking:
  podSubnet: "192.168.0.0/16"
apiServer:
  extraArgs:
    advertise-address: "192.168.0.121"
imageRepository: "registry.aliyuncs.com/google_containers"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: KubeadmConfig
clusterConfiguration:
  criSocket: /run/containerd/containerd.sock
EOF
init
kubeadm init --config kubeadm-config.yaml --v=5
install calico - calico first, THEN join!
Don't join right after kubeadm init finishes - install calico first.
If you join first and install calico afterwards, calico-kube-controller and core-dns will blow up.
# Set up the pod network
curl -O https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico.yaml
scp calico.yaml ecs-master:/root/
kubectl apply -f calico.yaml
watch -n 1 kubectl get pods -A
# calico is pronounced ['kælɪkəʊ]
two ways:
- official doc
- https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml
- github repo
- https://github.com/projectcalico/calico
- grab calico.yaml from the manifests directory
curl -O https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico.yaml
scp it to the server
kubectl apply -f calico.yaml
join cluster
The command was already printed during init; run it on the worker.
Forgot the join command? Just create a new one and print it (creating a few extra tokens is fine - you can always delete them, no harm done):
kubeadm token create --print-join-command
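The printed command has this shape (the token and hash below are placeholders, not real values - use exactly what your master prints):

```shell
# Run on the worker; <token> and <hash> come from the master's output
kubeadm join 192.168.0.121:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash>
```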
Verify the cluster is running properly
kubectl get nodes
kubectl get pods --all-namespaces
That's the end of the speedrun; everything below is assorted further reading.
Errors I ran into:
kubeadm hangs and can't proceed
Debugged it bit by bit with systemctl status and journalctl -xeu, and found containerd was failing while pulling the sandbox image
Fixed by re-pointing the sandbox image directly in containerd's config
DEBUG notes:
Config file locations:
/etc/containerd/config.toml
/etc/crictl.yaml
Starting over after a failed kubeadm init
kubeadm reset
# Normally a reset is enough; the commands below are for when reset alone doesn't cut it - they just clean up leftover files
rm -f /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/etcd.yaml
rm -rf /var/lib/etcd
sudo kill -9 $(lsof -i tcp:6443 | awk 'NR>1 {print $2}') $(lsof -i tcp:10259 | awk 'NR>1 {print $2}') $(lsof -i tcp:10257 | awk 'NR>1 {print $2}') $(lsof -i tcp:10250 | awk 'NR>1 {print $2}') $(lsof -i tcp:2379 | awk 'NR>1 {print $2}') $(lsof -i tcp:2380 | awk 'NR>1 {print $2}')
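The same port cleanup can be written as a loop, which is easier to read and extend (a sketch; same ports as the one-liner above):

```shell
# Kill whatever is still holding the control-plane ports after a failed init
for port in 6443 10259 10257 10250 2379 2380; do
    pids=$(lsof -ti tcp:$port)          # -t prints PIDs only
    [ -n "$pids" ] && sudo kill -9 $pids
done
```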
debug command
systemctl status -l containerd
journalctl -xeu kubelet
cri not implemented
Forgot to restart containerd after changing its config just now
node stuck in NotReady
calico wasn't installed
calico-related pods stuck in CrashLoopBackOff
Cause: joined the worker to the cluster right after kubeadm init finished
Should install calico first and wait until all the calico pods are ready before letting the worker join the cluster
Workflow
How to roll back when something goes wrong
reset cluster
on master
kubectl drain <master-node-name> --ignore-daemonsets
kubectl drain ecs-worker --ignore-daemonsets
kubectl drain ecs-master --ignore-daemonsets
kubeadm reset
force delete / kill pod
kubectl delete pods <POD_NAME> --grace-period=0 --force -n <NAMESPACE>
Settings
crictl
/etc/crictl.yaml
runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 2
debug: true
pull-image-on-create: true
containerd
/etc/containerd/config.toml
version = 2
# disabled_plugins must be empty (or the line removed entirely)
disabled_plugins = []
[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    # sandbox_image = "registry.k8s.io/pause:3.9"
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"
    # The mirror lines below have nothing to do with k8s itself and can be left out
    [plugins."io.containerd.grpc.v1.cri".registry]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
        [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
          endpoint = ["https://registry.aliyuncs.com/google_containers"]
containerd registry mirror configuration - I never got this working
https://github.com/containerd/containerd/blob/main/docs/hosts.md
Related Knowledge
pause container
shares the lifecycle of its pod:
it is the first container started in the pod
and the last container stopped in the pod
holds the namespaces for the current pod
PID namespace, network namespace, IPC (Inter Process Communication) namespace
1 pod - multiple namespaces (but only one of each kind)
the pause container manages all processes in the current pod
the sandbox image is simply the image used to run the pause container
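You can see the pod sandboxes (i.e. pause containers) and the namespaces they hold on any node with crictl; `<POD_ID>` is a placeholder taken from the `crictl pods` output:

```shell
# List pod sandboxes - each one is backed by a pause container
crictl pods
# Inspect one sandbox; its JSON includes the linux namespaces shared by the pod
crictl inspectp <POD_ID> | grep -A 5 namespaces
```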
runc
runc is a CLI tool for spawning and running containers according to the OCI (Open Container Initiative) spec
runc is a lightweight, portable container runtime
container runtime
examples: docker, containerd, CRI-O
handles starting/stopping containers, pulling and distributing images, and so on
Layering
container orchestration tools
kubernetes / docker swarm / apache mesos
container orchestration tools - CRI shim (like docker shim, containerd's cri plugin) - high level container runtime (docker, containerd, etc) - OCI compliant (low level) container runtime (runc)
CNI - container network interface
TODO: look into the calico CNI plugin
CRI - container runtime interface (protocol)
compatible with kubernetes
sandbox image - pause container's image
cgroup
control group
cgroup driver
can be thought of as a cgroup manager
how does docker interact with containerd?
directly, through the docker daemon
Command
sed - Stream EDitor
https://www.ibm.com/docs/en/aix/7.2?topic=s-sed-command
sed [option] [script] [file location]
option:
-i: edit the file in place (save changes)
script:
a script has several parts: pattern, sub-command, parameter, flag
# delete lines 1-3
sed -i '1,3d' test.txt
# prepend # to every line matching the pattern "swap"
sed -i '/swap/s/^/# /' /etc/fstab
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config
# `^` matches the start of a line
# `$` matches the end of a line
within a sed script the order is: pattern first, then command
sed -i '/pattern/s/foo/bar/g' filename
# in this example, s is the command and g is a flag
# s is the "substitute" command
# g is the "global" flag
# flags modify the command's behavior
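Putting pattern, command, and flag together - this is exactly the `/swap/` trick used on /etc/fstab earlier, run against sample input:

```shell
# Lines matching the pattern "swap" get "# " prepended; other lines pass through
printf 'keep this\nswap line one\nswap line two\n' | sed '/swap/s/^/# /'
# output:
# keep this
# # swap line one
# # swap line two
```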
the pattern selects which lines to act on
Choosing a delimiter
echo 'hello, world' | sed 's?wor?WOR?'
# the first character after the command (s) is the delimiter
# if there is an address pattern (here it's ld), a custom delimiter for the address needs a leading backslash, like this:
echo 'hello, world' | sed '\?ld?s?wor?WOR?'
echo 'hello, world, world' | sed 's?wor?WOR?'    # replaces only the first match per line
echo 'hello, world, world' | sed 's?wor?WOR?g'   # g flag: replaces every match