Setting Up the Environment
Preface

Cluster deployment speedrun

Buy ECS instances on Huawei Cloud

CentOS 7.9
master 2vCPU + 4G RAM
worker 4vCPU + 8G RAM

Put the master and worker in the same VPC (VPC - Virtual Private Cloud, basically a LAN in the cloud)
master internal IP: 192.168.0.121
worker internal IP: 192.168.0.122

PRE_SETUP
Turn off the firewall
systemctl stop firewalld       # stop the service
systemctl disable firewalld    # disable it at boot
firewall-cmd --state           # check firewall status
Turn off SELinux
# Put SELinux into permissive mode for the current session
sudo setenforce 0
# Change SELINUX=enforcing to SELINUX=permissive
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# Verify: cat should now show SELINUX=permissive
cat /etc/selinux/config

# Check status
sestatus
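A quick sanity check of that sed expression on a throwaway copy (the sample file below is made up, not the real /etc/selinux/config):

```shell
# demo file standing in for /etc/selinux/config (hypothetical content)
printf 'SELINUX=enforcing\nSELINUXTYPE=targeted\n' > /tmp/selinux-config.demo
# same substitution as above
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /tmp/selinux-config.demo
grep '^SELINUX=' /tmp/selinux-config.demo    # SELINUX=permissive
```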
Disable the swap partition
# sudo vi /etc/fstab
# and comment out the swap line, or do it with sed:
sed -i '/swap/s/^/#/' /etc/fstab
cat /etc/fstab
swapoff -a
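Same idea, tried safely first on a throwaway copy (the sample fstab content is invented):

```shell
# fake fstab with one swap line (hypothetical content)
cat > /tmp/fstab.demo <<'EOF'
/dev/vda1 /    ext4 defaults 0 1
/dev/vda2 swap swap defaults 0 0
EOF
# comment out any line mentioning swap
sed -i '/swap/s/^/#/' /tmp/fstab.demo
grep swap /tmp/fstab.demo    # #/dev/vda2 swap swap defaults 0 0
```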
INSTALL CONTAINERD
sudo yum -y update
sudo yum install -y yum-utils

# Note: this uses a mainland-China mirror; if you're overseas with no firewall in the way, grab the repo from the installation section of the official Docker docs instead
sudo yum-config-manager \
--add-repo \
http://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo

sudo yum install -y containerd.io

systemctl enable containerd
systemctl start containerd
systemctl status containerd
CONFIG CONTAINERD
# materialize the current effective config (the containerd docs use 'containerd config default' here, which also works)
containerd config dump > /etc/containerd/config.toml
# this swaps in a mirror because the upstream sandbox image is unreachable from mainland China; skip it overseas
sed -i 's|sandbox_image.*$|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' /etc/containerd/config.toml

systemctl restart containerd

systemctl status containerd

containerd config dump | grep -i disabled_plugins
containerd config dump | grep -i sandbox_image
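To see what that sandbox_image sed actually does, here is the same substitution on a throwaway snippet (not the real config.toml):

```shell
# minimal stand-in for the relevant part of /etc/containerd/config.toml
cat > /tmp/config.toml.demo <<'EOF'
[plugins."io.containerd.grpc.v1.cri"]
  sandbox_image = "registry.k8s.io/pause:3.9"
EOF
# rewrite everything from sandbox_image to end of line, keeping the indentation
sed -i 's|sandbox_image.*$|sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"|' /tmp/config.toml.demo
grep sandbox_image /tmp/config.toml.demo
```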

INSTALL K8S
cat <<EOF | sudo tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
# Overseas, just use the original repo; this cat<<EOF command is in the k8s installation docs
# baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
baseurl=http://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=1
repo_gpgcheck=1
# gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg 
#        https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
gpgkey=http://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg
       http://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
exclude=kubelet kubeadm kubectl
EOF


sudo yum install -y kubelet-1.27.2 kubeadm-1.27.2 kubectl-1.27.2 --disableexcludes=kubernetes
# --disableexcludes=kubernetes: ignore the exclude= line for the kubernetes repo, so these packages can actually be installed

# sudo yum remove -y kubelet-1.27.2 kubeadm-1.27.2 kubectl-1.27.2 --disableexcludes=kubernetes

# sudo yum install -y kubelet-1.26.5 kubeadm-1.26.5 kubectl-1.26.5 --disableexcludes=kubernetes

sudo systemctl enable --now kubelet

# kubelet is expected to be failing at this point; it won't work until the network setup below is done
sleep 5
sudo systemctl status kubelet

Network Configuration
# Adjust kernel network settings
cat << EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF

# Manually reload all sysctl configuration files
sudo sysctl --system
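The cat <<EOF | sudo tee idiom above just writes the heredoc to a file (echoing it along the way); a demo against a throwaway path instead of /etc/sysctl.d:

```shell
# write to /tmp so no root is needed; tee prints the content and saves it
cat <<EOF | tee /tmp/k8s.conf.demo
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
grep -c '= 1' /tmp/k8s.conf.demo    # 2
```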
  • On Huawei Cloud a few more things need configuring (on a plain VM you probably/possibly/maybe don't)
# in case of "file doesn't exist" / "No such file or directory"
modprobe br_netfilter

echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
echo 1 > /proc/sys/net/ipv4/ip_forward


# then persist the same settings

cat << EOF | sudo tee /etc/sysctl.conf
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF

sysctl -p
kubeadm init - master only (no need to do this part on the worker)
file example

Note: remember to change the API server address here - the advertise-address - to match your own master's internal IP; mine is 192.168.0.121, set manually when buying the ECS
kubernetes' containerd socket, image repo, API server address, and cgroupDriver are all configured here

apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.2
networking:
  podSubnet: "192.168.0.0/16"
apiServer:
  extraArgs:
    advertise-address: "192.168.0.121"
imageRepository: "registry.aliyuncs.com/google_containers"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
command version

You could also just vim into the file and edit it, but to make copy-paste work directly I made a cat version, saving the whole vim-edit-save dance; command + C / command + V and done (yes, my local machine is a Mac, hence not control + C / control + V)

export KUBECONFIG=/etc/kubernetes/admin.conf
cat << EOF | sudo tee -a ~/.bashrc
export KUBECONFIG=/etc/kubernetes/admin.conf
EOF


cat << EOF | sudo tee ~/kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.27.2
networking:
  podSubnet: "192.168.0.0/16"
apiServer:
  extraArgs:
    advertise-address: "192.168.0.121"
imageRepository: "registry.aliyuncs.com/google_containers"
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
cgroupDriver: systemd
---
apiVersion: kubeadm.k8s.io/v1beta3
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
EOF

init
kubeadm init --config kubeadm-config.yaml --v=5
install calico - calico first, then join!

After kubeadm init, don't join right away; install calico first

If workers join first and calico is installed afterwards, calico-kube-controller and core-dns will blow up

# Set up the pod network
curl -O https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico.yaml

scp calico.yaml ecs-master:/root/

kubectl apply -f calico.yaml

watch -n 1 kubectl get pods -A

# calico is pronounced ['kælɪkəʊ]
two ways:
  • official doc
    • https://docs.tigera.io/calico/latest/getting-started/kubernetes/quickstart
kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/tigera-operator.yaml

kubectl create -f https://raw.githubusercontent.com/projectcalico/calico/v3.26.1/manifests/custom-resources.yaml

  • github repo
    • https://github.com/projectcalico/calico
    • grab calico.yaml from the manifests directory
curl -O https://raw.githubusercontent.com/projectcalico/calico/master/manifests/calico.yaml
scp it to the server
kubectl apply -f calico.yaml
join cluster

The command was already printed during init; run it once on the worker

Forgot the join command? Create a new token and print it again (a few extra tokens are fine; delete them later if you like, no harm done)

kubeadm token create --print-join-command
Verify the cluster is healthy
kubectl get nodes
kubectl get pods --all-namespaces
End of the speedrun; everything below is assorted further reading
Errors encountered:

kubeadm got stuck and wouldn't proceed
Debugged bit by bit with systemctl status and journalctl -xeu; turned out containerd was failing while pulling the sandbox image
Fix: point sandbox_image at a reachable mirror in containerd's config

DEBUG notes:
Config file locations:
/etc/containerd/config.toml
/etc/crictl.yaml

Starting over after a failed kubeadm init

kubeadm reset

# Normally a reset alone is enough; the commands below are for when reset doesn't fix things - they just clean up leftover files
rm -f /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/etcd.yaml

rm -rf /var/lib/etcd

sudo kill -9 $(lsof -i tcp:6443 | awk 'NR>1 {print $2}') $(lsof -i tcp:10259 | awk 'NR>1 {print $2}') $(lsof -i tcp:10257 | awk 'NR>1 {print $2}') $(lsof -i tcp:10250 | awk 'NR>1 {print $2}') $(lsof -i tcp:2379 | awk 'NR>1 {print $2}') $(lsof -i tcp:2380 | awk 'NR>1 {print $2}')
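The awk 'NR>1 {print $2}' bits above skip lsof's header row and print the PID column; a demo on made-up lsof-style output:

```shell
# fake lsof output (header + two rows for the same process, hypothetical values)
printf 'COMMAND  PID USER\nkube-api 1234 root\nkube-api 1234 root\n' \
  | awk 'NR>1 {print $2}' | sort -u    # 1234
```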

debug command

systemctl status -l containerd
journalctl -xeu kubelet

cri not implemented

Cause: containerd wasn't restarted after its config was changed

node stuck in NotReady

calico isn't installed

calico pods stuck in CrashLoopBackOff

Cause: the worker joined the cluster immediately after kubeadm init

Install calico first; let the worker join only after all the calico pods are ready

Workflow
How to roll back when something goes wrong
reset cluster

on master

kubectl drain <master-node-name> --ignore-daemonsets

kubectl drain ecs-worker --ignore-daemonsets
kubectl drain ecs-master --ignore-daemonsets

kubeadm reset
force delete / kill pod
kubectl delete pods <POD_NAME> --grace-period=0 --force -n <NAMESPACE>
Settings
crictl

/etc/crictl.yaml

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 2
debug: true
pull-image-on-create: true
containerd

/etc/containerd/config.toml

version = 2

# disabled_plugins must be empty (or the line removed entirely)
disabled_plugins = []

[plugins]
  [plugins."io.containerd.grpc.v1.cri"]
    # sandbox_image = "registry.k8s.io/pause:3.9"   # overseas: keep the default
    sandbox_image = "registry.aliyuncs.com/google_containers/pause:3.9"

# The mirror lines below are unrelated to k8s itself; optional
  [plugins."io.containerd.grpc.v1.cri".registry]
    [plugins."io.containerd.grpc.v1.cri".registry.mirrors]
      [plugins."io.containerd.grpc.v1.cri".registry.mirrors."registry.k8s.io"]
        endpoint = ["https://registry.aliyuncs.com/google_containers"]

containerd registry mirror configuration - I never got this working

https://github.com/containerd/containerd/blob/main/docs/hosts.md

Related Knowledge
pause container

Shares the pod's lifecycle

It's the first container started in a pod

and the last container stopped in a pod

holds the namespaces for the current pod

PID namespace, network namespace, IPC (Inter-Process Communication) namespace

1 pod - multiple namespaces (but only one of each kind)

the pause container manages all processes in the current pod

the sandbox image is just the image the pause container runs from

runc

runc is a CLI tool for spawning and running containers according to the OCI (Open Container Initiative) spec

runc is a lightweight, portable container runtime

container runtime

Examples: docker, containerd, CRI-O

Handles starting/stopping containers, distributing images, and that sort of work

The layering

container orchestration tools

kubernetes / docker swarm / apache mesos

container orchestration tools - CRI shim (like dockershim, containerd's cri plugin) - high-level container runtime (docker, containerd, etc) - OCI-compliant (low-level) container runtime (runc)

CNI - container network interface

todo: read up on the calico cni plugin

CRI - container runtime interface (protocol)

it's what makes a container runtime compatible with kubernetes

sandbox image - pause container’s image
cgroup

control group

cgroup driver

think of it as a cgroup manager

How does docker interact with containerd?

Directly, via the docker daemon

Command
sed - Stream EDitor

https://www.ibm.com/docs/en/aix/7.2?topic=s-sed-command

sed [option] [script] [file location]

option:

-i: edit the file in place (write changes back to the file)

script:

A script has several parts: pattern (address), command, parameters, and flags

# delete lines 1-3
sed -i '1,3d' test.txt
# prepend # to every line containing the pattern "swap"
sed -i '/swap/s/^/# /' /etc/fstab
sudo sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config

# `^` matches the start of a line
# `$` matches the end of a line
The order inside a sed script is: pattern, command, flags

sed -i '/pattern/s/foo/bar/g' filename
# here, s is the command and g is a flag
# s means "substitute"
# g means "global"
# flags modify the command's behavior
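A non-destructive way to watch the pattern + command + flag interplay, piping instead of editing a file:

```shell
# the /foo/ address selects lines; s///g then replaces every match on them
printf 'foo foo\nbaz\n' | sed '/foo/s/foo/bar/g'
# prints:
#   bar bar
#   baz
```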

The pattern selects which lines the command applies to.

Choosing a delimiter
# the first character after the command (s) is the delimiter
echo 'hello, world' | sed 's?wor?WOR?'
# with an address pattern, say ld, the custom-delimiter form looks like this:
echo 'hello, world' | sed '\?ld?s?wor?WOR?'

# without the g flag only the first match on each line is replaced:
echo 'hello, world, world' | sed 's?wor?WOR?'
echo 'hello, world, world' | sed 's?wor?WOR?g'
