
Background

RKE greatly simplifies Kubernetes cluster deployment, but it brings a problem of its own: the moment anything goes wrong, you are left completely lost. RKE's documentation is still sparse, and the logs it prints during installation are pitifully thin, so you hardly even know what to Google.

About RKE (strongly recommended for deploying Kubernetes): Rancher documentation

All scripts in this article have been pushed to GitHub: CVPaul/rke-k8s-deploy (the scripts contain operation hints; you can cat them before running).

Environment Preparation

The deployment environment matters enormously (yes, enormously). With the environment properly prepared, everything just works: deploying a cluster in five minutes is no dream. Without it, you won't even know where to start debugging.

  • Step 0: Read the official docs carefully: Rancher documentation (very important)
  • Step 0.5: Make sure Docker is installed correctly: CZMan95: [Environment Setup] A Concise Docker Installation Tutorial
  • Step 1: Set up passwordless SSH login: ssh-copy-id username@ip.address.of.nodes
  • Step 2: Set a hostname for each node: hostnamectl set-hostname your.node.name
  • Step 3: Check the required kernel modules
#!/bin/zsh
##########################################################################
# File Name: module-check+install.sh
# Created Time: Wed 16 Sep 2020 02:20:32 PM CST
#########################################################################
for module in br_netfilter ip6_udp_tunnel ip_set ip_set_hash_ip ip_set_hash_net iptable_filter iptable_nat iptable_mangle iptable_raw nf_conntrack_netlink nf_conntrack nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat nf_nat_ipv4 nf_nat_masquerade_ipv4 nfnetlink udp_tunnel veth vxlan x_tables xt_addrtype xt_conntrack xt_comment xt_mark xt_multiport xt_nat xt_recent xt_set xt_statistic xt_tcpudp; do
    if ! lsmod | grep -q "$module"; then
        echo "module $module is not present, try to install..."
        if modprobe "$module"; then
            echo -e "\033[32;1mSuccessfully installed $module!\033[0m"
        else
            echo -e "\033[31;1mInstall $module failed!!!\033[0m"
        fi
    fi
done

Then, on Ubuntu 20.04, the script produced the following result:

(screenshot: several modprobe calls fail, including nf_conntrack_ipv4)

Maddening: it turns out nf_conntrack_ipv4 was renamed in kernels after 4.19.

But no matter: the installation turned out to succeed even without those modules. (I once considered falling back to Ubuntu 18.04; thanks to @Gemfield's article, I realized that if kubeadm can install on 20.04, RKE should manage too, so I pressed on.)
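Steps 1 and 2 above have to be repeated for every node. A dry-run sketch that only prints the commands to execute (the node list below is an example; substitute your own hosts):

```shell
# Example node list (dry run: commands are printed, not executed)
NODES="arthur@192.168.1.110 arthur@192.168.1.197"
i=0
for node in $NODES; do
  i=$((i + 1))
  # ssh-copy-id installs your public key; hostnamectl names the node
  echo "ssh-copy-id $node"
  echo "ssh $node sudo hostnamectl set-hostname node$i"
done > setup-commands.txt
cat setup-commands.txt
```

Once the printed commands look right, drop the echo wrappers and let the loop run them for real.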

  • Step 4: Turn off swap
swapoff -a # temporary: close all swap devices
# Edit /etc/fstab and comment out the swap line to make the change permanent
# sudo vim /etc/fstab
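The /etc/fstab edit can also be scripted with sed. A sketch that previews the change on a made-up sample file before touching the real thing:

```shell
# Sample content standing in for the real /etc/fstab
printf '%s\n' \
  'UUID=abcd / ext4 defaults 0 1' \
  '/swapfile none swap sw 0 0' > fstab.sample
# Comment out every active line whose filesystem type is swap
sed -i 's|^\([^#].*[[:space:]]swap[[:space:]].*\)$|#\1|' fstab.sample
cat fstab.sample
# For the real file, back it up in one step with a suffix:
# sudo sed -i.bak 's|^\([^#].*[[:space:]]swap[[:space:]].*\)$|#\1|' /etc/fstab
```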
  • Step 5: Open the required ports (Ubuntu ships with no active firewall rules by default, so usually nothing special is needed)
#!/bin/zsh
##########################################################################
# File Name: firewall-port-manager.sh
# Author: xianqiu_li
# mail: xianqiu_li@163.com
# Created Time: Thu 17 Sep 2020 10:41:54 AM CST
#########################################################################

# Open TCP/6443 for all
# iptables -A INPUT -p tcp --dport 6443 -j ACCEPT

# Open TCP/$port for all
# firewall-cmd --zone=public --add-port=$port/tcp --permanent
# firewall-cmd --reload

# Open TCP/$port for one specific IP
# Note: this needs sudo and cannot be run over a remote shell,
# so run it on the target machine itself.
if [ $# -lt 2 ]; then
        echo "Usage: $0 <host> <port>"
        exit 1
fi
host=$1
port=$2
# e.g. ssh arthur@192.168.1.110 iptables -A INPUT -p tcp -s 192.168.1.197 --dport 2379 -j ACCEPT
iptables -A INPUT -p tcp -s "$host" --dport "$port" -j ACCEPT

## firewalld equivalent (use double quotes so $host/$port expand)
#firewall-cmd --permanent --zone=public --add-rich-rule="
#  rule family=\"ipv4\"
#  source address=\"$host/32\"
#  port protocol=\"tcp\" port=\"$port\" accept"
#firewall-cmd --reload
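After opening ports, a quick reachability probe helps confirm the rules actually took effect. A sketch using bash's /dev/tcp redirection (no netcat needed); the host and ports below are examples, and it should be run from the machine that needs access:

```shell
# Probe a TCP port: the /dev/tcp redirect succeeds only when the
# connection is accepted (bash feature; reports "closed" otherwise)
check_port() {
  if (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null; then
    echo "$1:$2 open"
  else
    echo "$1:$2 closed"
  fi
}
# Some of the ports RKE relies on (kube-apiserver, etcd, kubelet)
for p in 6443 2379 2380 10250; do
  check_port 127.0.0.1 "$p"
done > port-check.txt
cat port-check.txt
```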
  • Step 6: Bridge settings
#!/bin/zsh
##########################################################################
# File Name: net.bridge.fix.sh
# Author: xianqiu_li
# mail: xianqiu_li@163.com
# Created Time: Fri 18 Sep 2020 03:04:10 PM CST
#########################################################################
echo "fix net.bridge.bridge-nf-call-iptables=1 with the following lines"
echo 'cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
EOF
sudo sysctl --system'
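Once `sysctl --system` has been run, the bridge settings can be verified without root by reading /proc directly. If br_netfilter is not loaded the files simply do not exist, which this sketch reports:

```shell
# Check the two bridge sysctls configured above
for key in bridge-nf-call-iptables bridge-nf-call-ip6tables; do
  f="/proc/sys/net/bridge/$key"
  if [ -r "$f" ]; then
    echo "$key = $(cat "$f")"
  else
    echo "$key: not available (load br_netfilter first)"
  fi
done > bridge-check.txt
cat bridge-check.txt
```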
  • Step 7: Ubuntu 20.04 suspends automatically (sleep/hibernate); disable it
#!/bin/zsh
##########################################################################
# File Name: suspend-mask.sh
# Author: xianqiu_li
# mail: xianqiu_li@163.com
# Created Time: Fri 18 Sep 2020 02:52:48 PM CST
#########################################################################
sudo systemctl mask sleep.target suspend.target hibernate.target hybrid-sleep.target
  • Step 8: Clean the environment (very important)
    • Because of leftovers from earlier installs and from repeated attempts during installation, an unclean environment produces all sorts of baffling problems (such as the etcd certificate errors below). Two cleanup scripts follow.

    • The text version, for easier searching:
WARN[0296] [etcd] host [192.168.1.110] failed to check etcd health: failed to get /health for host [192.168.1.110]: Get https://192.168.1.110:2379/health: net/http: TLS handshake timeout
WARN[0343] [etcd] host [192.168.1.197] failed to check etcd health: failed to get /health for host [192.168.1.197]: Get https://192.168.1.197:2379/health: net/http: TLS handshake timeout
FATA[0343] [etcd] Failed to bring up Etcd Plane: etcd cluster is unhealthy: hosts [192.168.1.110,192.168.1.197] failed to report healthy. Check etcd container logs on each host for more information


2020-09-21 04:29:26.085053 I | embed: rejected connection from "192.168.1.197:56164" (error "remote error: tls: bad certificate", ServerName "")
2020-09-21 04:29:26.085560 I | embed: rejected connection from "192.168.1.197:56166" (error "remote error: tls: bad certificate", ServerName "")
2020-09-21 04:29:26.185396 I | embed: rejected connection from "192.168.1.197:56168" (error "remote error: tls: bad certificate", ServerName "")
2020-09-21 04:29:26.186002 I | embed: rejected connection from "192.168.1.197:56170" (error "remote error: tls: bad certificate", ServerName "")
2020-09-21 04:29:26.285123 I | embed: rejected connection from "192.168.1.197:56174" (error "remote error: tls: bad certificate", ServerName "")
    • Clean up an RKE install (run on every node)
#!/bin/zsh
##########################################################################
# File Name: clear-node.sh
# Author: xianqiu_li
# mail: xianqiu_li@163.com
# Created Time: Mon 21 Sep 2020 12:49:41 PM CST
#########################################################################
systemctl disable kubelet.service
systemctl disable kube-scheduler.service
systemctl disable kube-proxy.service
systemctl disable kube-controller-manager.service
systemctl disable kube-apiserver.service

systemctl stop kubelet.service
systemctl stop kube-scheduler.service
systemctl stop kube-proxy.service
systemctl stop kube-controller-manager.service
systemctl stop kube-apiserver.service

# Remove all containers
docker rm -f $(docker ps -qa)

# Remove all container volumes
docker volume rm $(docker volume ls -q)

# Unmount kubelet/rancher mount points
for mount in $(mount | grep tmpfs | grep '/var/lib/kubelet' | awk '{ print $3 }') /var/lib/kubelet /var/lib/rancher; do umount $mount; done

# Back up directories
mv /etc/kubernetes /etc/kubernetes-bak-$(date +"%Y%m%d%H%M")
mv /var/lib/etcd /var/lib/etcd-bak-$(date +"%Y%m%d%H%M")
mv /var/lib/rancher /var/lib/rancher-bak-$(date +"%Y%m%d%H%M")
mv /opt/rke /opt/rke-bak-$(date +"%Y%m%d%H%M")

# Remove leftover paths (note the line continuations)
rm -rf /etc/ceph \
     /etc/cni \
     /opt/cni \
     /run/secrets/kubernetes.io \
     /run/calico \
     /run/flannel \
     /var/lib/calico \
     /var/lib/cni \
     /var/lib/kubelet \
     /var/log/containers \
     /var/log/pods \
     /var/run/calico

# Delete leftover network interfaces (keep loopback, docker0, and the
# physical eth*/ens* interfaces)
network_interface=`ls /sys/class/net`
for net_inter in $network_interface;
do
  if ! echo $net_inter | grep -qiE '^(lo|docker0|eth|ens)'; then
    ip link delete $net_inter
  fi
done

# Kill leftover processes listening on Kubernetes ports
port_list='80 443 6443 2376 2379 2380 8472 9099 10250 10254'
for port in $port_list
do
  pid=`netstat -atlnup | grep $port | awk '{print $7}' | awk -F '/' '{print $1}' | grep -v - | sort -rnk2 | uniq`
  if [[ -n $pid ]]; then
    kill -9 $pid
  fi
done

pro_pid=`ps -ef | grep -v grep | grep kube | awk '{print $2}'`
if [[ -n $pro_pid ]]; then
  kill -9 $pro_pid
fi

# Flush iptables
## Caution: if the node carries custom iptables rules, run these with care
sudo iptables --flush
sudo iptables --flush --table nat
sudo iptables --flush --table filter
sudo iptables --table nat --delete-chain
sudo iptables --table filter --delete-chain

systemctl restart docker
    • Clean up a kubeadm install (run on every node)
#!/bin/zsh
##########################################################################
# File Name: uninstall-cluster.sh
# Author: xianqiu_li
# mail: xianqiu_li@163.com
# Created Time: Fri 18 Sep 2020 05:34:43 PM CST
#########################################################################
kubeadm reset
sudo apt-get purge kubeadm kubectl kubelet kubernetes-cni kube*
sudo apt-get autoremove
sudo rm -rf ~/.kube
  • Step 9: Generate the cluster configuration file
    • Download the rke binary: Releases · rancher/rke
      • Downloads can be very slow from inside China for various reasons; a mirror site is recommended: Download (it really flies)
    • Run ./rke config and follow the prompts
      • Pay attention to internal_address: on some cloud servers the public IP and the internal address differ, so it must be set; for machines on your own LAN, setting both to the same value is fine
      • On networking, see this CNI comparison of Flannel, Calico, Canal, etc., and choose according to your needs
      • Everything else can keep the defaults
    • This step generates cluster.yml in the current directory
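For reference, a minimal sketch of the kind of cluster.yml this step produces (the addresses and user are the examples from this article; the plugin choice is an assumption, pick the CNI you selected above; `./rke config` generates the real file interactively):

```shell
# Write an example cluster.yml for inspection only
cat > cluster.yml.example <<'EOF'
nodes:
  - address: 192.168.1.110           # public / reachable IP
    internal_address: 192.168.1.110  # differs on many cloud hosts
    user: arthur
    role: [controlplane, etcd, worker]
  - address: 192.168.1.197
    internal_address: 192.168.1.197
    user: arthur
    role: [worker]
network:
  plugin: canal   # assumption: flannel/calico/canal per your choice
EOF
cat cluster.yml.example
```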
  • Step 10: Bring up the cluster
./rke up

(screenshot: output of a successful rke up run)
  • Step 11: Install and use kubectl
    • Create the ~/.kube directory
    • Copy the kube_config_cluster.yml generated by rke into ~/.kube and rename it to config
#  kubectl --kubeconfig kube_config_cluster.yml get nodes # one-off use without copying
mkdir ~/.kube && mv kube_config_cluster.yml ~/.kube/config # after these two steps, plain `kubectl get nodes` works
    • Any other machine that needs kubectl must repeat the same steps
    • The install script follows (kubeadm and kubelet are not used here, so they are commented out)
#!/bin/zsh
##########################################################################
# File Name: install-kubectl.sh
# Author: xianqiu_li
# mail: xianqiu_li@163.com
# Created Time: Tue 22 Sep 2020 11:10:43 AM CST
#########################################################################
echo "Ubuntu's default install channel is snap, but that did not work here, so apt is recommended"
apt-get update
apt-get install -y ca-certificates curl software-properties-common apt-transport-https
curl -s https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | apt-key add -

tee /etc/apt/sources.list.d/kubernetes.list <<EOF
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF

apt-get update
apt-get install -y kubectl # kubelet kubeadm
apt-mark hold kubectl # kubelet kubeadm


Miscellaneous

  • Keep cluster.yml and kube_config_cluster.yml safe; all later cluster maintenance depends on them
    • kube_config_cluster.yml holds the credentials for kubectl and helm.
    • rancher-cluster.yml: the RKE cluster configuration file.
    • kube_config_rancher-cluster.yml: the cluster's kubeconfig file; it contains credentials with full access to the cluster.
    • cluster.rkestate: the Kubernetes cluster state file; it also contains credentials with full access to the cluster.
  • All of the scripts above are collected on GitHub: CVPaul/rke-k8s-deploy
    • The scripts contain operation hints; cat them before running.