Building a Kubernetes (k8s) High-Availability Cluster on Ubuntu 18.04
I. Cluster Planning
Host | IP | Role |
---|---|---|
ceph01 | 192.168.1.12 | master, etcd1 |
k8s-master2 | 192.168.1.125 | master, etcd4 |
k8s-m2 | 192.168.1.14 | master, etcd2 |
ceph02 | 192.168.1.18 | node, etcd6 |
ceph03 | 192.168.1.27 | node, etcd7 |
k8s-m1 | 192.168.1.16 | node, etcd5 |
k8s-m3 | 192.168.1.19 | node, etcd3 |
vip | 192.168.1.200 | virtual IP (VIP) |
This installation uses three master nodes and four worker nodes, with an external seven-node etcd cluster. keepalived is installed on the master nodes to provide high availability for the control plane.
Since the same operations must be repeated on many servers, ansible is recommended: define host groups per role and run each command once per group, which is much more efficient. A minimal inventory sketch follows.
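A minimal ansible inventory sketch for this cluster; the group names and the ping/shell checks are illustrative, and the IPs come from the planning table above.
# /etc/ansible/hosts -- illustrative grouping by role
cat <<EOF | sudo tee /etc/ansible/hosts
[masters]
192.168.1.12
192.168.1.14
192.168.1.125

[nodes]
192.168.1.16
192.168.1.18
192.168.1.19
192.168.1.27

[etcd:children]
masters
nodes
EOF
# Verify connectivity to every host
ansible all -m ping
# Run the same command on every master, for example
ansible masters -m shell -a "swapoff -a" --become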
II. Environment Preparation
1. Install Docker (all nodes in the cluster)
1) One-step Docker installation on Ubuntu
#1. Install docker
curl -sSL https://get.daocloud.io/docker | sh
2) Install docker-compose on Ubuntu
#1. Download docker-compose
$ sudo curl -L https://get.daocloud.io/docker/compose/releases/download/1.22.0/docker-compose-`uname -s`-`uname -m` -o /usr/local/bin/docker-compose
#2. Make it executable
$ sudo chmod +x /usr/local/bin/docker-compose
#3. Check the version
$ docker-compose --version
2. System configuration
1) Common configuration (all nodes in the cluster)
#1. Disable swap temporarily (lost after a reboot)
swapoff -a
# Disable swap permanently (takes effect after a reboot)
sudo sed -i 's#\/swap.img#\#\/swap.img#g' /etc/fstab
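# Optional check: confirm swap is fully off
swapon --show   # should print nothing
free -h         # the Swap line should show 0B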
#2. Disable the firewall
sudo ufw disable
#3. Load the br_netfilter module. Note: the cat <<EOF heredocs below can be pasted directly into the Ubuntu shell; everything up to the closing EOF is written to the named file
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
#4. Configure kernel network parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.ip_forward = 1
vm.swappiness=0
EOF
#5. Reload the sysctl settings
sudo sysctl --system
#6. Set Docker's cgroup driver to systemd (and configure the registry mirror and log rotation)
sudo mkdir /etc/docker
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "registry-mirrors": ["https://y0qd3iq.mirror.aliyuncs.com"],
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "100m",
    "max-file": "3"
  }
}
EOF
# Reload the configuration
sudo systemctl daemon-reload
# Enable docker at boot
sudo systemctl enable docker
# Restart docker
sudo systemctl restart docker
#7. Add the Kubernetes apt source
# Add the repository signing key
curl https://mirrors.aliyun.com/kubernetes/apt/doc/apt-key.gpg | sudo apt-key add -
# Add the apt source
cat <<EOF | sudo tee /etc/apt/sources.list.d/kubernetes.list
deb https://mirrors.aliyun.com/kubernetes/apt/ kubernetes-xenial main
EOF
#8. Update apt and install prerequisites
sudo apt-get update
sudo apt-get upgrade -y
sudo apt-get install -y apt-transport-https ca-certificates curl
#9. Add the following entry to /etc/hosts so that raw.githubusercontent.com resolves
185.199.108.133 raw.githubusercontent.com
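For example, the entry can be appended directly from the shell:
echo "185.199.108.133 raw.githubusercontent.com" | sudo tee -a /etc/hosts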
2) Master node configuration (all master nodes)
#1. Pull the Kubernetes images. Before pulling, run kubeadm config images list to get the image list, then fill in the matching version numbers (the versions below can be changed). This installation uses version 1.23.6.
vim kubeadm-install.sh
# The script contents are as follows
#!/bin/bash
images=(
kube-apiserver:v1.23.6
kube-controller-manager:v1.23.6
kube-scheduler:v1.23.6
kube-proxy:v1.23.6
pause:3.6
etcd:3.5.1-0
coredns:v1.8.6
)
for imageName in ${images[@]} ; do
  docker pull registry.cn-hangzhou.aliyuncs.com/google_containers/${imageName}
  if [ $(echo $imageName | awk -F [":"] '{print $1}') != "coredns" ]
  then
    #echo "----------0-----------"$imageName
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/${imageName} k8s.gcr.io/${imageName}
  else
    #echo "-----------1-----------" $imageName
    docker tag registry.cn-hangzhou.aliyuncs.com/google_containers/${imageName} k8s.gcr.io/coredns/${imageName}
  fi
  docker rmi registry.cn-hangzhou.aliyuncs.com/google_containers/${imageName}
done
# Run the script to pull and re-tag the images
bash kubeadm-install.sh
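You can confirm that the images were pulled and re-tagged with:
docker images | grep k8s.gcr.io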
III. keepalived Cluster Setup (all three master nodes)
1. Install dependencies
sudo apt-get install -y libssl-dev openssl libpopt-dev
sudo apt-get install -y keepalived
# To avoid the error "IPVS (cmd 1159, errno 2): No such file or directory", install all libnl and popt dependencies
sudo apt install libnl* popt* -y
2. Edit the configuration file
cat <<EOF | sudo tee /etc/keepalived/keepalived.conf
global_defs {
    # An identifier for the machine running keepalived
    router_id LVS_DEVEL
}
# VRRP instance (can monitor multiple network segments)
vrrp_instance VI_1 {
    state MASTER
    # Non-preemptive mode
    nopreempt
    # Name of the internal network interface
    interface enp3s0
    virtual_router_id 80
    # Priority; the VIP is bound to the node with the highest priority
    priority 140
    advert_int 1
    authentication {
        auth_type PASS
        # The auth password must be identical on all nodes
        auth_pass just0kk
    }
    # Virtual IP (VIP)
    virtual_ipaddress {
        192.168.1.200
    }
}
# External virtual IP address (this IP must not be used by any real server)
virtual_server 192.168.1.200 6443 {
    # Health-check interval for the real servers, in seconds
    delay_loop 6
    # Load-balancing algorithm; rr = round robin
    lb_algo rr
    # LVS forwarding mode; DR = direct routing
    lb_kind DR
    # Check the real servers' state over TCP
    # First node
    real_server 192.168.1.14 6443 {
        # Node weight
        weight 3
        # Health check method
        TCP_CHECK {
            connect_timeout 3    # connection timeout
            nb_get_retry 3       # number of retries
            delay_before_retry 3 # retry interval in seconds
        }
    }
    # Second node
    real_server 192.168.1.125 6443 {
        weight 3
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
    # Third node
    real_server 192.168.1.12 6443 {
        weight 3
        TCP_CHECK {
            connect_timeout 3
            nb_get_retry 3
            delay_before_retry 3
        }
    }
}
EOF
Notes:
- Interface name: the value of interface must be checked on every server (run ifconfig or ip addr) and adjusted to match; see the sketch below
- Priority: the priority value works like a weight in Nginx load balancing; it should differ between the masters
- The VIP must be an unused virtual IP; it must not be an address already assigned to a real server
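A sketch of the per-node differences, assuming enp3s0 is the interface on every master and using illustrative priority values:
# Find the interface that carries this node's 192.168.1.x address
ip addr | grep -B2 "192.168.1."
# Illustrative per-node priorities (interface, virtual_router_id and auth_pass stay the same):
#   ceph01      (192.168.1.12):  priority 140
#   k8s-m2      (192.168.1.14):  priority 130
#   k8s-master2 (192.168.1.125): priority 120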
3. Start the service
# Enable keepalived at boot
systemctl enable keepalived
# Start keepalived; start it on all three masters
systemctl start keepalived
# Check the keepalived status
systemctl status keepalived
# Check whether 192.168.1.200 is bound on master1
ip a
IV. etcd Cluster Setup
1. Install cfssl (one master node is enough)
sudo wget https://pkg.cfssl.org/R1.2/cfssl_linux-amd64
sudo wget https://pkg.cfssl.org/R1.2/cfssljson_linux-amd64
sudo chmod +x cfssl_linux-amd64 cfssljson_linux-amd64
sudo mv cfssl_linux-amd64 /usr/local/bin/cfssl
sudo mv cfssljson_linux-amd64 /usr/local/bin/cfssljson
2. Install the etcd binaries (all etcd nodes)
# Create the directory
sudo mkdir -p /data/etcd/bin
# Download etcd
cd /tmp
sudo wget https://storage.googleapis.com/etcd/v3.3.25/etcd-v3.3.25-linux-amd64.tar.gz
tar zxf etcd-v3.3.25-linux-amd64.tar.gz
cd etcd-v3.3.25-linux-amd64
# Move the etcd binaries into the bin directory
sudo mv etcd etcdctl /data/etcd/bin/
# Create the certificate directory
sudo mkdir -p /data/etcd/ssl
# Create the directory for etcd.conf
sudo mkdir -p /data/etcd/cfg
# Create the etcd data directory
sudo mkdir -p /data/etcd/data
3. Create the etcd cluster certificates (on the master node where cfssl is installed)
1) Enter the certificate directory
cd /data/etcd/ssl
2) Create the CA config JSON
cat <<EOF | sudo tee /data/etcd/ssl/ca-config.json
{
  "signing": {
    "default": {
      "expiry": "438000h"
    },
    "profiles": {
      "server": {
        "expiry": "438000h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      },
      "client": {
        "expiry": "438000h",
        "usages": [
          "signing",
          "key encipherment",
          "client auth"
        ]
      },
      "peer": {
        "expiry": "438000h",
        "usages": [
          "signing",
          "key encipherment",
          "server auth",
          "client auth"
        ]
      }
    }
  }
}
EOF
3) Create the CA certificate signing request JSON
cat <<EOF | sudo tee /data/etcd/ssl/ca-csr.json
{
  "CN": "etcd",
  "key": {
    "algo": "rsa",
    "size": 2048
  }
}
EOF
4) Create the client certificate JSON
cat <<EOF | sudo tee /data/etcd/ssl/client.json
{
  "CN": "client",
  "key": {
    "algo": "ecdsa",
    "size": 256
  }
}
EOF
5) Create the server/peer certificate JSON
cat <<EOF | sudo tee /data/etcd/ssl/etcd.json
{
  "CN": "etcd",
  "hosts": [
    "192.168.1.12",
    "192.168.1.14",
    "192.168.1.16",
    "192.168.1.18",
    "192.168.1.19",
    "192.168.1.27",
    "192.168.1.125",
    "192.168.1.200"
  ],
  "key": {
    "algo": "ecdsa",
    "size": 256
  },
  "names": [
    {
      "C": "CN",
      "L": "BJ",
      "ST": "BJ"
    }
  ]
}
EOF
6) Generate the certificates
# Generate the CA certificate and private key
sudo cfssl gencert -initca ca-csr.json | cfssljson -bare ca
# Generate the client certificate
sudo cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=client client.json | cfssljson -bare client
# Generate the server certificate
sudo cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server etcd.json | cfssljson -bare server
# Generate the peer certificate
sudo cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer etcd.json | cfssljson -bare peer
7) Distribute the certificates
Copy every certificate generated under /data/etcd/ssl to the /data/etcd/ssl directory on all other etcd nodes. The ansible copy module is recommended; see the sketch below.
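A sketch using the ansible copy module, assuming the [etcd] host group from the inventory sketch in section I:
# Push the certificates from this master to every etcd node
ansible etcd -m copy -a "src=/data/etcd/ssl/ dest=/data/etcd/ssl/ mode=0600" --become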
4. Create the etcd configuration file (all etcd nodes)
1) Create etcd.conf
cat <<EOF | sudo tee /data/etcd/cfg/etcd.conf
ETCD_DATA_DIR="/data/etcd/data"
# This node's IP
ETCD_LISTEN_PEER_URLS="https://192.168.1.18:2380"
# This node's IP
ETCD_LISTEN_CLIENT_URLS="https://192.168.1.18:2379,http://127.0.0.1:2379"
# This node's etcd member name; must match its entry in ETCD_INITIAL_CLUSTER
ETCD_NAME=etc6
ETCD_QUOTA_BACKEND_BYTES="8388608000"
ETCD_SNAPSHOT_COUNT="500000"
# This node's IP
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.18:2380"
# This node's IP
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.1.18:2379"
ETCD_INITIAL_CLUSTER="etc1=https://192.168.1.12:2380,etc2=https://192.168.1.14:2380,etc3=https://192.168.1.19:2380,etc4=https://192.168.1.125:2380,etc5=https://192.168.1.16:2380,etc6=https://192.168.1.18:2380,etc7=https://192.168.1.27:2380"
ETCD_INITIAL_CLUSTER_TOKEN="etcd-cluster"
ETCD_INITIAL_CLUSTER_STATE="new"
ETCD_CERT_FILE="/data/etcd/ssl/server.pem"
ETCD_KEY_FILE="/data/etcd/ssl/server-key.pem"
ETCD_CLIENT_CERT_AUTH="True"
ETCD_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"
ETCD_AUTO_TLS="True"
ETCD_PEER_CERT_FILE="/data/etcd/ssl/peer.pem"
ETCD_PEER_KEY_FILE="/data/etcd/ssl/peer-key.pem"
ETCD_PEER_CLIENT_CERT_AUTH="True"
ETCD_PEER_TRUSTED_CA_FILE="/data/etcd/ssl/ca.pem"
ETCD_PEER_AUTO_TLS="True"
ETCD_LOG_OUTPUT="default"
ETCD_AUTO_COMPACTION_RETENTION="1"
ETCD_HEARTBEAT_INTERVAL=6000
ETCD_ELECTION_TIMEOUT=30000
EOF
Notes:
- ETCD_LISTEN_PEER_URLS, ETCD_LISTEN_CLIENT_URLS, ETCD_INITIAL_ADVERTISE_PEER_URLS and ETCD_ADVERTISE_CLIENT_URLS must be changed to each etcd node's own IP; see the example below
- ETCD_NAME must match the name paired with this node's IP in ETCD_INITIAL_CLUSTER
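For example, on 192.168.1.12 (named etc1 in ETCD_INITIAL_CLUSTER) the per-node lines become:
ETCD_NAME=etc1
ETCD_LISTEN_PEER_URLS="https://192.168.1.12:2380"
ETCD_LISTEN_CLIENT_URLS="https://192.168.1.12:2379,http://127.0.0.1:2379"
ETCD_INITIAL_ADVERTISE_PEER_URLS="https://192.168.1.12:2380"
ETCD_ADVERTISE_CLIENT_URLS="https://192.168.1.12:2379"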
2) Create the systemd unit file
cat <<EOF | sudo tee /lib/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
EnvironmentFile=/data/etcd/cfg/etcd.conf
ExecStart=/data/etcd/bin/etcd
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
EOF
5. Start the etcd cluster and verify it
1) Start the etcd cluster
systemctl daemon-reload
systemctl enable etcd
systemctl start etcd
systemctl status etcd
2) Verify the cluster
# Enter the certificate directory
cd /data/etcd/ssl
# Check cluster health
sudo ../bin/etcdctl --ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem --endpoints="https://192.168.1.12:2379" cluster-health
# List the cluster members
sudo ../bin/etcdctl --ca-file=ca.pem --cert-file=server.pem --key-file=server-key.pem --endpoints="https://192.168.1.12:2379" member list
# Show the status of every endpoint
ETCDCTL_API=3 /data/etcd/bin/etcdctl --cacert=/data/etcd/ssl/ca.pem --cert=/data/etcd/ssl/server.pem --key=/data/etcd/ssl/server-key.pem --endpoints="https://192.168.1.12:2379,https://192.168.1.14:2379,https://192.168.1.16:2379,https://192.168.1.18:2379,https://192.168.1.19:2379,https://192.168.1.27:2379,https://192.168.1.125:2379" --write-out=table endpoint status
V. Install kubelet, kubeadm and kubectl
1. Install kubelet, kubeadm and kubectl (all nodes)
# This installation uses version 1.23.6; change the version number to install a different one
sudo apt install kubeadm=1.23.6-00
sudo apt install kubectl=1.23.6-00
sudo apt install kubelet=1.23.6-00
sudo apt-mark hold kubelet kubeadm kubectl
2. kubeadm init (master node)
1) Prepare the certificates
# On one master node, copy the CA and client certificates generated during the etcd setup to the locations below and rename them
sudo mkdir -p /etc/kubernetes/pki/etcd/
# CA certificate of the etcd cluster
sudo cp /data/etcd/ssl/ca.pem /etc/kubernetes/pki/etcd/
# Client certificate of the etcd cluster, used by the apiserver to access etcd
sudo cp /data/etcd/ssl/client.pem /etc/kubernetes/pki/apiserver-etcd-client.pem
# Client private key of the etcd cluster
sudo cp /data/etcd/ssl/client-key.pem /etc/kubernetes/pki/apiserver-etcd-client-key.pem
# Inspect the certificates
sudo tree /etc/kubernetes/pki/
# The output should look like this:
#/etc/kubernetes/pki/
#├── apiserver-etcd-client-key.pem
#├── apiserver-etcd-client.pem
#└── etcd
# └── ca.pem
#1 directory, 3 files
2) Create the kubeadm init configuration file
# You can generate a default configuration with the command below and then edit the settings marked with # comments
#sudo kubeadm config print init-defaults > kubeadm-init.yaml
# Here kubeadm-init.yaml is written out directly
cat <<EOF | sudo tee /etc/kubernetes/kubeadm-init.yaml
apiVersion: kubeadm.k8s.io/v1beta3
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
localAPIEndpoint:
  # This node's IP
  advertiseAddress: 192.168.1.14
  bindPort: 6443
nodeRegistration:
  criSocket: /var/run/dockershim.sock
  imagePullPolicy: IfNotPresent
  name: ceph01
  # taints: null
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
---
apiServer:
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta3
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns: {}
etcd:
  #local:
  #  dataDir: /var/lib/etcd
  # External etcd cluster: list all etcd member endpoints
  external:
    endpoints:
    - https://192.168.1.12:2379
    - https://192.168.1.14:2379
    - https://192.168.1.19:2379
    - https://192.168.1.16:2379
    - https://192.168.1.18:2379
    - https://192.168.1.27:2379
    - https://192.168.1.125:2379
    # CA certificate generated when building the etcd cluster
    caFile: /etc/kubernetes/pki/etcd/ca.pem
    # Client certificate generated when building the etcd cluster
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.pem
    # Client key generated when building the etcd cluster
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client-key.pem
# Image repository override
imageRepository: registry.aliyuncs.com/google_containers
kind: ClusterConfiguration
kubernetesVersion: 1.23.6
# Virtual IP
controlPlaneEndpoint: 192.168.1.200
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
# Add the following section if the default configuration does not contain it
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
EOF
3) Run kubeadm init
sudo kubeadm init --config=kubeadm-init.yaml
Save the kubeadm join command from the output; it is needed when the worker nodes join.
Note: merge the join command onto a single line by replacing the trailing \ line continuations with spaces; otherwise running the copied command may fail.
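If the join command is lost or the token has expired, a new single-line command can be printed on a master with the standard kubeadm command:
kubeadm token create --print-join-command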
4) Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Test that it works:
kubectl get nodes
3. Initialize the other master nodes
1) Copy (scp) the cluster-wide CA certificates generated on master1 to the other master machines
# 1. First make sure /etc/kubernetes/pki/ exists on the target master; create it if it does not
# If you are not root on the target master, you cannot scp directly into /etc/kubernetes/pki/;
# copy into a directory owned by the current user first, then mv the files into /etc/kubernetes/pki/
sudo scp -r /etc/kubernetes/pki/* zdy@192.168.1.14:/home/zdy/fxy
# 2. Copy the init configuration file to the target master in the same way
sudo scp -r /etc/kubernetes/kubeadm-init.yaml root@192.168.1.178:/etc/kubernetes/
# 3. Then edit the config and run the init as described in the next step
2) Run kubeadm init on the other master nodes
# After editing the settings marked with # in kubeadm-init.yaml, run the init
sudo kubeadm init --config=kubeadm-init.yaml
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Check the nodes
kubectl get nodes
4. Join the worker nodes to the cluster
# Run the join command produced by kubeadm init on the master; use the actual command from your own output
kubeadm join 192.168.1.200:6443 --token abcdef.0123456789abcdef --discovery-token-ca-cert-hash sha256:16b2038638b57ae9569e2f45eeecda503e278ee91e0e54973f24dbba6b9c785e
# Check the nodes (run on a master)
kubectl get nodes
5. Install the Calico network plugin
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
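After applying the manifest, you can watch the Calico pods start and check that the nodes become Ready:
kubectl get pods -n kube-system -o wide | grep calico
kubectl get nodes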
VI. Common Problems
1. kubectl get nodes on a master shows some nodes as NotReady
1) Diagnosis
Check the kubelet logs on the affected node:
journalctl -f -u kubelet.service
The logs contain errors like:
May 23 03:06:48 node2 kubelet[1413]: I0523 03:06:48.973756 1413 cni.go:240] "Unable to update cni config" err="no networks found in /etc/cni/net.d
May 23 03:06:50 node2 kubelet[1413]: E0523 03:06:50.252006 1413 kubelet.go:2386] "Container runtime network not ready" networkReady="NetworkReady=
2) Fix
Copy the files under /etc/cni/net.d on a master to /etc/cni/net.d on the affected node (create the directory if it does not exist).
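A sketch of the copy, assuming the affected node is 192.168.1.16 and the zdy user from earlier (both placeholders):
# On the affected node: create the target and a staging directory
sudo mkdir -p /etc/cni/net.d && mkdir -p /tmp/cni-net.d
# On a master: copy the CNI config files over
scp /etc/cni/net.d/* zdy@192.168.1.16:/tmp/cni-net.d/
# On the affected node: move the files into place and restart kubelet
sudo mv /tmp/cni-net.d/* /etc/cni/net.d/
sudo systemctl restart kubelet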
2. kubectl get nodes reports an x509 error
1) Diagnosis
Check the logs on the node:
journalctl -f -u kubelet
Running kubectl shows an error like:
Unable to connect to the server: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")
2) Fix
# Delete $HOME/.kube,
rm -rf $HOME/.kube
# then re-run the kubectl configuration steps
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
3. kubectl get pod -n kube-system shows a Calico pod that never becomes ready:
calico-node-c5jln 0/1 Running 0 2d15h
1) Diagnosis
# Check the pod's logs
kubectl logs -f calico-node-c5jln -n kube-system
# The logs show:
2022-05-30 00:17:51.174 [INFO][97] monitor-addresses/autodetection_methods.go 103: Using autodetected IPv4 address on interface br-d073cb5ec7e0: 172.19.0.1/16
# Check the Docker bridges on each node:
docker network ls
#d073cb5ec7e0 mongodb_default bridge local
2) Fix
# Remove the offending bridge:
docker network remove d073cb5ec7e0
# Reboot the node so the pod restarts:
sudo reboot
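Alternatively, instead of deleting the bridge, Calico can be told which address to pick by setting IP_AUTODETECTION_METHOD on the calico-node DaemonSet; the subnet and interface below are assumptions for this cluster, and the cidr= method requires a reasonably recent Calico release:
kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=cidr=192.168.1.0/24
# or pin it to the host interface, e.g.:
# kubectl -n kube-system set env daemonset/calico-node IP_AUTODETECTION_METHOD=interface=enp3s0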
4. systemctl status keepalived reports: IPVS (cmd 1159, errno 2): No such file or directory
1) Fix
# Install the missing dependencies
apt install libnl* popt* -y
# Restart keepalived
systemctl restart keepalived
5. If a deployment fails and its etcd data needs to be removed, run the following script (it filters keys by the keyword passed to grep, here "cattle"):
#!/bin/bash
keys=$(ETCDCTL_API=3 /data/etcd/bin/etcdctl --cacert=/data/etcd/ssl/ca.pem --cert=/data/etcd/ssl/server.pem --key=/data/etcd/ssl/server-key.pem --endpoints="https://192.168.1.12:2379,https://192.168.1.14:2379,https://192.168.1.16:2379,https://192.168.1.18:2379,https://192.168.1.19:2379,https://192.168.1.27:2379,https://192.168.1.125:2379" get --prefix --keys-only '' | grep cattle | awk '{print $1}')
echo "------------------ delete cattle --------------------"
for i in $keys; do
  ETCDCTL_API=3 /data/etcd/bin/etcdctl --cacert=/data/etcd/ssl/ca.pem --cert=/data/etcd/ssl/server.pem --key=/data/etcd/ssl/server-key.pem --endpoints="https://192.168.1.12:2379,https://192.168.1.14:2379,https://192.168.1.16:2379,https://192.168.1.18:2379,https://192.168.1.19:2379,https://192.168.1.27:2379,https://192.168.1.125:2379" del --prefix --prev-kv $i
  echo $i
done