Scaling out binary-deployed etcd nodes in a kubeadm-based Kubernetes cluster
The approach for adding a new node:
- Use the existing etcd cluster's root CA to issue certificates for the new node
- Copy the etcd binaries and the systemd service file from an existing node to the new server
- Edit the startup options in the service file, replacing the certificate paths and IP addresses
- Back up the etcd cluster data as a precaution
- Run member add against the existing cluster to register the new member
- Start the new node and wait for it to join the cluster and sync data
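Before touching anything, it is worth confirming that the existing cluster is healthy. A minimal pre-flight check, using the endpoints and the kubeadm-generated healthcheck client certificate that appear later in this article:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.18.100.27:2379,https://172.18.100.25:2379,https://172.18.100.28:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint health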
Inspect the etcd startup configuration
# systemctl status etcd
etcd.service - Etcd Server
Loaded: loaded (/etc/systemd/system/etcd.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2023-04-25 00:59:57 CST; 8h ago
Docs: https://github.com/coreos
Main PID: 128800 (etcd)
Tasks: 28
Memory: 3.3G
CGroup: /system.slice/etcd.service
└─128800 /usr/local/bin/etcd --name=etcd-k8s-master-1 --cert-file=/etc/kubernetes/pki/etcd/server.crt --key-file=/etc/kubernetes/pki/etcd/server.key --peer-c...
The full etcd startup command and options are in the unit file /etc/systemd/system/etcd.service:
[root@k8s-master-1 etcd]# cat /etc/systemd/system/etcd.service
[Unit]
Description=Etcd Server
After=network.target
After=network-online.target
Wants=network-online.target
Documentation=https://github.com/coreos
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
ExecStart=/usr/local/bin/etcd \
--name=etcd-k8s-master-1 \
--cert-file=/etc/kubernetes/pki/etcd/server.crt \
--key-file=/etc/kubernetes/pki/etcd/server.key \
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt \
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--initial-advertise-peer-urls=https://172.18.100.27:2380 \
--listen-peer-urls=https://172.18.100.27:2380 \
--listen-client-urls=https://172.18.100.27:2379,http://127.0.0.1:2379 \
--advertise-client-urls=https://172.18.100.27:2379 \
--initial-cluster-token=etcd-cluster-token \
--initial-cluster=etcd-k8s-master-1=https://172.18.100.27:2380,etcd-k8s-master-2=https://172.18.100.25:2380,etcd-k8s-master-3=https://172.18.100.28:2380 \
--initial-cluster-state=new \
--data-dir=/var/lib/etcd \
--snapshot-count=50000 \
--auto-compaction-retention=1 \
--max-request-bytes=10485760 \
--quota-backend-bytes=8589934592
Restart=always
RestartSec=15
LimitNOFILE=65536
OOMScoreAdjust=-999
[Install]
WantedBy=multi-user.target
This shows the full etcd configuration. When adding a new etcd node, the certificates are the main thing to pay attention to; the configuration involves three of them: the root CA certificate, the server certificate used for client communication, and the peer certificate used for communication between etcd members.
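If it is unclear which certificate serves which role, the subject and SANs of each file can be inspected with openssl (a quick check using the paths from the unit file above):

openssl x509 -in /etc/kubernetes/pki/etcd/server.crt -noout -text | grep -A1 "Subject Alternative Name"
openssl x509 -in /etc/kubernetes/pki/etcd/peer.crt -noout -text | grep -A1 "Subject Alternative Name"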
Issue certificates for the new node
Here cfssl is used to issue the new node's server and peer certificates from the existing root CA. Issuing the certificates takes the following steps:
- Define the signing configuration file, ca-config.json, which mainly sets the validity period and usages of each certificate profile:

{
  "signing": {
    "default": {
      "expiry": "87600h"
    },
    "profiles": {
      "server": {
        "expiry": "87600h",
        "usages": [ "signing", "key encipherment", "server auth", "client auth" ]
      },
      "client": {
        "expiry": "87600h",
        "usages": [ "signing", "key encipherment", "client auth" ]
      },
      "peer": {
        "expiry": "87600h",
        "usages": [ "signing", "key encipherment", "server auth", "client auth" ]
      }
    }
  }
}
- Define the server CSR, server-csr.json. This certificate is mainly used for communication between the etcd servers; note that the hosts list includes the IPs of the new nodes:

{
  "CN": "etcd",
  "hosts": [
    "172.18.100.27",
    "172.18.100.25",
    "172.18.100.28",
    "172.18.110.201",
    "172.18.110.202"
  ],
  "key": {
    "algo": "rsa",
    "size": 2048
  },
  "names": [
    {
      "C": "CN",
      "L": "Beijing",
      "ST": "Beijing"
    }
  ]
}
- Define the client CSR, client-csr.json:

{
  "CN": "client",
  "key": {
    "algo": "rsa",
    "size": 2048
  }
}
- Run the following commands to generate the certificates and place them under /opt/etcd/ssl (ca.pem and ca-key.pem are the existing etcd cluster's CA certificate and key; copy them to the new node and run the commands there):

cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=server server-csr.json | cfssljson -bare server
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=ca-config.json -profile=peer server-csr.json | cfssljson -bare peer
cfssl gencert -ca=ca.pem -ca-key=ca-key.pem -config=conf/ca-config.json -profile=client conf/client-csr.json | cfssljson -bare client
Modify the new node's service file
Based on the original configuration, only the IP addresses and certificate paths need to change:
[Unit]
Description=Etcd Server
After=network.target
[Service]
Type=notify
WorkingDirectory=/var/lib/etcd
ExecStart=/usr/local/bin/etcd \
--name=etcd-k8s-master-4 \
--cert-file=/opt/etcd/ssl/server.pem \
--key-file=/opt/etcd/ssl/server-key.pem \
--peer-cert-file=/opt/etcd/ssl/peer.pem \
--peer-key-file=/opt/etcd/ssl/peer-key.pem \
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt \
--initial-advertise-peer-urls=https://172.18.110.201:2380 \
--listen-peer-urls=https://172.18.110.201:2380 \
--listen-client-urls=https://172.18.110.201:2379,https://127.0.0.1:2379 \
--advertise-client-urls=https://172.18.110.201:2379 \
--initial-cluster-token=etcd-cluster-token \
--initial-cluster=etcd-k8s-master-1=https://172.18.100.27:2380,etcd-k8s-master-2=https://172.18.100.25:2380,etcd-k8s-master-3=https://172.18.100.28:2380,etcd-k8s-master-4=https://172.18.110.201:2380 \
--initial-cluster-state=existing \
--data-dir=/var/lib/etcd \
--snapshot-count=50000 \
--auto-compaction-retention=1 \
--max-request-bytes=10485760
Restart=always
RestartSec=10s
LimitNOFILE=40000
[Install]
WantedBy=multi-user.target
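Before the service can start, the etcd binaries and the unit file above still have to be put in place on the new machine (step two of the outline). A minimal sketch, assuming root SSH access to the new node 172.18.110.201 and that etcdctl sits next to the etcd binary:

# on an existing node: copy the binaries to the new server
scp /usr/local/bin/etcd /usr/local/bin/etcdctl root@172.18.110.201:/usr/local/bin/
# on the new node: create the data and certificate directories,
# install the unit file shown above as /etc/systemd/system/etcd.service, then reload systemd
mkdir -p /var/lib/etcd /opt/etcd/ssl
systemctl daemon-reload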
Back up the etcd cluster
Backing up the etcd cluster data is important and can be done with the etcdctl snapshot save command. The basic form is:

etcdctl snapshot save <db-file-name>.db

This saves a snapshot of the current etcd data to the named .db file. A backup of the etcd cluster can be taken as follows:
1. Stop write requests to the etcd cluster. This keeps the snapshot consistent and free of dirty data from in-flight writes; the apiserver's use of etcd as its storage backend can be paused temporarily.
2. Take a snapshot on every etcd node at the same time. Each node produces a local snapshot from which the cluster state can later be restored.
3. Restore the apiserver's access to etcd, re-enabling it as the storage backend.

For this 3-node etcd cluster, the backup command is identical on etcd node 1, node 2 and node 3:

etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key snapshot save snapshot.db
This leaves a database snapshot file named snapshot.db on each etcd node.
Then re-enable the apiserver's access to etcd to restore normal service.
These snapshot files can be used to restore the cluster state if the etcd data is ever corrupted. In short, backing up the etcd cluster means: temporarily stop apiserver writes to etcd, run etcdctl snapshot save on every etcd node, then restore the apiserver's etcd storage access. The resulting snapshot.db files allow the etcd cluster state to be recovered quickly after data corruption, and taking backups regularly keeps the critical data in etcd safe, which matters for stable cluster operation.
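A saved snapshot can be sanity-checked before relying on it; etcdctl snapshot status prints its hash, revision, total key count and size:

ETCDCTL_API=3 etcdctl snapshot status snapshot.db --write-out=table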
Add the new etcd member
ETCDCTL_API=3 etcdctl --endpoints=https://172.18.100.27:2379,https://172.18.100.25:2379,https://172.18.100.28:2379,https://172.18.110.201:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key member add etcd-k8s-master-4 --peer-urls=https://172.18.110.201:2380
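Until the new node's etcd process is started, the member list will show etcd-k8s-master-4 as unstarted; this can be verified with the same client certificate, leaving the not-yet-running node out of --endpoints:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.18.100.27:2379,https://172.18.100.25:2379,https://172.18.100.28:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  member list -w table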
Start the service on the new node
On the new node, start etcd:
systemctl start etcd
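Once the new member has started and caught up, all four endpoints should report as members of the same healthy cluster; a quick check from any master:

ETCDCTL_API=3 etcdctl \
  --endpoints=https://172.18.100.27:2379,https://172.18.100.25:2379,https://172.18.100.28:2379,https://172.18.110.201:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt \
  --key=/etc/kubernetes/pki/etcd/healthcheck-client.key \
  endpoint status -w table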
Add the new etcd node to the apiserver configuration
On each master node, find kube-apiserver.yaml under /etc/kubernetes/manifests and add the new etcd node to the --etcd-servers list:
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubeadm.kubernetes.io/kube-apiserver.advertise-address.endpoint: 172.18.100.27:6443
  creationTimestamp: null
  labels:
    component: kube-apiserver
    tier: control-plane
  name: kube-apiserver
  namespace: kube-system
spec:
  containers:
  - command:
    - kube-apiserver
    - --advertise-address=172.18.100.27
    - --allow-privileged=true
    - --apiserver-count=3
    - --authorization-mode=Node,RBAC
    - --client-ca-file=/etc/kubernetes/pki/ca.crt
    - --enable-admission-plugins=NodeRestriction,PodNodeSelector
    - --enable-aggregator-routing=true
    - --enable-bootstrap-token-auth=true
    - --encryption-provider-config=/etc/kubernetes/pki/secrets-encryption.yaml
    - --etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
    - --etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
    - --etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
    - --etcd-servers=https://172.18.100.27:2379,https://172.18.100.25:2379,https://172.18.100.28:2379,https://172.18.110.201:2379
    - --feature-gates=RemoveSelfLink=false
    - --insecure-port=0
    - --kubelet-certificate-authority=/etc/kubernetes/pki/ca.crt
    - --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
    - --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
    - --kubelet-https=true
    - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
    - --profiling=false
    - --proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
    - --proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
    - --requestheader-allowed-names=front-proxy-client
    - --requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
    - --requestheader-extra-headers-prefix=X-Remote-Extra-
    - --requestheader-group-headers=X-Remote-Group
    - --requestheader-username-headers=X-Remote-User
    - --secure-port=6443
    - --service-account-issuer=https://kubernetes.default.svc.cluster.local
    - --service-account-key-file=/etc/kubernetes/pki/sa.pub
    - --service-account-signing-key-file=/etc/kubernetes/pki/sa.key
    - --service-cluster-ip-range=192.168.0.0/16
    - --service-node-port-range=30000-32767
    - --tls-cert-file=/etc/kubernetes/pki/apiserver.crt
    - --tls-private-key-file=/etc/kubernetes/pki/apiserver.key
    image: k8s.gcr.io/kube-apiserver:v1.20.10
    imagePullPolicy: IfNotPresent
    livenessProbe:
      failureThreshold: 8
      httpGet:
        host: 172.18.100.27
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    name: kube-apiserver
    readinessProbe:
      failureThreshold: 3
      httpGet:
        host: 172.18.100.27
        path: /readyz
        port: 6443
        scheme: HTTPS
      periodSeconds: 1
      timeoutSeconds: 15
    resources:
      requests:
        cpu: 250m
    startupProbe:
      failureThreshold: 24
      httpGet:
        host: 172.18.100.27
        path: /livez
        port: 6443
        scheme: HTTPS
      initialDelaySeconds: 10
      periodSeconds: 10
      timeoutSeconds: 15
    volumeMounts:
    - mountPath: /etc/ssl/certs
      name: ca-certs
      readOnly: true
    - mountPath: /etc/pki
      name: etc-pki
      readOnly: true
    - mountPath: /etc/kubernetes/pki
      name: k8s-certs
      readOnly: true
    - mountPath: /etc/localtime
      name: localtime
      readOnly: true
  hostNetwork: true
  priorityClassName: system-node-critical
  volumes:
  - hostPath:
      path: /etc/ssl/certs
      type: DirectoryOrCreate
    name: ca-certs
  - hostPath:
      path: /etc/pki
      type: DirectoryOrCreate
    name: etc-pki
  - hostPath:
      path: /etc/kubernetes/pki
      type: DirectoryOrCreate
    name: k8s-certs
  - hostPath:
      path: /etc/localtime
      type: File
    name: localtime
status: {}
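kube-apiserver runs as a static pod, so the kubelet restarts it automatically once the manifest is saved. One way to confirm the new endpoint took effect (a rough check; adjust to your environment):

kubectl -n kube-system get pods -l component=kube-apiserver -o wide
ps -ef | grep kube-apiserver | grep -o 'etcd-servers=[^ ]*'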
Add etcd node monitoring to the KubeSphere monitoring panel
$ kubectl edit cc ks-installer -n kubesphere-system
...
  etcd:
    endpointIps: 172.18.100.27,172.18.100.28,172.18.100.25,172.18.110.201
    monitoring: true
    port: 2379
    tlsEnable: true
  events:
    enabled: true
  gatekeeper:
    enabled: true
...
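After the ClusterConfiguration is saved, ks-installer reconciles the change on its own; its progress can be followed from the installer pod's logs (the label selector below follows the KubeSphere documentation and may need adjusting to your deployment):

kubectl logs -n kubesphere-system \
  $(kubectl get pod -n kubesphere-system -l 'app in (ks-install, ks-installer)' -o jsonpath='{.items[0].metadata.name}') -f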
All done!