Official documentation: Operating etcd clusters for Kubernetes | Backing up an etcd cluster

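1. Prerequisites

  • etcd is a consistent and highly available key-value store that can serve as the backing store for all Kubernetes cluster data.

  • Run the etcd cluster with an odd number of members.

  • etcd is a leader-based distributed system. Make sure the leader periodically sends heartbeats to all followers to keep the cluster stable.

  • Make sure no resource starvation occurs.

    Cluster performance and stability are sensitive to network and disk I/O. Any resource starvation can lead to heartbeat timeouts, which make the cluster unstable. An unstable cluster indicates that no leader has been elected. In that state the cluster cannot make any changes to its current state, which means no new pods can be scheduled.

  • All Kubernetes objects are stored in etcd. Backing up the etcd cluster data periodically is important for recovering a Kubernetes cluster in disaster scenarios, such as losing all control plane nodes. The snapshot file contains all Kubernetes state and critical information. To keep sensitive Kubernetes data safe, the snapshot file can be encrypted.

    Backing up an etcd cluster can be done in two ways: etcd built-in snapshot and volume snapshot.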
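
The odd-member advice above follows from majority quorum: a cluster of n members needs floor(n/2)+1 of them to agree, so going from 3 to 4 members raises the quorum without tolerating any additional failure. A minimal sketch of that arithmetic (the function names are illustrative and not part of etcd):

```shell
#!/bin/sh
# Majority quorum for an etcd cluster of N members: floor(N/2) + 1.
quorum() { echo $(( $1 / 2 + 1 )); }

# Number of members that can fail while a quorum still remains.
fault_tolerance() { echo $(( $1 - ($1 / 2 + 1) )); }

for n in 1 2 3 4 5; do
  printf 'members=%d quorum=%d tolerated_failures=%d\n' \
    "$n" "$(quorum "$n")" "$(fault_tolerance "$n")"
done
```

Clusters of 3 and 4 members both tolerate a single failure, which is why the even size buys nothing.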

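2. Built-in snapshot

  • etcd supports built-in snapshots. A snapshot can be taken from a live member with the etcdctl snapshot save command, or by copying the member/snap/db file from an etcd data directory that is not currently used by an etcd process. Taking the snapshot does not affect the performance of the member.

2.1 Install etcd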

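  1. etcd already runs in the cluster as a pod (etcd-k8s1 below), but the host itself has no etcd package, so the etcdctl command is missing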
[root@k8s1 ~]# kubectl get pod -n kube-system 
NAME                           READY   STATUS    RESTARTS   AGE
coredns-bdc44d9f-frm2b         1/1     Running   0          3m8s
coredns-bdc44d9f-xhlbg         1/1     Running   0          3m8s
etcd-k8s1                      1/1     Running   1          3m28s
kube-apiserver-k8s1            1/1     Running   1          3m28s
kube-controller-manager-k8s1   1/1     Running   2          3m26s
kube-flannel-ds-59ndm          1/1     Running   0          2m42s
kube-flannel-ds-jmtfd          1/1     Running   0          98s
kube-flannel-ds-njlz7          1/1     Running   0          83s
kube-proxy-9qb29               1/1     Running   0          3m8s
kube-proxy-cl5w6               1/1     Running   0          83s
kube-proxy-t5mlj               1/1     Running   0          98s
kube-scheduler-k8s1            1/1     Running   2          3m28s
  2. Install the etcd package on the host
[root@k8s1 ~]# yum install -y etcd

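2.2 Take a snapshot of the keyspace served by ENDPOINT to the file snapshotdb

  1. Connect by specifying the endpoint and certificates; start by listing the cluster members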
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list
22f19206c985d9c7, started, k8s1, https://172.25.21.1:2380, https://172.25.21.1:2379
  2. View the member list and the endpoint status in table form
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt member list -w table
+------------------+---------+------+--------------------------+--------------------------+
|        ID        | STATUS  | NAME |        PEER ADDRS        |       CLIENT ADDRS       |
+------------------+---------+------+--------------------------+--------------------------+
| 22f19206c985d9c7 | started | k8s1 | https://172.25.21.1:2380 | https://172.25.21.1:2379 |
+------------------+---------+------+--------------------------+--------------------------+
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt endpoint status -w table
+----------------+------------------+---------+---------+-----------+-----------+------------+
|    ENDPOINT    |        ID        | VERSION | DB SIZE | IS LEADER | RAFT TERM | RAFT INDEX |
+----------------+------------------+---------+---------+-----------+-----------+------------+
| 127.0.0.1:2379 | 22f19206c985d9c7 |   3.5.0 |  2.6 MB |      true |         2 |       1553 |
+----------------+------------------+---------+---------+-----------+-----------+------------+
  3. View all keys and values stored in etcd
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt get / --prefix

  4. Create a pod named demo and find its key in etcd
[root@k8s1 ~]# kubectl run demo --image=nginx
pod/demo created
[root@k8s1 ~]# kubectl get pod
NAME   READY   STATUS    RESTARTS   AGE
demo   1/1     Running   0          59s
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt get /registry/pods --prefix --keys-only

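  • When kubectl creates a pod, the request goes to kube-apiserver, which writes the object to etcd. The scheduler
    picks up the unscheduled pod through the API server and assigns it to a worker node, where the kubelet
    starts the pod through the container runtime.
  5. Back up the data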
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot save /tmp/backup/`hostname`-etcd-`date +%Y%m%d%H%M`.db
Snapshot saved at /tmp/backup/k8s1-etcd-202204232220.db
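
The backup command above can be wrapped in a small helper so that the target directory is created first and the snapshot is checked afterwards. This is only a sketch under the assumptions of this walkthrough: the endpoint, certificate paths, and /tmp/backup are the values used above, while the variable names, the function names, and the RUN_BACKUP switch are made up for illustration.

```shell
#!/bin/sh
set -eu

# Assumed defaults, taken from the commands above; override via the environment.
BACKUP_DIR=${BACKUP_DIR:-/tmp/backup}
PKI_DIR=${PKI_DIR:-/etc/kubernetes/pki/etcd}
ENDPOINT=${ENDPOINT:-127.0.0.1:2379}

# Build the snapshot file name: <dir>/<host>-etcd-<timestamp>.db
snapshot_path() { printf '%s/%s-etcd-%s.db\n' "$BACKUP_DIR" "$1" "$2"; }

backup_etcd() {
  file=$(snapshot_path "$(hostname)" "$(date +%Y%m%d%H%M)")
  mkdir -p "$BACKUP_DIR"   # make sure the target directory exists
  ETCDCTL_API=3 etcdctl --endpoints "$ENDPOINT" \
    --cacert="$PKI_DIR/ca.crt" --cert="$PKI_DIR/peer.crt" --key="$PKI_DIR/peer.key" \
    snapshot save "$file"
  # Show hash, revision, key count and size of the new snapshot.
  ETCDCTL_API=3 etcdctl snapshot status "$file" -w table
}

# Only touch the cluster when explicitly asked to (hypothetical switch).
if [ "${RUN_BACKUP:-0}" = "1" ]; then
  backup_etcd
fi
```

Saved as a script, it would be run as RUN_BACKUP=1 sh backup-etcd.sh on the control plane node; without the switch it only defines the functions.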

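Note!

  • A problem came up during backup and restore.

    After restoring the backed-up etcd data, the etcd service would not start.
    My analysis was that the likely cause was a mismatch between the kubeadm cluster version and the etcd 3.3 package installed from yum.
    So I upgraded the kubeadm cluster from 1.22.2 to 1.23.1.

  • The yum package is etcd 3.3.11 (see below), whereas the etcd running in the cluster reports 3.5.0 in the endpoint status output above. Upgrading the cluster to 1.23.1 is the workaround used here; section 3.1 then takes etcdctl from the etcd 3.5.1-0 image instead of the yum package.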

[root@k8s1 ~]# yum list etcd
Loaded plugins: product-id, search-disabled-repos, subscription-manager
This system is not registered with an entitlement server. You can use subscription-manager to register.
Installed Packages
etcd.x86_64                             3.3.11-2.el7.centos                             @extras
[root@k8s1 ~]# kubectl get node
NAME   STATUS   ROLES                  AGE   VERSION
k8s1   Ready    control-plane,master   2d    v1.22.2
k8s2   Ready    <none>                 2d    v1.22.2
k8s3   Ready    <none>                 2d    v1.22.2
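
Since the suspected cause is a version mismatch, it may help to compare the etcdctl version with the etcd server version before taking or restoring a snapshot. The sketch below uses the two versions that appear in this walkthrough (3.3.11 from yum, 3.5.0 from endpoint status); the helper names are hypothetical, and treating "same major.minor" as the criterion is an assumption of this sketch, not an official rule.

```shell
#!/bin/sh
# Reduce a version such as 3.5.1 to its major.minor part (3.5).
major_minor() { echo "$1" | cut -d. -f1,2; }

# Succeeds when both versions share the same major.minor.
same_minor() { [ "$(major_minor "$1")" = "$(major_minor "$2")" ]; }

# Sample values from this walkthrough: yum package vs. endpoint status.
client=3.3.11
server=3.5.0
# On a live host the client version could instead come from, for example:
#   client=$(etcdctl version | awk '/^etcdctl version/ {print $3}')

if same_minor "$client" "$server"; then
  echo "etcdctl $client matches etcd $server"
else
  echo "etcdctl $client differs from etcd $server; consider a matching etcdctl"
fi
```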

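Prepare the images needed for the upgrade in the Harbor registry

  1. docker-compose can only start the services from inside the Harbor directory
  • First error. Fix: change into the Harbor configuration directory before starting the services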
[root@k8s4 ~]# docker-compose start
ERROR: 
        Can't find a suitable configuration file in this directory or any
        parent. Are you in the right directory?

        Supported filenames: docker-compose.yml, docker-compose.yaml
[root@k8s4 ~]# cd harbor/
[root@k8s4 harbor]# ls
common  common.sh  docker-compose.yml  harbor.v1.10.1.tar.gz  harbor.yml  install.sh  LICENSE  prepare
[root@k8s4 harbor]# docker-compose start
Starting log         ... done
Starting registry    ... done
Starting registryctl ... done
Starting postgresql  ... done
Starting portal      ... done
Starting redis       ... done
Starting core        ... done
Starting jobservice  ... done
Starting proxy       ... done
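  • Second error. Fix: log in to the private registry with docker login (needed for the pushes below). Note that kubeadm itself is installed as a yum package later on, so it does not have to be pulled as an image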
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kubeadm:v1.23.1
Error response from daemon: pull access denied for registry.aliyuncs.com/google_containers/kubeadm, repository does not exist or may require 'docker login': denied: requested access to the resource is denied
[root@k8s4 ~]# docker login reg.westos.org
Authenticating with existing credentials...
WARNING! Your password will be stored unencrypted in /root/.docker/config.json.
Configure a credential helper to remove this warning. See
https://docs.docker.com/engine/reference/commandline/login/#credentials-store

Login Succeeded
  2. Fetch the component images of the required version for the Harbor registry
  • First pull
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.1
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.1
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.1
[root@k8s4 ~]# docker pull registry.aliyuncs.com/google_containers/kube-proxy:v1.23.1
  • Then retag
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-apiserver:v1.23.1 reg.westos.org/k8s/kube-apiserver:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-controller-manager:v1.23.1 reg.westos.org/k8s/kube-controller-manager:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-scheduler:v1.23.1 reg.westos.org/k8s/kube-scheduler:v1.23.1
[root@k8s4 ~]# docker tag registry.aliyuncs.com/google_containers/kube-proxy:v1.23.1 reg.westos.org/k8s/kube-proxy:v1.23.1
  • Finally push
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-apiserver:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-controller-manager:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-scheduler:v1.23.1
[root@k8s4 ~]# docker push reg.westos.org/k8s/kube-proxy:v1.23.1
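
The twelve pull, tag, and push commands above follow one pattern, so they can be generated by a loop. A minimal sketch using the registry names and version from this walkthrough; mirror_cmds is a made-up helper name. It prints the commands so they can be reviewed before anything is executed.

```shell
#!/bin/sh
# Registry names and version as used above.
SRC=registry.aliyuncs.com/google_containers
DST=reg.westos.org/k8s
VERSION=v1.23.1

# Print the pull/tag/push commands for each component image.
mirror_cmds() {
  for c in kube-apiserver kube-controller-manager kube-scheduler kube-proxy; do
    echo "docker pull $SRC/$c:$VERSION"
    echo "docker tag $SRC/$c:$VERSION $DST/$c:$VERSION"
    echo "docker push $DST/$c:$VERSION"
  done
}

mirror_cmds          # review the commands first
# mirror_cmds | sh   # then run them
```

If other images are needed as well, kubeadm config images list --kubernetes-version v1.23.1 shows what kubeadm expects for that release.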

Control plane node: upgrade kubeadm

[root@k8s1 ~]# yum install -y kubeadm-1.23.1-0
[root@k8s1 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:39:51Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}

Control plane node: verify the upgrade plan

[root@k8s1 ~]# kubeadm upgrade plan
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.22.1
[upgrade/versions] kubeadm version: v1.23.1
[upgrade/versions] Target version: v1.23.6
[upgrade/versions] Latest version in the v1.22 series: v1.22.9

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     3 x v1.22.2   v1.22.9

Upgrade to the latest version in the v1.22 series:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.22.1   v1.22.9
kube-controller-manager   v1.22.1   v1.22.9
kube-scheduler            v1.22.1   v1.22.9
kube-proxy                v1.22.1   v1.22.9
CoreDNS                   v1.8.4    v1.8.6
etcd                      3.5.0-0   3.5.1-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.22.9

_____________________________________________________________________

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     3 x v1.22.2   v1.23.6

Upgrade to the latest stable version:

COMPONENT                 CURRENT   TARGET
kube-apiserver            v1.22.1   v1.23.6
kube-controller-manager   v1.22.1   v1.23.6
kube-scheduler            v1.22.1   v1.23.6
kube-proxy                v1.22.1   v1.23.6
CoreDNS                   v1.8.4    v1.8.6
etcd                      3.5.0-0   3.5.1-0

You can now apply the upgrade by executing the following command:

	kubeadm upgrade apply v1.23.6

Note: Before you can perform this upgrade, you have to update kubeadm to v1.23.6.

_____________________________________________________________________


The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no
_____________________________________________________________________

Control plane node: run kubeadm upgrade

[root@k8s1 ~]# kubeadm upgrade apply v1.23.1
......
[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.23.1". Enjoy!

[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.
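
kubeadm upgrades move within a minor release or to the next one, which is why the step from v1.22.x to v1.23.1 works in a single pass. The sketch below checks that rule for a few targets; it only compares major and minor numbers, the helper names are illustrative, and v1.24.0 is a hypothetical target added to show the failing case.

```shell
#!/bin/sh
# Extract major and minor from versions like v1.23.1.
major_of() { v=${1#v}; echo "${v%%.*}"; }
minor_of() { v=${1#v}; v=${v#*.}; echo "${v%%.*}"; }

# Succeeds when the target is in the same minor release as current,
# or exactly one minor release ahead.
can_upgrade() {
  [ "$(major_of "$1")" = "$(major_of "$2")" ] || return 1
  step=$(( $(minor_of "$2") - $(minor_of "$1") ))
  [ "$step" -ge 0 ] && [ "$step" -le 1 ]
}

for target in v1.22.9 v1.23.1 v1.24.0; do
  if can_upgrade v1.22.2 "$target"; then
    echo "v1.22.2 -> $target: ok"
  else
    echo "v1.22.2 -> $target: skips a minor version"
  fi
done
```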

Control plane node: drain the node

[root@k8s1 ~]# kubectl drain k8s1 --ignore-daemonsets
node/k8s1 already cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-59ndm, kube-system/kube-proxy-t4zpd
node/k8s1 drained
[root@k8s1 ~]# kubectl get node
NAME   STATUS                     ROLES                  AGE   VERSION
k8s1   Ready,SchedulingDisabled   control-plane,master   2d    v1.22.2
k8s2   Ready                      <none>                 2d    v1.22.2
k8s3   Ready                      <none>                 2d    v1.22.2

Control plane node: upgrade kubectl and kubelet

[root@k8s1 ~]# yum install -y kubelet-1.23.1-0 kubectl-1.23.1-0
[root@k8s1 ~]# systemctl daemon-reload 
[root@k8s1 ~]# systemctl restart kubelet.service 

Control plane node: uncordon the node

[root@k8s1 ~]# kubectl uncordon k8s1
node/k8s1 uncordoned

Control plane node: verify the node status

[root@k8s1 ~]# kubectl get node
NAME   STATUS   ROLES                  AGE   VERSION
k8s1   Ready    control-plane,master   2d    v1.23.1
k8s2   Ready    <none>                 2d    v1.22.2
k8s3   Ready    <none>                 2d    v1.22.2

Worker node upgrade: upgrade kubeadm

[root@k8s2 ~]# yum install -y kubeadm-1.23.1-0

Worker node upgrade: run kubeadm upgrade node

[root@k8s2 ~]# kubeadm upgrade node
[upgrade] Reading configuration from the cluster...
[upgrade] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks
[preflight] Skipping prepull. Not a control plane node.
[upgrade] Skipping phase. Not a control plane node.
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[upgrade] The configuration for this node was successfully updated!
[upgrade] Now you should go ahead and upgrade the kubelet package using your package manager.

Worker node upgrade: drain the node

[root@k8s1 ~]# kubectl drain k8s2 --ignore-daemonsets
node/k8s2 cordoned
WARNING: ignoring DaemonSet-managed Pods: kube-system/kube-flannel-ds-jmtfd, kube-system/kube-proxy-fhnvc
evicting pod kube-system/coredns-7b56f6bc55-dgh4p
pod/coredns-7b56f6bc55-dgh4p evicted
node/k8s2 drained

Worker node upgrade: upgrade kubectl and kubelet

[root@k8s2 ~]# yum install -y kubectl-1.23.1-0 kubelet-1.23.1-0
[root@k8s2 ~]# systemctl daemon-reload 
[root@k8s2 ~]# systemctl restart kubelet.service 

Worker node upgrade: uncordon the node

[root@k8s1 ~]# kubectl uncordon k8s2
node/k8s2 uncordoned
  • The upgrade of the second worker node, k8s3, is not repeated here; it follows the same steps as k8s2
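
For reference, the worker steps shown for k8s2 can be listed for any other worker. The sketch below only prints the commands, each tagged with the host it ran on in this walkthrough (drain and uncordon were issued from k8s1); worker_upgrade_cmds is a made-up helper name.

```shell
#!/bin/sh
VERSION=1.23.1-0

# Print the upgrade steps for one worker, tagged with the host each runs on.
worker_upgrade_cmds() {
  node=$1
  echo "[$node] yum install -y kubeadm-$VERSION"
  echo "[$node] kubeadm upgrade node"
  echo "[k8s1] kubectl drain $node --ignore-daemonsets"
  echo "[$node] yum install -y kubectl-$VERSION kubelet-$VERSION"
  echo "[$node] systemctl daemon-reload"
  echo "[$node] systemctl restart kubelet.service"
  echo "[k8s1] kubectl uncordon $node"
}

worker_upgrade_cmds k8s3
```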

kubeadm cluster: verify the cluster status

[root@k8s1 ~]# kubectl get node
NAME   STATUS   ROLES                  AGE   VERSION
k8s1   Ready    control-plane,master   2d    v1.23.1
k8s2   Ready    <none>                 2d    v1.23.1
k8s3   Ready    <none>                 2d    v1.23.1

3. With the upgrade done, back up and restore etcd

3.1 Obtain the etcdctl command

  1. Start Docker (systemctl start docker) and open an interactive shell in the etcd 3.5.1-0 image
[root@k8s1 ~]# docker run -it --rm reg.westos.org/k8s/etcd:3.5.1-0 sh
sh-5.1# 
  2. Open a new terminal
[root@k8s1 ~]# docker ps
CONTAINER ID   IMAGE                             COMMAND   CREATED          STATUS          PORTS                               NAMES
d6c7021afde3   reg.westos.org/k8s/etcd:3.5.1-0   "sh"      42 seconds ago   Up 40 seconds   2379-2380/tcp, 4001/tcp, 7001/tcp   crazy_goodall
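  3. Copy etcdctl out of the container; every etcdctl command from here on is meant to use this binary, so it has to be the one found first on PATH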
[root@k8s1 ~]# docker cp d6c7021afde3:/usr/local/bin/etcdctl .
[root@k8s1 ~]# ll etcdctl 
-r-xr-xr-x 1 root root 17981440 Nov  3 12:14 etcdctl
  4. Exit the container (back in the first terminal)
[root@k8s1 ~]# docker run -it --rm reg.westos.org/k8s/etcd:3.5.1-0 sh
sh-5.1# ^C
sh-5.1# exit
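
The same binary can also be extracted without an interactive shell and a second terminal, by creating a stopped container and copying from it. A sketch under the assumptions of this walkthrough: the image reference and the /usr/local/bin/etcdctl path are the ones used above, while the variable names, function names, and the RUN_FETCH switch are hypothetical.

```shell
#!/bin/sh
set -eu

# Image reference as used above; adjust registry and tag to your cluster.
REGISTRY=${REGISTRY:-reg.westos.org/k8s}
ETCD_TAG=${ETCD_TAG:-3.5.1-0}

etcd_image() { echo "$REGISTRY/etcd:$ETCD_TAG"; }

# Copy etcdctl out of the image via a temporary container that is never started.
fetch_etcdctl() {
  dest=${1:-.}
  cid=$(docker create "$(etcd_image)" sh)
  docker cp "$cid:/usr/local/bin/etcdctl" "$dest"
  docker rm "$cid" >/dev/null
}

# Only talk to Docker when explicitly asked to (hypothetical switch).
if [ "${RUN_FETCH:-0}" = "1" ]; then
  fetch_etcdctl "${DEST:-.}"
fi
```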

3.2 Create a pod

  1. The pod is named test
[root@k8s1 ~]# kubectl run test --image=nginx
pod/test created
[root@k8s1 ~]# kubectl get pod 
NAME   READY   STATUS              RESTARTS   AGE
test   1/1     Running             0          94s

3.3 Back up the etcd data

  1. Take the backup, which produces a snapshot file
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl --endpoints 127.0.0.1:2379 --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key --cacert=/etc/kubernetes/pki/etcd/ca.crt snapshot save /tmp/backup/`hostname`-etcd-`date +%Y%m%d%H%M`.db
Snapshot saved at /tmp/backup/k8s1-etcd-202204232220.db
  2. Check the status of the backup
  • Note: this backup contains the pod test created above
[root@k8s1 ~]# ETCDCTL_API=3 etcdctl snapshot status /tmp/backup/k8s1-etcd-202204232220.db
f99920fc, 9452, 1431, 3.8 MB
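
In its default output format, snapshot status prints four comma-separated fields: hash, revision, total keys, and total size. A small sketch that labels them, fed with the line shown above (parse_status is an illustrative helper, not an etcdctl feature; etcdctl itself can print a labeled table with -w table):

```shell
#!/bin/sh
# Label the four comma-separated fields of the default `snapshot status` output.
parse_status() {
  printf '%s\n' "$1" | awk -F', *' \
    '{ printf "hash=%s revision=%s total_keys=%s total_size=%s\n", $1, $2, $3, $4 }'
}

parse_status 'f99920fc, 9452, 1431, 3.8 MB'
```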

3.4 Delete the pod

  • The goal is a backup that contains the pod test. After the pod is deleted, a single-node restore brings it back
[root@k8s1 ~]# kubectl delete --force pod test
warning: Immediate deletion does not wait for confirmation that the running resource has been terminated. The resource may continue to run on the cluster indefinitely.
pod "test" force deleted

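3.5 Restore the data

3.5.1 Stop all control plane components

  1. Move the manifests directory under /etc/kubernetes/ aside. The kubelet then no longer finds the static pod manifests and stops the control plane containers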
[root@k8s1 ~]# cd /etc/kubernetes/
[root@k8s1 kubernetes]# ls
admin.conf  controller-manager.conf  kubelet.conf  manifests  pki  scheduler.conf  tmp
[root@k8s1 kubernetes]# mv manifests/ manifests.bak
[root@k8s1 kubernetes]# ls
admin.conf  controller-manager.conf  kubelet.conf  manifests.bak  pki  scheduler.conf  tmp
  2. crictl ps confirms that the control plane components are no longer running
[root@k8s1 kubernetes]# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                ATTEMPT             POD ID
00a2c92eab979       a4ca41631cc7a       35 minutes ago      Running             coredns             0                   39c40f3e94642
3dd14316876bf       b46c42588d511       42 minutes ago      Running             kube-proxy          0                   d7f3f39d341b0
a353f5362f207       9247abf086779       About an hour ago   Running             kube-flannel        1                   f060899b44344

3.5.2 Remove the current etcd data directory

  1. Move the current etcd data directory /var/lib/etcd aside (this has the same effect as deleting it, but keeps a copy)
[root@k8s1 kubernetes]# cd /var/lib/
[root@k8s1 lib]# mv etcd etcd.bak

3.5.3 Restore the data

  1. If needed, the location of the data directory can be specified with --data-dir
[root@k8s1 lib]# ETCDCTL_API=3 etcdctl snapshot restore /tmp/backup/k8s1-etcd-202204232220.db --data-dir=/var/lib/etcd
2022-04-23 22:28:22.580862 I | mvcc: restore compact to 8778
2022-04-23 22:28:22.632531 I | etcdserver/membership: added member 8e9e05c52164694d [http://localhost:2380] to cluster cdf818194e3a8c32

3.5.4 Start all components

  1. Go back to the Kubernetes configuration directory and restore the manifests directory
[root@k8s1 lib]# cd /etc/kubernetes/
[root@k8s1 kubernetes]# mv manifests.bak manifests
  2. Restart the kubelet daemon
[root@k8s1 kubernetes]# systemctl restart kubelet.service 
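
Steps 3.5.1 to 3.5.4 can be collected into one script. The sketch below defaults to a dry run that only prints the steps. The paths and the restore command are the ones used above; the DRY_RUN switch, the helper names, and the crictl polling loop (which replaces the manual crictl ps check) are assumptions of this sketch. It targets the single control plane node of this walkthrough.

```shell
#!/bin/sh
set -eu

DRY_RUN=${DRY_RUN:-1}   # 1 = only print the steps, 0 = execute them

# Print the command in dry-run mode, otherwise run it.
run() {
  if [ "$DRY_RUN" = "1" ]; then echo "+ $*"; else "$@"; fi
}

# Block until crictl no longer lists a running etcd container.
wait_for_etcd_stop() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ wait until crictl ps no longer lists etcd"
    return 0
  fi
  while crictl ps --name etcd -q | grep -q .; do sleep 2; done
}

restore_etcd() {
  snapshot=$1
  run mv /etc/kubernetes/manifests /etc/kubernetes/manifests.bak   # stop static pods
  wait_for_etcd_stop
  run mv /var/lib/etcd /var/lib/etcd.bak                           # keep the old data
  run env ETCDCTL_API=3 etcdctl snapshot restore "$snapshot" --data-dir=/var/lib/etcd
  run mv /etc/kubernetes/manifests.bak /etc/kubernetes/manifests   # bring the pods back
  run systemctl restart kubelet.service
}

restore_etcd "${SNAPSHOT:-/tmp/backup/k8s1-etcd-202204232220.db}"
```

Running it as is prints the plan; setting DRY_RUN=0 executes it, so it is worth reading the printed steps first.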

3.6 Verify the cluster state

  1. Check that the Kubernetes services are back to normal
[root@k8s1 kubernetes]# crictl ps
CONTAINER           IMAGE               CREATED             STATE               NAME                      ATTEMPT             POD ID
d73374d629173       b6d7abedde399       30 seconds ago      Running             kube-apiserver            0                   a6ce154f8ed39
9644257cc5b79       71d575efe6283       30 seconds ago      Running             kube-scheduler            0                   69a86c32d82c2
9d2c7fdfae301       25f8c7f3da61c       30 seconds ago      Running             etcd                      0                   a156d32b4bfca
89501643a9ba6       f51846a4fd288       30 seconds ago      Running             kube-controller-manager   0                   e2f0acade1880
00a2c92eab979       a4ca41631cc7a       38 minutes ago      Running             coredns                   0                   39c40f3e94642
3dd14316876bf       b46c42588d511       46 minutes ago      Running             kube-proxy                0                   d7f3f39d341b0
a353f5362f207       9247abf086779       About an hour ago   Running             kube-flannel              1                   f060899b44344
  2. The pod test created earlier has been restored
[root@k8s1 kubernetes]# kubectl get pod
NAME   READY   STATUS    RESTARTS   AGE
test   1/1     Running   0          19m

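4. Restore succeeded!

Tips:
Second task: single-node restore
[Exit the SSH session on the master node and return to the base environment]
(only two commands)
Back up first
ETCDCTL_API=3 etcdctl snapshot save to a path under /tmp (copy the path from the web page; typing it by hand wastes time)
cd /var/lib
Restore etcd.bak to etcd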
