Using rook-ceph
Preface
Environment: CentOS 7.9, Kubernetes 1.22.17
Why use Rook to deploy Ceph? The Ceph installation docs (https://docs.ceph.com/en/latest/install/#recommended-methods)
recommend Rook as the way to deploy and manage a Ceph cluster inside Kubernetes, so this article walks through deploying a Ceph cluster in Kubernetes with Rook.
What is Rook
Rook website: https://rook.io/
According to the official introduction, Rook is open-source, cloud-native storage for Kubernetes.
Rook is a storage operator for Kubernetes: it turns distributed storage systems into self-managing, self-scaling, self-healing storage services. It automates the tasks of a storage administrator: deployment, bootstrapping, configuration,
provisioning, scaling, upgrading, migration, disaster recovery, monitoring and resource management.
Rook uses the power of the Kubernetes platform to deliver Ceph services via a Kubernetes Operator.
Rook is a Ceph storage provider: it orchestrates the Ceph storage solution with a dedicated Kubernetes Operator that automates management, ensures Ceph runs well on Kubernetes,
and simplifies deployment and administration.
Rook is an open-source cloud-native storage orchestrator, providing the platform, framework and support for Ceph storage to natively integrate with cloud-native environments.
Rook can either create a Ceph cluster inside the Kubernetes cluster, or connect Kubernetes to an existing external Ceph cluster. This article uses Rook to create a Ceph cluster inside Kubernetes.
Prerequisites
1. Install dependencies
#install lvm2 on all k8s nodes, because the nodes that run OSD pods need it
sudo yum install -y lvm2
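If you manage several nodes, a small loop over ssh saves repeating the command by hand. A minimal sketch, assuming the node hostnames are node1, node2 and node3 (placeholders) and that root ssh access is already set up:
#run from any machine that can reach the k8s nodes; hostnames are placeholders
for node in node1 node2 node3; do
  ssh root@$node "yum install -y lvm2"
done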
2. Confirm the kernel supports rbd
#confirm the kernel supports rbd
lsmod | grep rbd #any output means the module is loaded
rbd 83640 0
libceph 306625 1 rbd
#if there is no output, load the rbd module temporarily (lost after a reboot)
modprobe rbd #load the rbd module into the kernel
#write a config file so the module is loaded automatically at boot
[root@master ~]# vim /etc/modules-load.d/rbd.conf #any file name ending in .conf works
rbd #the file content is just the word rbd
[root@master ~]#
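The same idea works for the rbd module: instead of logging in to every node, a hedged sketch with the same placeholder hostnames loads the module and drops the modules-load config everywhere (the quoted command, including the redirection, runs on the remote host):
for node in node1 node2 node3; do
  ssh root@$node "modprobe rbd && echo rbd > /etc/modules-load.d/rbd.conf"
done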
3. Kernel version requirement
If you will create RWX volumes from CephFS, the recommended minimum kernel version is 4.17. On kernels older than 4.17 the requested PVC size is not enforced; storage quotas are only enforced on newer kernels.
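You can check the running kernel on each node with:
uname -r #CentOS 7.9 ships a 3.10 kernel by default, so CephFS quota enforcement will not work without a kernel upgrade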
4. Kubernetes version requirement
Kubernetes v1.21 or higher is supported by Rook.
5. Server architecture requirement
Architectures supported are amd64 / x86_64 and arm64.
6. Storage requirement (any one of the following is enough)
Raw device (no partitions or formatted filesystem)
Raw partition (no formatted filesystem)
LVM logical volume (no formatted filesystem)
Persistent Volumes available from a storage class in block mode
Check with lsblk -f: if the FSTYPE field is empty, the device carries no filesystem and Rook can use it; if FSTYPE is set, the device is already formatted and Rook will skip it.
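If a disk you intend to give to Rook still carries an old filesystem or partition table, it has to be wiped first. A hedged sketch, where /dev/sdb is only a placeholder for the actual disk (this is destructive, double-check the device name):
#run on the node that owns the disk
wipefs -a /dev/sdb #remove filesystem signatures
sgdisk --zap-all /dev/sdb #remove partition tables (sgdisk is in the gdisk package)
lsblk -f /dev/sdb #FSTYPE should now be empty, so Rook can pick the disk up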
Deploying Rook
#official docs: https://rook.io/docs/rook/v1.11/Getting-Started/quickstart/#tldr
#clone the rook source on the k8s master node
mkdir ~/rook-ceph
cd ~/rook-ceph
yum install git -y
git clone --single-branch --branch v1.11.7 https://github.com/rook/rook.git
cd rook/deploy #the deploy directory offers two installation methods
[root@master deploy]# ll
drwxr-xr-x 5 root root 63 Jun 13 17:44 charts #ceph installed via helm
drwxr-xr-x 4 root root 4096 Jun 13 17:44 examples #rook-ceph installed manually from yaml files
drwxr-xr-x 3 root root 22 Jun 13 17:44 olm
#this article installs rook-ceph manually from the yaml files
[root@master deploy]# cd examples
[root@master examples]# ls crds.yaml common.yaml operator.yaml cluster.yaml images.txt
crds.yaml common.yaml operator.yaml cluster.yaml
[root@master examples]#
#from the names above: crds.yaml is the manifest of custom resource definitions (CRDs), common.yaml defines the various RBAC-related roles, rolebindings and serviceaccounts
#operator.yaml creates a configmap and a deployment; that deployment is the Rook Ceph Operator
#cluster.yaml creates a custom resource of kind: CephCluster, which in turn installs the Ceph daemon pods (mon, mgr, osd, mds, rgw)
#images.txt lists the container images that are needed, so you can pull them in advance
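If you want to pre-pull the images, a simple loop over images.txt works, assuming the file still lists one image per line and docker is the container runtime (use ctr or crictl for containerd):
#run on every node, or pull once and push to a private registry
while read -r image; do docker pull "$image"; done < images.txt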
#install rook-ceph in this order: crds.yaml, common.yaml, operator.yaml, cluster.yaml
[root@master examples]# kubectl create -f crds.yaml -f common.yaml
[root@master examples]# vim operator.yaml #take a look at operator.yaml first; it creates a configmap and a deployment
ROOK_CSI_KUBELET_DIR_PATH: "/var/lib/kubelet" #this sets the kubelet path; change it if your kubelet is installed in a different directory
ROOK_ENABLE_DISCOVERY_DAEMON: "true" #whether to run the discovery daemon that watches for raw storage devices on the cluster nodes; the default is false, here we set it to true
#if you only plan to create OSDs from StorageClassDeviceSets and PVCs, this daemon is not needed
Some of the default (commented-out) images in the operator.yaml configmap may fail to download; I found a third-party mirror registry and worked around it as follows:
docker pull registry.aliyuncs.com/it00021hot/csi-attacher:v4.1.0
docker tag registry.aliyuncs.com/it00021hot/csi-attacher:v4.1.0 registry.k8s.io/sig-storage/csi-attacher:v4.1.0
docker rmi registry.aliyuncs.com/it00021hot/csi-attacher:v4.1.0
docker pull registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.7.0
docker tag registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.7.0 registry.k8s.io/sig-storage/csi-node-driver-registrar:v2.7.0
docker rmi registry.aliyuncs.com/it00021hot/csi-node-driver-registrar:v2.7.0
docker pull registry.aliyuncs.com/it00021hot/csi-resizer:v1.7.0
docker tag registry.aliyuncs.com/it00021hot/csi-resizer:v1.7.0 registry.k8s.io/sig-storage/csi-resizer:v1.7.0
docker rmi registry.aliyuncs.com/it00021hot/csi-resizer:v1.7.0
docker pull registry.aliyuncs.com/it00021hot/csi-provisioner:v3.4.0
docker tag registry.aliyuncs.com/it00021hot/csi-provisioner:v3.4.0 registry.k8s.io/sig-storage/csi-provisioner:v3.4.0
docker rmi registry.aliyuncs.com/it00021hot/csi-provisioner:v3.4.0
docker pull registry.aliyuncs.com/it00021hot/csi-snapshotter:v6.2.1
docker tag registry.aliyuncs.com/it00021hot/csi-snapshotter:v6.2.1 registry.k8s.io/sig-storage/csi-snapshotter:v6.2.1
docker rmi registry.aliyuncs.com/it00021hot/csi-snapshotter:v6.2.1
[root@master examples]# kubectl create -f operator.yaml #install operator.yaml
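Before moving on to cluster.yaml, make sure the operator pod has actually come up; the CephCluster resource will not be reconciled until it is Running (the label selector below is the default label set in operator.yaml):
kubectl -n rook-ceph get pod -l app=rook-ceph-operator #wait until it shows 1/1 Running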
Walking through cluster.yaml
[root@master examples]# grep -Ev '^$|#' cluster.yaml
apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph # namespace:cluster
spec:
  cephVersion:
    image: quay.io/ceph/ceph:v17.2.6
    allowUnsupported: false
  dataDirHostPath: /var/lib/rook # host path where configuration will be persisted, must be specified
  skipUpgradeChecks: false
  continueUpgradeAfterChecksEvenIfNotHealthy: false
  waitTimeoutForHealthyOSDInMinutes: 10
  mon:
    count: 3 # number of mons, default 3; for production at least 3 mons are recommended, and the count should be odd
    allowMultiplePerNode: false # whether multiple mons may run on one host, default false
  mgr:
    count: 2 # number of mgrs, default 2; set to 2 for mgr high availability (one active, one standby)
    allowMultiplePerNode: false # whether multiple mgrs may run on one host, default false
    modules:
      - name: pg_autoscaler
        enabled: true
  dashboard:
    enabled: true # the ceph dashboard, enabled by default
    ssl: true
  monitoring:
    enabled: false
    metricsDisabled: false
  network:
    connections:
      encryption:
        enabled: false
      compression:
        enabled: false
      requireMsgr2: false
  crashCollector:
    disable: false
  logCollector:
    enabled: true
    periodicity: daily # one of: hourly, daily, weekly, monthly
    maxLogSize: 500M # SUFFIX may be 'M' or 'G'. Must be at least 1M.
  cleanupPolicy:
    confirmation: ""
    sanitizeDisks:
      method: quick
      dataSource: zero
      iteration: 1
    allowUninstallWithVolumes: false
  annotations:
  labels:
  resources:
  removeOSDsIfOutAndSafeToRemove: false
  priorityClassNames:
    mon: system-node-critical
    osd: system-node-critical
    mgr: system-cluster-critical
  storage: # cluster-level storage configuration and selection, i.e. which disks become OSDs
    useAllNodes: true # use all nodes by default; set to false if you plan specific nodes (see the example after this file)
    useAllDevices: true # use every device without a filesystem by default; set to false if you plan specific disks
    config:
    onlyApplyOSDPlacement: false
  disruptionManagement:
    managePodBudgets: true
    osdMaintenanceTimeout: 30
    pgHealthCheckTimeout: 0
  healthCheck:
    daemonHealth:
      mon:
        disabled: false
        interval: 45s
      osd:
        disabled: false
        interval: 60s
      status:
        disabled: false
        interval: 60s
    livenessProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
    startupProbe:
      mon:
        disabled: false
      mgr:
        disabled: false
      osd:
        disabled: false
[root@master examples]#
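If you set useAllNodes and useAllDevices to false, you list the nodes and disks explicitly under storage. A hedged sketch of what that section could look like; the node names follow this article's hosts, but the device name sdb is only an example:
  storage:
    useAllNodes: false
    useAllDevices: false
    nodes:
      - name: "ceph-1" # must match the Kubernetes node name
        devices:
          - name: "sdb" # raw disk on that node, no filesystem
      - name: "ceph-2"
        devices:
          - name: "sdb"
      - name: "ceph-3"
        devices:
          - name: "sdb"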
[root@master examples]# vim cluster.yaml #have a look at cluster.yaml and pay attention to the parameter below
#IMPORTANT: if you reinstall the cluster, make sure this directory is deleted on every host, otherwise the mons of the new cluster will fail to start.
#In Minikube, the '/data' directory is persisted across reboots, so use '/data/rook' in Minikube environments.
dataDirHostPath: /var/lib/rook #keep the default, no change needed
[root@master examples]# kubectl create -f cluster.yaml #install cluster.yaml
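Creating the CephCluster takes several minutes; you can watch the mon, mgr, osd-prepare and osd pods come up with, for example:
kubectl -n rook-ceph get pod -w #Ctrl-C to stop watching once the osd pods are Running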
Check the CephCluster
[root@ceph-1 examples]# kubectl -n rook-ceph get cephcluster
NAME DATADIRHOSTPATH MONCOUNT AGE PHASE MESSAGE HEALTH EXTERNAL FSID
rook-ceph /var/lib/rook 3 2d1h Ready Cluster created successfully HEALTH_OK 25f12c96-29b0-4487-8e2d-ec24525e33f9
[root@ceph-1 examples]#
Check the pods
[root@master examples]# kubectl get all -n rook-ceph
NAME READY STATUS RESTARTS AGE
pod/csi-cephfsplugin-c4dt7 2/2 Running 0 15m
pod/csi-cephfsplugin-ddk4x 2/2 Running 0 15m
pod/csi-cephfsplugin-nv2m8 2/2 Running 0 3m42s
pod/csi-cephfsplugin-provisioner-58948fc785-jbmxs 5/5 Running 0 4m20s
pod/csi-cephfsplugin-provisioner-58948fc785-sr9r6 5/5 Running 0 15m
pod/csi-rbdplugin-ht49j 2/2 Running 0 15m
pod/csi-rbdplugin-k5lwp 2/2 Running 0 15m
pod/csi-rbdplugin-provisioner-84bfcb8bfc-8xb4f 5/5 Running 0 15m
pod/csi-rbdplugin-provisioner-84bfcb8bfc-m8kgp 5/5 Running 0 15m
pod/csi-rbdplugin-x9p4g 2/2 Running 0 3m42s
pod/rook-ceph-crashcollector-ceph-1-6f67c84f4b-p2rxt 1/1 Running 0 15m
pod/rook-ceph-crashcollector-ceph-2-76d94b6769-x2q95 1/1 Running 0 13m
pod/rook-ceph-crashcollector-ceph-3-5ffd7fc4d-4tkfc 1/1 Running 0 13m
pod/rook-ceph-mgr-a-745c8fb9c-8lxkm 3/3 Running 0 15m
pod/rook-ceph-mgr-b-77bdd5584b-wqb6w 3/3 Running 0 15m
pod/rook-ceph-mon-a-5bdf5ccd9-ptzhl 2/2 Running 0 17m
pod/rook-ceph-mon-b-577d5bfcd7-22vtt 2/2 Running 0 15m
pod/rook-ceph-mon-c-84bccb66f5-k9lzj 2/2 Running 0 15m
pod/rook-ceph-operator-58d8b7b5df-8blw4 1/1 Running 0 166m
pod/rook-ceph-osd-0-844ffbd8d7-9cclh 2/2 Running 0 13m
pod/rook-ceph-osd-1-6ccd9cc645-jvqjh 2/2 Running 0 13m
pod/rook-ceph-osd-2-54866f6f46-c7vpx 2/2 Running 0 13m
pod/rook-ceph-osd-prepare-ceph-1-w5cnd 0/1 Completed 0 13m
pod/rook-ceph-osd-prepare-ceph-2-rxml8 0/1 Completed 0 13m
pod/rook-ceph-osd-prepare-ceph-3-dkq98 0/1 Completed 0 12m
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/rook-ceph-mgr ClusterIP 10.233.42.104 <none> 9283/TCP 14m
service/rook-ceph-mgr-dashboard ClusterIP 10.233.36.230 <none> 8443/TCP 14m
service/rook-ceph-mon-a ClusterIP 10.233.3.156 <none> 6789/TCP,3300/TCP 17m
service/rook-ceph-mon-b ClusterIP 10.233.62.240 <none> 6789/TCP,3300/TCP 15m
service/rook-ceph-mon-c ClusterIP 10.233.34.110 <none> 6789/TCP,3300/TCP 15m
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/csi-cephfsplugin 3 3 3 3 3 <none> 15m
daemonset.apps/csi-rbdplugin 3 3 3 3 3 <none> 15m
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/csi-cephfsplugin-provisioner 2/2 2 2 15m
deployment.apps/csi-rbdplugin-provisioner 2/2 2 2 15m
deployment.apps/rook-ceph-crashcollector-ceph-1 1/1 1 1 15m
deployment.apps/rook-ceph-crashcollector-ceph-2 1/1 1 1 15m
deployment.apps/rook-ceph-crashcollector-ceph-3 1/1 1 1 15m
deployment.apps/rook-ceph-mgr-a 1/1 1 1 15m
deployment.apps/rook-ceph-mgr-b 1/1 1 1 15m
deployment.apps/rook-ceph-mon-a 1/1 1 1 17m
deployment.apps/rook-ceph-mon-b 1/1 1 1 15m
deployment.apps/rook-ceph-mon-c 1/1 1 1 15m
deployment.apps/rook-ceph-operator 1/1 1 1 3h54m
deployment.apps/rook-ceph-osd-0 1/1 1 1 13m
deployment.apps/rook-ceph-osd-1 1/1 1 1 13m
deployment.apps/rook-ceph-osd-2 1/1 1 1 13m
NAME DESIRED CURRENT READY AGE
replicaset.apps/csi-cephfsplugin-provisioner-58948fc785 2 2 2 15m
replicaset.apps/csi-rbdplugin-provisioner-84bfcb8bfc 2 2 2 15m
replicaset.apps/rook-ceph-crashcollector-ceph-1-6f67c84f4b 1 1 1 15m
replicaset.apps/rook-ceph-crashcollector-ceph-2-76d94b6769 1 1 1 13m
replicaset.apps/rook-ceph-crashcollector-ceph-2-fd87584cf 0 0 0 15m
replicaset.apps/rook-ceph-crashcollector-ceph-3-5ffd7fc4d 1 1 1 13m
replicaset.apps/rook-ceph-crashcollector-ceph-3-6955f6c55d 0 0 0 15m
replicaset.apps/rook-ceph-mgr-a-745c8fb9c 1 1 1 15m
replicaset.apps/rook-ceph-mgr-b-77bdd5584b 1 1 1 15m
replicaset.apps/rook-ceph-mon-a-5bdf5ccd9 1 1 1 17m
replicaset.apps/rook-ceph-mon-b-577d5bfcd7 1 1 1 15m
replicaset.apps/rook-ceph-mon-c-84bccb66f5 1 1 1 15m
replicaset.apps/rook-ceph-operator-58d8b7b5df 1 1 1 3h54m
replicaset.apps/rook-ceph-osd-0-844ffbd8d7 1 1 1 13m
replicaset.apps/rook-ceph-osd-1-6ccd9cc645 1 1 1 13m
replicaset.apps/rook-ceph-osd-2-54866f6f46 1 1 1 13m
NAME COMPLETIONS DURATION AGE
job.batch/rook-ceph-osd-prepare-ceph-1 1/1 2s 13m
job.batch/rook-ceph-osd-prepare-ceph-2 1/1 4s 13m
job.batch/rook-ceph-osd-prepare-ceph-3 1/1 7s 12m
[root@master examples]#
Installing the ceph-toolbox client tool
The Ceph cluster is now up, but how do we verify that it is healthy? The project provides ceph-toolbox, a containerized client for running ceph commands against the cluster. ceph-toolbox is itself just a pod, created by following the official steps:
ceph-toolbox docs: https://rook.io/docs/rook/v1.11/Troubleshooting/ceph-toolbox/
The docs describe two ways of running the toolbox:
Interactive: start a toolbox pod that you can attach to and run Ceph commands from a shell;
One-time job: run a script of Ceph commands and collect the results from the job logs.
Here we use the interactive mode: start a toolbox pod and run Ceph commands from a shell inside it.
The toolbox yaml is part of the source tree, so it can be applied directly.
[root@master examples]# ll toolbox.yaml toolbox-job.yaml
-rw-r--r-- 1 root root 1868 Jul 11 17:31 toolbox-job.yaml #one-time job
-rw-r--r-- 1 root root 4220 Jul 11 17:31 toolbox.yaml #interactive
[root@master examples]#
[root@master examples]# kubectl apply -f toolbox.yaml
deployment.apps/rook-ceph-tools created
[root@master examples]#
[root@master examples]# kubectl get pod -n rook-ceph
NAME READY STATUS RESTARTS AGE
rook-ceph-tools-657868c8cf-sz5b7 1/1 Running 0 25s
[root@master examples]# kubectl -n rook-ceph exec -it rook-ceph-tools-657868c8cf-sz5b7 -- bash
bash-4.4$ ceph status #check the ceph cluster status
cluster:
id: 25f12c96-29b0-4487-8e2d-ec24525e33f9
health: HEALTH_OK
services:
mon: 3 daemons, quorum a,b,c (age 54m)
mgr: a(active, since 51m), standbys: b
osd: 3 osds: 3 up (since 51m), 3 in (since 52m)
data:
pools: 1 pools, 1 pgs
objects: 2 objects, 449 KiB
usage: 62 MiB used, 147 GiB / 147 GiB avail
pgs: 1 active+clean
bash-4.4$ ceph fsid #show the ceph cluster ID
25f12c96-29b0-4487-8e2d-ec24525e33f9
bash-4.4$
bash-4.4$ cat /etc/ceph/ceph.conf #the container actually carries a ceph.conf file
[global]
mon_host = 10.233.34.110:6789,10.233.3.156:6789,10.233.62.240:6789 #the service IPs of the 3 mons
[client.admin]
keyring = /etc/ceph/keyring
bash-4.4$ cat /etc/ceph/keyring #the container also carries the keyring file
[client.admin]
key = AQBuZa1kTXIsLBAATbTu19OPcVATJX4rgnJFCQ==
bash-4.4$ ceph auth get-key client.admin #this command shows the same key
AQBuZa1kTXIsLBAATbTu19OPcVATJX4rgnJFCQ==
bash-4.4$
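A few more standard ceph commands that are handy inside the toolbox when checking the cluster (shown only as examples):
bash-4.4$ ceph osd status #per-OSD host, usage and state
bash-4.4$ ceph osd tree #CRUSH tree, which host owns which OSD
bash-4.4$ ceph df #raw and per-pool capacity
bash-4.4$ ceph health detail #details when the cluster is not HEALTH_OK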
Enabling the Ceph dashboard
Docs: https://rook.io/docs/rook/v1.11/Storage-Configuration/Monitoring/ceph-dashboard/#enable-the-ceph-dashboard
#when installing cluster.yaml we already enabled the ceph dashboard
[root@ceph-1 examples]# grep -B9 -i dashboard cluster.yaml
# enable the ceph dashboard for viewing cluster status
dashboard:
enabled: true
# serve the dashboard under a subpath (useful when you are accessing the dashboard via a reverse proxy)
# urlPrefix: /ceph-dashboard
# serve the dashboard at the given port.
# port: 8443
# serve the dashboard using SSL
[root@ceph-1 examples]# kubectl -n rook-ceph get service #list the services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.233.42.104 <none> 9283/TCP 43h
rook-ceph-mgr-dashboard ClusterIP 10.233.36.230 <none> 8443/TCP 43h
..........................
Notes:
the rook-ceph-mgr service exposes the Prometheus metrics;
the rook-ceph-mgr-dashboard service is the ceph dashboard itself.
[root@ceph-1 examples]#
Rook creates a default dashboard user named admin; its password is stored in the secret rook-ceph-dashboard-password.
It can be decoded like this: kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{.data.password}" | base64 --decode && echo
Accessing the dashboard
The rook-ceph-mgr-dashboard service is of type ClusterIP by default, so it cannot be reached from an external browser. Editing it to NodePort does not help either: even though the edit saves, the operator automatically reverts the service back to ClusterIP, so that approach does not work.
According to the docs there are several ways to expose the dashboard; NodePort is the simplest, and the source tree already ships the corresponding service manifest:
[root@ceph-1 examples]# ll dashboard-external-https.yaml
-rw-r--r-- 1 root root 432 Jul 11 17:31 dashboard-external-https.yaml
[root@ceph-1 examples]# kubectl apply -f dashboard-external-https.yaml
[root@ceph-1 examples]# kubectl get svc -n rook-ceph
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
rook-ceph-mgr ClusterIP 10.233.42.104 <none> 9283/TCP 44h
rook-ceph-mgr-dashboard ClusterIP 10.233.36.230 <none> 8443/TCP 44h
rook-ceph-mgr-dashboard-external-https NodePort 10.233.63.61 <none> 8443:17468/TCP 8s
[root@ceph-1 examples]#
Open it in a browser:
https://192.168.244.6:17468/
Note that it is https, not http. The user name is admin and the password is stored in the rook-ceph-dashboard-password secret.
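Instead of reading the port and password off the screens above, both can be queried directly (the service and secret names are the ones created in the previous steps):
#NodePort of the external dashboard service
kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https -o jsonpath='{.spec.ports[0].nodePort}' && echo
#admin password
kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{.data.password}" | base64 --decode && echo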
Deleting the rook-ceph cluster
Official teardown docs: https://rook.io/docs/rook/v1.11/Getting-Started/ceph-teardown/
#if the yaml files are still around, the resources can be deleted from them
kubectl delete -f cluster.yaml
kubectl delete -f common.yaml
kubectl delete -f operator.yaml
rm -rf /var/lib/rook #delete the default ceph configuration directory on every host
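The teardown docs also recommend zapping the OSD disks before reusing them, otherwise a new cluster will refuse to adopt them. A hedged sketch, with /dev/sdb again standing in for the real OSD disk (run on every node that hosted an OSD, double-check the device first, this destroys all data on it):
sgdisk --zap-all /dev/sdb
dd if=/dev/zero of=/dev/sdb bs=1M count=100 oflag=direct,dsync
#remove any ceph-volume LVM remnants left on the host
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
rm -rf /dev/ceph-* /dev/mapper/ceph--*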