Installing a Ceph Cluster on Kubernetes with Rook 1.6
Environment
System versions (the Kubernetes cluster was built with sealos):
k8s: 1.19.6
rook: 1.6
ubuntu: 1.20
Prerequisites
- A running Kubernetes cluster, version v1.17.0 or later; for setup, see "Install a Kubernetes Cluster";
- At least 3 worker nodes in the cluster, each with one unformatted raw disk in addition to the system disk (for VM worker nodes, a virtual disk is fine), used to create 3 Ceph OSDs;
- Alternatively, a single worker node with one unformatted raw disk attached also works;
- Run lsblk -f on a node to check whether a disk has already been formatted; the output looks like this:
NAME FSTYPE LABEL UUID MOUNTPOINT
vda
└─vda1 LVM2_member eSO50t-GkUV-YKTH-WsGq-hNJY-eKNf-3i07IB
├─ubuntu--vg-root ext4 c2366f76-6e21-4f10-a8f3-6776212e2fe4 /
└─ubuntu--vg-swap_1 swap 9492a3dc-ad75-47cd-9596-678e8cf17ff9 [SWAP]
vdb
In this output, vdb has no FSTYPE, which means it is unformatted and can be used for a Ceph OSD. Note: if Ceph was installed on these disks before, the mounted disks may need to be reformatted and remounted before reinstalling; see: Rook Ceph OSD异常,格式化osd硬盘重新挂载 (Rook Ceph OSD failure: reformat the OSD disk and remount).
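If you do need to wipe a previously used disk, here is a minimal sketch following Rook's general disk-zapping guidance (this assumes the raw disk is /dev/vdb; double-check the device name, as this destroys all data on it):
# Wipe partition tables and Ceph signatures so Rook sees a fresh device
$ sgdisk --zap-all /dev/vdb
$ dd if=/dev/zero of=/dev/vdb bs=1M count=100 oflag=direct,dsync
# Clean up LVM state left behind by a previous ceph-volume run
$ ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
$ rm -rf /dev/ceph-* /var/lib/rook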
Installation
- Install rook. For the difference between the rbd and cephfs drivers, see the official docs: ceph-csi-drivers
# Clone the source
$ git clone --single-branch --branch v1.6.0 https://github.com/rook/rook.git
$ cd rook/cluster/examples/kubernetes/ceph
$ kubectl create -f crds.yaml -f common.yaml -f operator.yaml
# Create the cluster
$ kubectl create -f cluster.yaml
# Deploy the toolbox
$ kubectl create -f toolbox.yaml
# Expose the dashboard
$ kubectl create -f dashboard-external-https.yaml
# Create a cephfs StorageClass
$ kubectl apply -f ./csi/cephfs/storageclass.yaml
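To confirm the StorageClass works end to end, you can bind a throwaway PVC against it. A minimal sketch, assuming the class in the example manifest is named rook-cephfs (verify with kubectl get storageclass first):
$ cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  accessModes:
    - ReadWriteMany
  storageClassName: rook-cephfs
  resources:
    requests:
      storage: 1Gi
EOF
# The PVC should reach Bound within a few seconds
$ kubectl get pvc cephfs-test-pvc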
- Verify the installation
# After a short wait, all pods should be up
$ watch kubectl get pods -n rook-ceph
NAME READY STATUS RESTARTS AGE
csi-cephfsplugin-9phgd 3/3 Running 0 19m
csi-cephfsplugin-b5zm7 3/3 Running 0 19m
csi-cephfsplugin-mvx7b 3/3 Running 0 19m
csi-cephfsplugin-ngkpt 3/3 Running 0 19m
csi-cephfsplugin-provisioner-db45f85f5-77658 6/6 Running 0 19m
csi-cephfsplugin-provisioner-db45f85f5-88xm9 6/6 Running 0 19m
csi-cephfsplugin-xvrfz 3/3 Running 0 19m
csi-rbdplugin-59f94 3/3 Running 0 19m
csi-rbdplugin-76g7n 3/3 Running 0 19m
csi-rbdplugin-p4twb 3/3 Running 0 19m
csi-rbdplugin-pjsw9 3/3 Running 0 19m
csi-rbdplugin-provisioner-d85cbdb48-cm8zb 6/6 Running 0 19m
csi-rbdplugin-provisioner-d85cbdb48-xg7ph 6/6 Running 0 19m
csi-rbdplugin-tj2vr 3/3 Running 0 19m
rook-ceph-crashcollector-gpu-1-5b68d5cd59-mtn7s 1/1 Running 0 19m
rook-ceph-crashcollector-gpu-2-868f498db-p4cg5 1/1 Running 0 16m
rook-ceph-crashcollector-gpu-3-6959b695d5-ns6hf 1/1 Running 0 16m
rook-ceph-crashcollector-worker01-6446f7c66d-4dgvr 1/1 Running 0 18m
rook-ceph-crashcollector-worker02-7c5dbc645-q4xsh 1/1 Running 0 19m
rook-ceph-mgr-a-56dc6bd5dd-ss6d5 1/1 Running 0 18m
rook-ceph-mon-a-7cf96d4f9f-9l5kq 1/1 Running 0 19m
rook-ceph-mon-b-7c4d4c48c6-g9z2w 1/1 Running 0 19m
rook-ceph-mon-c-7dc65846d-hmn6b 1/1 Running 0 19m
rook-ceph-operator-54cf7487d4-9zhn6 1/1 Running 0 20m
rook-ceph-osd-0-765fbb9f79-66v97 1/1 Running 0 18m
rook-ceph-osd-1-d566bfc77-w5xpd 1/1 Running 1 16m
rook-ceph-osd-2-76754d4875-lzqrz 1/1 Running 0 16m
rook-ceph-osd-prepare-gpu-1-c6zgm 0/1 Completed 0 15m
rook-ceph-osd-prepare-gpu-2-r8m9r 0/1 Completed 0 15m
rook-ceph-osd-prepare-gpu-3-l8skf 0/1 Completed 0 15m
rook-ceph-osd-prepare-worker01-nq4g2 0/1 Completed 0 15m
rook-ceph-osd-prepare-worker02-d4qrw 0/1 Completed 0 15m
# Check ceph status
$ kubectl -n rook-ceph exec -it rook-ceph-tools-76c7d559b6-8w7bk -- sh -c 'ceph status'
  cluster:
    id:     5db57586-6d6f-4529-a956-b41242046ff2
    health: HEALTH_WARN
            clock skew detected on mon.b, mon.c
            mon c is low on available space

  services:
    mon: 3 daemons, quorum a,b,c (age 30m)
    mgr: a(active, since 26m)
    osd: 3 osds: 3 up (since 27m), 3 in (since 27m)

  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 87 GiB / 90 GiB avail
    pgs:     1 active+clean
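The toolbox pod can run any other Ceph CLI command the same way; two commonly useful checks (substitute your own toolbox pod name):
# Per-OSD up/in state and utilization
$ kubectl -n rook-ceph exec -it rook-ceph-tools-76c7d559b6-8w7bk -- ceph osd status
# Cluster-wide and per-pool capacity
$ kubectl -n rook-ceph exec -it rook-ceph-tools-76c7d559b6-8w7bk -- ceph df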
# List the dashboard services
$ kubectl -n rook-ceph get service
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
csi-cephfsplugin-metrics ClusterIP 10.106.161.37 <none> 8080/TCP,8081/TCP 3h13m
csi-rbdplugin-metrics ClusterIP 10.106.22.108 <none> 8080/TCP,8081/TCP 3h13m
rook-ceph-mgr ClusterIP 10.99.57.141 <none> 9283/TCP 3h12m
rook-ceph-mgr-dashboard ClusterIP 10.109.130.98 <none> 8443/TCP 3h12m
rook-ceph-mgr-dashboard-external-http NodePort 10.98.243.88 <none> 7000:30574/TCP 9m49s
rook-ceph-mgr-dashboard-external-https NodePort 10.96.251.99 <none> 8443:32066/TCP 5s
rook-ceph-mon-a ClusterIP 10.100.24.39 <none> 6789/TCP,3300/TCP 3h13m
rook-ceph-mon-b ClusterIP 10.107.108.211 <none> 6789/TCP,3300/TCP 3h13m
rook-ceph-mon-c ClusterIP 10.96.149.72 <none> 6789/TCP,3300/TCP 3h12m
# Get the dashboard password; the username is admin
$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
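With the dashboard-external-https service in place, the dashboard is reachable on any node at the NodePort shown above (32066 in this output; the certificate is self-signed, so expect a browser warning):
# Look up the NodePort of the external HTTPS service
$ kubectl -n rook-ceph get service rook-ceph-mgr-dashboard-external-https
# Then browse to https://<any-node-ip>:32066 and log in as admin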
FAQ
- The crashcollector pods fail to start with the error:
Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[default-token-vttr8 rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log rook-ceph-crash]: timed out waiting for the condition
A: Follow these steps:
$ kubectl delete -f cluster.yaml
$ kubectl delete -f operator.yaml -f common.yaml -f crds.yaml
# Run this on every machine in the cluster
$ rm -rf /var/lib/rook /var/lib/kubelet/plugins_registry/* /var/lib/kubelet/plugins/
# Redeploy the cluster
$ kubectl apply -f cluster.yaml
If that still does not work, fall back to the reinstall steps mentioned at the beginning (wiping and re-mounting the OSD disks).
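After redeploying, you can watch the crashcollector pods come back up; a small check (the app=rook-ceph-crashcollector label is an assumption based on the pod names, a plain get pods also works):
# Watch the crashcollector pods until they reach Running
$ kubectl -n rook-ceph get pods -l app=rook-ceph-crashcollector -w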
- ceph status reports the warning:
clock skew detected on mon.b, mon.c
A: This means the clocks on the nodes running mon.b and mon.c have drifted beyond the allowed threshold; see clockdiff-检测两台linux主机的时间差 (clockdiff: measuring the time difference between two Linux hosts).
# First, find the nodes where mon.b and mon.c are running
$ kubectl get pod rook-ceph-mon-b-7c4d4c48c6-g9z2w -n rook-ceph -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-mon-b-7c4d4c48c6-g9z2w 1/1 Running 0 98m 100.103.88.144 worker02 <none> <none>
# Measure the time difference between worker02 and the master node
$ apt install iputils-clockdiff
$ clockdiff 10.20.17.193
.
host=10.20.17.193 rtt=750(187)ms/0ms delta=0ms/0ms Thu Apr 22 20:44:20 2021
# On each machine, check whether the ntp service is running
$ systemctl status ntp
● ntp.service - Network Time Service
Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
Active: active (running) since Thu 2021-04-22 20:26:46 CST; 19min ago
Docs: man:ntpd(8)
Main PID: 32140 (ntpd)
Tasks: 2 (limit: 4915)
CGroup: /system.slice/ntp.service
└─32140 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 111:115
Apr 22 20:37:03 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:37:05 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:37:07 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:37:07 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:39:22 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:39:43 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:39:54 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:40:14 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:40:24 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:40:25 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
# Start the ntp service if it is not running
$ systemctl start ntp
# Enable ntp at boot
$ systemctl enable ntp
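If a small residual skew remains even with NTP in sync, the mon skew threshold can be loosened as a stopgap from the toolbox pod. A hedged sketch (mon_clock_drift_allowed defaults to 0.05 s; raising it only hides small drift, so fixing time sync is still the real cure):
# Allow up to 100 ms of clock drift between mons
$ kubectl -n rook-ceph exec -it rook-ceph-tools-76c7d559b6-8w7bk -- ceph config set mon mon_clock_drift_allowed 0.1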
- ceph status reports the warning:
mon c is low on available space
A: The host node running mon.c is low on local disk space. Note that this refers to the node's own filesystem (where the mon data directory lives), not the Ceph-backed storage.
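A quick way to confirm, run on the node hosting mon.c (the mon data directory defaults to /var/lib/rook; Ceph raises this warning when the filesystem holding it drops below 30% free, the mon_data_avail_warn default):
# Free space on the filesystem backing the mon data directory
$ df -h /var/lib/rook
# Largest consumers, if space needs to be reclaimed
$ du -sh /var/lib/rook/*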