Installing a Ceph cluster on Kubernetes with Rook 1.6

Environment preparation

System versions (the Kubernetes cluster was built with sealos)

k8s: 1.19.6
rook: 1.6
ubuntu: 1.20

Prerequisites

  • A Kubernetes cluster is already installed and its version is at least v1.17.0; for installation instructions, see the guide on installing a Kubernetes cluster;
  • The cluster has at least 3 worker nodes, and each worker node has one unformatted raw disk in addition to its system disk (if the worker nodes are VMs, the raw disk can be a virtual disk); these disks are used to create 3 Ceph OSDs;
  • Alternatively, a single worker node with one unformatted raw disk attached is also sufficient;
  • Run lsblk -f on a node to check whether a disk is already formatted; in the sample output below, vdb has an empty FSTYPE, which means it is unformatted and can be used by Ceph:
NAME                  FSTYPE      LABEL UUID                                   MOUNTPOINT
vda
└─vda1                LVM2_member       >eSO50t-GkUV-YKTH-WsGq-hNJY-eKNf-3i07IB
 ├─ubuntu--vg-root   ext4              c2366f76-6e21-4f10-a8f3-6776212e2fe4   /
 └─ubuntu--vg-swap_1 swap              9492a3dc-ad75-47cd-9596-678e8cf17ff9   [SWAP]
vdb

Note: if Ceph was installed on these machines before, the attached disks may need to be wiped and remounted before reinstalling; see the article "Rook Ceph OSD异常,格式化osd硬盘重新挂载".
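For reference, wiping a disk that previously hosted a Ceph OSD typically looks like the sketch below (based on the Rook cleanup documentation; it assumes the old OSD disk is /dev/vdb, so adjust the device name, and note that it destroys all data on that disk):

# DESTRUCTIVE: wipe a previously used OSD disk before reinstalling
$ DISK="/dev/vdb"                                               # assumption: the old OSD disk is /dev/vdb
$ sgdisk --zap-all $DISK                                        # clear GPT/MBR partition structures
$ dd if=/dev/zero of=$DISK bs=1M count=100 oflag=direct,dsync   # zero the start of the disk to remove Ceph metadata
$ ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %         # remove leftover ceph-volume LVM mappings, if any
$ rm -rf /var/lib/rook                                          # remove Rook's host data directory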


Installation steps

  1. Install Rook. For the difference between the RBD and CephFS CSI drivers, see the official documentation: ceph-csi-drivers
# Clone the source code
$ git clone --single-branch --branch v1.6.0 https://github.com/rook/rook.git
$ cd rook/cluster/examples/kubernetes/ceph
$ kubectl create -f crds.yaml -f common.yaml -f operator.yaml

# Create the Ceph cluster
$ kubectl create -f cluster.yaml

# Deploy the toolbox pod
$ kubectl create -f toolbox.yaml

# Expose the dashboard
$ kubectl create -f dashboard-external-https.yaml

# Create a CephFS StorageClass (a test PVC for verification is sketched after step 2)
$ kubectl apply -f ./csi/cephfs/storageclass.yaml
  2. Verify the installation
# After a short wait, all the pods should be Running
$ watch kubectl get pods -n rook-ceph
NAME                                                 READY   STATUS      RESTARTS   AGE
csi-cephfsplugin-9phgd                               3/3     Running     0          19m
csi-cephfsplugin-b5zm7                               3/3     Running     0          19m
csi-cephfsplugin-mvx7b                               3/3     Running     0          19m
csi-cephfsplugin-ngkpt                               3/3     Running     0          19m
csi-cephfsplugin-provisioner-db45f85f5-77658         6/6     Running     0          19m
csi-cephfsplugin-provisioner-db45f85f5-88xm9         6/6     Running     0          19m
csi-cephfsplugin-xvrfz                               3/3     Running     0          19m
csi-rbdplugin-59f94                                  3/3     Running     0          19m
csi-rbdplugin-76g7n                                  3/3     Running     0          19m
csi-rbdplugin-p4twb                                  3/3     Running     0          19m
csi-rbdplugin-pjsw9                                  3/3     Running     0          19m
csi-rbdplugin-provisioner-d85cbdb48-cm8zb            6/6     Running     0          19m
csi-rbdplugin-provisioner-d85cbdb48-xg7ph            6/6     Running     0          19m
csi-rbdplugin-tj2vr                                  3/3     Running     0          19m
rook-ceph-crashcollector-gpu-1-5b68d5cd59-mtn7s      1/1     Running     0          19m
rook-ceph-crashcollector-gpu-2-868f498db-p4cg5       1/1     Running     0          16m
rook-ceph-crashcollector-gpu-3-6959b695d5-ns6hf      1/1     Running     0          16m
rook-ceph-crashcollector-worker01-6446f7c66d-4dgvr   1/1     Running     0          18m
rook-ceph-crashcollector-worker02-7c5dbc645-q4xsh    1/1     Running     0          19m
rook-ceph-mgr-a-56dc6bd5dd-ss6d5                     1/1     Running     0          18m
rook-ceph-mon-a-7cf96d4f9f-9l5kq                     1/1     Running     0          19m
rook-ceph-mon-b-7c4d4c48c6-g9z2w                     1/1     Running     0          19m
rook-ceph-mon-c-7dc65846d-hmn6b                      1/1     Running     0          19m
rook-ceph-operator-54cf7487d4-9zhn6                  1/1     Running     0          20m
rook-ceph-osd-0-765fbb9f79-66v97                     1/1     Running     0          18m
rook-ceph-osd-1-d566bfc77-w5xpd                      1/1     Running     1          16m
rook-ceph-osd-2-76754d4875-lzqrz                     1/1     Running     0          16m
rook-ceph-osd-prepare-gpu-1-c6zgm                    0/1     Completed   0          15m
rook-ceph-osd-prepare-gpu-2-r8m9r                    0/1     Completed   0          15m
rook-ceph-osd-prepare-gpu-3-l8skf                    0/1     Completed   0          15m
rook-ceph-osd-prepare-worker01-nq4g2                 0/1     Completed   0          15m
rook-ceph-osd-prepare-worker02-d4qrw                 0/1     Completed   0          15m

# Check the Ceph cluster status
$ kubectl -n rook-ceph exec -it rook-ceph-tools-76c7d559b6-8w7bk -- sh -c 'ceph status'
  cluster:
    id:     5db57586-6d6f-4529-a956-b41242046ff2
    health: HEALTH_WARN
            clock skew detected on mon.b, mon.c
            mon c is low on available space
 
  services:
    mon: 3 daemons, quorum a,b,c (age 30m)
    mgr: a(active, since 26m)
    osd: 3 osds: 3 up (since 27m), 3 in (since 27m)
 
  data:
    pools:   1 pools, 1 pgs
    objects: 0 objects, 0 B
    usage:   3.0 GiB used, 87 GiB / 90 GiB avail
    pgs:     1 active+clean

# Look up the dashboard service
$ kubectl -n rook-ceph get service
NAME                                     TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
csi-cephfsplugin-metrics                 ClusterIP   10.106.161.37    <none>        8080/TCP,8081/TCP   3h13m
csi-rbdplugin-metrics                    ClusterIP   10.106.22.108    <none>        8080/TCP,8081/TCP   3h13m
rook-ceph-mgr                            ClusterIP   10.99.57.141     <none>        9283/TCP            3h12m
rook-ceph-mgr-dashboard                  ClusterIP   10.109.130.98    <none>        8443/TCP            3h12m
rook-ceph-mgr-dashboard-external-http    NodePort    10.98.243.88     <none>        7000:30574/TCP      9m49s
rook-ceph-mgr-dashboard-external-https   NodePort    10.96.251.99     <none>        8443:32066/TCP      5s
rook-ceph-mon-a                          ClusterIP   10.100.24.39     <none>        6789/TCP,3300/TCP   3h13m
rook-ceph-mon-b                          ClusterIP   10.107.108.211   <none>        6789/TCP,3300/TCP   3h13m
rook-ceph-mon-c                          ClusterIP   10.96.149.72     <none>        6789/TCP,3300/TCP   3h12m

# Get the dashboard password; the username is admin
$ kubectl -n rook-ceph get secret rook-ceph-dashboard-password -o jsonpath="{['data']['password']}" | base64 --decode && echo
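
To confirm that the CephFS StorageClass created in step 1 actually provisions volumes, a test PVC can be created. This is a minimal sketch that assumes the StorageClass is named rook-cephfs, the default name in csi/cephfs/storageclass.yaml (a similar example for block volumes ships as csi/rbd/storageclass.yaml):

# test-cephfs-pvc.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: cephfs-test-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-cephfs   # assumption: default name from csi/cephfs/storageclass.yaml

# Apply it and check that the PVC becomes Bound
$ kubectl apply -f test-cephfs-pvc.yaml
$ kubectl get pvc cephfs-test-pvc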


FAQ
  • The crashcollector pods fail to start with: Unable to attach or mount volumes: unmounted volumes=[rook-ceph-crash-collector-keyring], unattached volumes=[default-token-vttr8 rook-ceph-crash-collector-keyring rook-config-override rook-ceph-log rook-ceph-crash]: timed out waiting for the condition
    A: Follow these steps:
$ kubectl delete -f cluster.yaml
$ kubectl delete -f operator.yaml -f common.yaml -f crds.yaml 

# Run this on every node in the cluster
$ rm -rf /var/lib/rook /var/lib/kubelet/plugins_registry/* /var/lib/kubelet/plugins/
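# If OSDs were already created on the raw disks, also wipe those disks before
# redeploying (see the disk-wipe sketch in the prerequisites section)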

# Redeploy the cluster
$ kubectl apply -f cluster.yaml
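
While the cluster is being re-created, the operator logs can be followed to watch progress (a sketch; it assumes the default app=rook-ceph-operator label set by operator.yaml):

# Follow the Rook operator logs during the redeploy
$ kubectl -n rook-ceph logs -l app=rook-ceph-operator -f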

If that still does not work, fall back to the disk-wipe and reinstall steps mentioned at the beginning of this article.

  • ceph status reports HEALTH_WARN: clock skew detected on mon.b, mon.c
    A: Check that the clocks on the nodes hosting mon.b and mon.c are in sync with the master and that NTP is running:

# First locate the nodes where the mon.b and mon.c pods are running
$ kubectl get pod rook-ceph-mon-b-7c4d4c48c6-g9z2w -n rook-ceph -o wide
NAME                               READY   STATUS    RESTARTS   AGE   IP               NODE       NOMINATED NODE   READINESS GATES
rook-ceph-mon-b-7c4d4c48c6-g9z2w   1/1     Running   0          98m   100.103.88.144   worker02   <none>           <none>

# Check the clock offset between the worker02 node and the master
$ apt install iputils-clockdiff
$ clockdiff 10.20.17.193
.
host=10.20.17.193 rtt=750(187)ms/0ms delta=0ms/0ms Thu Apr 22 20:44:20 2021

# On every node, check whether the ntp service is running
$ systemctl status ntp                                                           
● ntp.service - Network Time Service
   Loaded: loaded (/lib/systemd/system/ntp.service; enabled; vendor preset: enabled)
   Active: active (running) since Thu 2021-04-22 20:26:46 CST; 19min ago
     Docs: man:ntpd(8)
 Main PID: 32140 (ntpd)
    Tasks: 2 (limit: 4915)
   CGroup: /system.slice/ntp.service
           └─32140 /usr/sbin/ntpd -p /var/run/ntpd.pid -g -u 111:115

Apr 22 20:37:03 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:37:05 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:37:07 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:37:07 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:39:22 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:39:43 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:39:54 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:40:14 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:40:24 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>
Apr 22 20:40:25 master ntpd[32140]: 192.168.222.1 local addr 10.20.17.193 -> <null>

# Start the ntp service
$ systemctl start ntp

# Enable the ntp service at boot
$ systemctl enable ntp 
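
If the skew persists even with NTP running, the clock can be stepped once manually before letting the ntp service take over again (a sketch; ntpd -gq sets the time in one shot and exits):

# Force a one-off time sync, then restart the ntp service
$ systemctl stop ntp
$ ntpd -gq
$ systemctl start ntp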
  • Running ceph status reports the warning: mon c is low on available space
    A: This happens because the host node where mon.c runs is low on disk space; note that it is the host's filesystem that is short on space, not the attached Ceph disks.
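A quick way to confirm is to check free space on the host path that Rook uses for mon data (a sketch; it assumes the default dataDirHostPath of /var/lib/rook from cluster.yaml):

# Run on the node hosting mon.c
$ df -h /var/lib/rook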