While dynamically provisioning PVs on Ceph RBD storage, I have twice run into pods that would not delete (I still haven't figured out the exact cause); all the pods were stuck in the Terminating state. They can be force-deleted with:

kubectl delete pods <pod> --grace-period=0 --force

After the pods were gone, however, the dynamically provisioned PVCs they had mounted also failed to delete cleanly, so I forcibly deleted the PVCs and PVs by hand as well.
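For reference, a stuck PVC or PV can usually be pushed through by clearing its finalizers before deleting it. This is a sketch of the approach; the resource names are placeholders for whatever is stuck in your cluster:

# strip the protection finalizer so the stuck PVC can actually be removed
kubectl patch pvc <pvc-name> -n <namespace> -p '{"metadata":{"finalizers":null}}'
kubectl delete pvc <pvc-name> -n <namespace>

# same idea for the PV it was bound to
kubectl patch pv <pv-name> -p '{"metadata":{"finalizers":null}}'
kubectl delete pv <pv-name>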

After all that, I noticed the Ceph RBD space was never released: ceph df showed the used capacity unchanged.
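If you want to see which images are actually holding the space, rbd du can report per-image usage (it may be slow on a large pool). The pool name here is just a placeholder:

# provisioned vs. actually used space for every image in the pool
rbd du -p <poolname>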

rbd ls <poolname> showed that the dynamically created images were still sitting in the pool, so the next step was yet another forced deletion. Having brute-forced things this far, one more step hardly matters:

rbd rm <poolname>/<imagename>

Deleting the image by hand failed, though, with the error check_image_watchers: image has watchers - not removing. According to the documentation I dug up, this means the volume has not been released yet. The fix, once again, was brute force.

1. Check which client is watching the image
rbd status ceph-block/csi-vol-0068f225-14f7-11eb-ac08-2a0aff2a8247
Watchers:
        watcher=10.244.2.0:0/1036319188 client.974190 cookie=18446462598732840961


2. Add the watcher IP to the blacklist
ceph osd blacklist add 10.244.2.0:0/1036319188
blacklisting 10.244.2.0:0/1036319188 until 2020-10-31T08:47:43.513987+0000 (3600 sec)

3. Delete the image again
rbd rm ceph-block/csi-vol-0068f225-14f7-11eb-ac08-2a0aff2a8247
Removing image: 100% complete...done.

4. Remove the IP just added from the blacklist
ceph osd blacklist rm 10.244.2.0:0/1036319188
un-blacklisting 10.244.2.0:0/1036319188
# list the current blacklist entries
ceph osd blacklist ls

At this point ceph df shows the used space back to normal. But if you now deploy a pod that uses a dynamically provisioned PVC, the volume cannot be used and the pod will not run. kubectl describe shows the volume's filesystem reported as read-only and unusable. Adding -o wide shows which node the pod was scheduled on; after logging in to that node, almost every disk-related command fails with an Input/output error.
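For reference, these are the kinds of commands I mean here; the pod and namespace names are placeholders:

kubectl get pod <pod> -n <namespace> -o wide    # shows which node the pod was scheduled on
kubectl describe pod <pod> -n <namespace>       # the events show the read-only filesystem / mount errors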

Check with lsblk:
NAME   MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0    7:0    0 55.5M  1 loop /snap/core18/2074
loop1    7:1    0 55.4M  1 loop /snap/core18/2128
loop2    7:2    0 67.6M  1 loop /snap/lxd/20326
loop3    7:3    0 70.3M  1 loop /snap/lxd/21029
loop5    7:5    0 32.3M  1 loop /snap/snapd/12704
loop6    7:6    0 32.3M  1 loop /snap/snapd/12883
sda      8:0    0  1.1T  0 disk
├─sda1   8:1    0    1M  0 part
├─sda2   8:2    0    1G  0 part /boot
└─sda3   8:3    0  1.1T  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0  200G  0 lvm  /
sr0     11:0    1 1024M  0 rom
rbd0   252:0    0   10G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd1   252:16   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd2   252:32   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd3   252:48   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd4   252:64   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd5   252:80   0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd6   252:96   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd7   252:112  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd8   252:128  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd9   252:144  0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount
rbd10  252:160  0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mount

The directories where the old RBD block devices were mounted are still there; they were never unmounted. That is most likely what caused the errors above.

Unmounting those directories should fix it. One caveat: some of the mounted directories are valid and still in use, and those must not be unmounted. You can tell them apart as follows.

kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                     STORAGECLASS      REASON   AGE
pvc-6fc729c1-6f50-4afd-ab8e-b5acddbf64fc   8Gi        RWO            Delete           Bound    gitlab/gitlab-prometheus-server           ceph-gitlab-rbd            10d
pvc-74015690-054d-48be-a8c0-af8a895750e7   10Gi       RWO            Delete           Bound    gitlab/gitlab-minio                       ceph-gitlab-rbd            10d
pvc-7545aadc-829d-4a5f-ab81-736c6fc9ac7b   8Gi        RWO            Delete           Bound    gitlab/data-gitlab-postgresql-0           ceph-gitlab-rbd            10d
pvc-c9fd941e-d99d-437b-963e-6e7a1cb20050   8Gi        RWO            Delete           Bound    gitlab/redis-data-gitlab-redis-master-0   ceph-gitlab-rbd            10d
pvc-f288093c-d60a-4f49-8d10-c112b927dcf4   8Gi        RWO            Delete           Bound    jenkins/jenkins                           ceph-rbd                   10d

Anything not in this list should be safe to unmount.
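As an extra sanity check, you can compare the images mapped on the node against the images that live PVs still reference. The jsonpath below assumes the in-tree RBD provisioner, where the image name is stored in spec.rbd.image; with ceph-csi the name lives under spec.csi.volumeAttributes instead, so treat this as a sketch:

# on the node: which RBD images are currently mapped, and to which /dev/rbdX
rbd showmapped

# on a machine with kubectl: the RBD images that existing PVs still point to
kubectl get pv -o jsonpath='{range .items[*]}{.spec.rbd.image}{"\n"}{end}'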

1. Find the exact mount directory as follows
mount | grep rbd9    # the number is the block-device index from the lsblk output above
/dev/rbd9 on /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7abd663f-06da-11ec-bfb1-da58ba56442c type ext4 (rw,relatime,stripe=16)

2. Unmount it

sudo umount /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7abd663f-06da-11ec-bfb1-da58ba56442c


3. Check with lsblk again
NAME                      MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
loop0                       7:0    0 55.5M  1 loop /snap/core18/2074
loop1                       7:1    0 55.4M  1 loop /snap/core18/2128
loop2                       7:2    0 67.6M  1 loop /snap/lxd/20326
loop3                       7:3    0 70.3M  1 loop /snap/lxd/21029
loop5                       7:5    0 32.3M  1 loop /snap/snapd/12704
loop6                       7:6    0 32.3M  1 loop /snap/snapd/12883
sda                         8:0    0  1.1T  0 disk
├─sda1                      8:1    0    1M  0 part
├─sda2                      8:2    0    1G  0 part /boot
└─sda3                      8:3    0  1.1T  0 part
  └─ubuntu--vg-ubuntu--lv 253:0    0  200G  0 lvm  /
sr0                        11:0    1 1024M  0 rom
rbd0                      252:0    0   10G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-294bd568-00d2-11ec-8d41-0e03797b96fa
rbd1                      252:16   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2a5ba93a-00d2-11ec-8d41-0e03797b96fa
rbd2                      252:32   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2942b322-00d2-11ec-8d41-0e03797b96fa
rbd3                      252:48   0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/gitlab.rbd-image-kubernetes-dynamic-pvc-2a6b2a58-00d2-11ec-8d41-0e03797b96fa
rbd4                      252:64   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79993372-06da-11ec-bfb1-da58ba56442c
rbd5                      252:80   0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79a09b13-06da-11ec-bfb1-da58ba56442c
rbd6                      252:96   0    1G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79dbb137-06da-11ec-bfb1-da58ba56442c
rbd7                      252:112  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-7afaaf63-06da-11ec-bfb1-da58ba56442c
rbd8                      252:128  0    5G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/harbor.rbd-image-kubernetes-dynamic-pvc-79a017aa-06da-11ec-bfb1-da58ba56442c
rbd9                      252:144  0    1G  0 disk
rbd10                     252:160  0    8G  0 disk /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/kube-image-kubernetes-dynamic-pvc-63c3c453-00fc-11ec-8d41-0e03797b96fa

4. Remove the device mapping
sudo rbd unmap /dev/rbd9

Run lsblk once more and the corresponding rbd block device has now been released properly.
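If the unmap itself complains that the device is still busy, krbd also has a force unmap option; in keeping with the brute-force theme, I would only reach for it as a last resort:

sudo rbd unmap -o force /dev/rbd9    # force the unmap; any process still using the device will get I/O errors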

You may still need to reboot the server at this point; brute force clearly comes at a heavy price.

After that, deploying pods works normally again.

Thinking it over, this whole mess was probably caused by doing things in the wrong order. Reversing the sequence should avoid it: after force-deleting the pod, don't rush to delete the PVC and PV; first go to the node and unmount the corresponding block device, and only then delete the image in Ceph. A sketch of that order follows.
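Assuming that diagnosis is right, the safer sequence would look roughly like this; pool, image, and mount names are placeholders to adapt to your own environment:

# 1. force-delete the stuck pod
kubectl delete pods <pod> --grace-period=0 --force

# 2. on the node: unmount and unmap the stale RBD device first
sudo umount /var/lib/kubelet/plugins/kubernetes.io/rbd/mounts/<image-mount-dir>
sudo rbd unmap /dev/rbdX

# 3. only then clean up the Kubernetes objects and the Ceph image
kubectl delete pvc <pvc-name> -n <namespace>
kubectl delete pv <pv-name>
rbd rm <poolname>/<imagename>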
