ceph-csi: [RBD] parallel PVC creation on a newly created block pool will hang
Describe the bug
The following librbd bug causes parallel PVC creation requests on a newly created block pool to hang:
Concurrent rbd_pool_init() or rbd_create() operations on an unvalidated
(uninitialized) pool trigger a lockup in ValidatePoolRequest state
machine caused by blocking selfmanaged_snap_{create,remove}() calls.
Ceph issue tracker: https://tracker.ceph.com/issues/52537
Ceph Pacific backport PR with the fix: https://github.com/ceph/ceph/pull/43113
Environment details
- Image/version of Ceph CSI driver: v3.4.0
Steps to reproduce
- Create a new RBD block pool (with no images) and a StorageClass that points the CSI provisioner at it.
- Create multiple PVCs in parallel.
- The creation requests stay in Pending state indefinitely.
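The parallel-creation step can be scripted roughly as follows. This is a minimal sketch, not taken from the issue: the StorageClass name, the parallel-pvc-N claim names, the 1Gi size, the ReadWriteOnce access mode and the use of the current namespace are all hypothetical and should be adjusted to the actual setup.

```shell
#!/usr/bin/env bash
# Sketch: submit several PVCs at once against a StorageClass backed by a
# freshly created, never-initialized RBD pool.
# All names and sizes below are hypothetical placeholders.
set -euo pipefail

# Print a PersistentVolumeClaim manifest.
# $1 = PVC name, $2 = StorageClass name, $3 = requested size
pvc_manifest() {
  cat <<EOF
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: $1
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: $2
  resources:
    requests:
      storage: $3
EOF
}

# Submit COUNT claims concurrently: each "kubectl apply" runs in the background,
# so the provisioner receives the CreateVolume requests at about the same time.
create_pvcs_in_parallel() {
  local sc="$1" count="$2"
  local i
  for i in $(seq 1 "$count"); do
    pvc_manifest "parallel-pvc-$i" "$sc" 1Gi | kubectl apply -f - &
  done
  # Waits only for the apply calls; provisioning itself happens asynchronously.
  wait
}
```

Usage, with a hypothetical StorageClass name: create_pvcs_in_parallel rbd-new-pool 10, then watch the claims with kubectl get pvc.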
Actual results
- The creation requests stay in Pending state indefinitely.
Expected behavior
- The creation requests should succeed.
Updated Workaround (does not leave stale omap entries, thanks @Madhu-1)
- Run the following command directly on the cluster or from the CSI pods:
  rbd pool init <pool_name>
- Restart the CSI rbdplugin provisioner pod.
- The PVCs will then go to Bound state without leaving any stale resources.
After the above steps, parallel PVC creation requests should work fine.
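On a Rook-managed cluster, the updated workaround might look like the sketch below. It assumes a Rook deployment with the toolbox running as deploy/rook-ceph-tools in the rook-ceph namespace and provisioner pods labelled app=csi-rbdplugin-provisioner; these names are assumptions, not part of the original report, so override them for other deployments. Only rbd pool init comes from the issue itself.

```shell
#!/usr/bin/env bash
# Sketch of the updated workaround. The namespace, toolbox and label defaults
# below are ASSUMED Rook conventions; override them via environment variables.
set -euo pipefail

NAMESPACE="${NAMESPACE:-rook-ceph}"
TOOLBOX="${TOOLBOX:-deploy/rook-ceph-tools}"
PROVISIONER_LABEL="${PROVISIONER_LABEL:-app=csi-rbdplugin-provisioner}"

# Step 1: initialize the pool once, from a single process, so that no
# concurrent rbd_pool_init()/rbd_create() calls hit an unvalidated pool.
init_pool() {
  local pool="$1"
  kubectl -n "$NAMESPACE" exec "$TOOLBOX" -- rbd pool init "$pool"
}

# Step 2: delete the provisioner pods; their Deployment recreates them and the
# stuck PVC requests are picked up again by the new provisioner.
restart_provisioner() {
  kubectl -n "$NAMESPACE" delete pod -l "$PROVISIONER_LABEL"
}

# Run both steps in order for the given pool name.
workaround() {
  init_pool "$1"
  restart_provisioner
}
```

Running the pool initialization before the restart matters: restarting first would let the recreated provisioner retry the parallel requests against a pool that is still uninitialized.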
Workaround (not recommended; leaves stale omap entries)
- Delete the ongoing PVC creation requests.
- Restart the CSI rbdplugin provisioner pod.
- Then either:
  - issue a single PVC create request, which will succeed, or
  - run the following on the Ceph cluster:
    rbd pool init <pool_name>
After the above steps, parallel PVC creation requests should work fine.
About this issue
- State: closed
- Created 3 years ago
- Comments: 22 (12 by maintainers)
Commits related to this issue
- Bump rook-ceph 1.7.1 -> 1.8.9 Seems like we're hitting parallel PVC creation bug in kubevirt/kubevirt: https://github.com/kubevirt/kubevirt/pull/7783 https://github.com/ceph/ceph-csi/issues/2521 Rep... — committed to akalenyu/kubevirtci by akalenyu 2 years ago
- Bump rook-ceph 1.7.1 -> 1.8.9 (#803) Seems like we're hitting parallel PVC creation bug in kubevirt/kubevirt: https://github.com/kubevirt/kubevirt/pull/7783 https://github.com/ceph/ceph-csi/issues/25... — committed to kubevirt/kubevirtci by akalenyu 2 years ago
Thanks for notifying on this one