ceph-csi: rbd volume failed to mount

Describe the bug

rbd volume failed to mount.

Environment details

  • Image/version of Ceph CSI driver : v3.3.1
  • Helm chart version :
  • Kernel version : 4.15.0-161-generic
  • Mounter used for mounting PVC (for CephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) : krbd
  • Kubernetes cluster version : v1.20.11
  • Ceph cluster version : Octopus

Steps to reproduce

Several times we have seen an rbd volume fail to attach to the node. The plugin repeatedly reports the error “an operation with the given Volume ID already exists”. The first time rbdplugin handles NodeStageVolume for the volume, however, no error is reported, and the attachRBDImage function does not seem to be called at https://github.com/ceph/ceph-csi/blob/83df1eae53a0e0e2b3b8ff0972f32ca110baf862/internal/rbd/nodeserver.go#L408

Below is the log from a node running kernel 4.15.0-161-generic. The same issue has also been observed occasionally on a cluster running kernel 5.4.0-80-generic. The only difference is that the message “kernel 4.15.0-161-generic does not support required features” does not appear on the newer kernel.

I1010 08:14:41.840459    1658 utils.go:162] ID: 556208 GRPC call: /csi.v1.Identity/Probe
I1010 08:14:41.840597    1658 utils.go:166] ID: 556208 GRPC request: {}
I1010 08:14:41.840631    1658 utils.go:173] ID: 556208 GRPC response: {}
I1010 08:14:48.574124    1658 utils.go:162] ID: 556209 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1010 08:14:48.574183    1658 utils.go:166] ID: 556209 GRPC request: {}
I1010 08:14:48.574283    1658 utils.go:173] ID: 556209 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I1010 08:14:48.585958    1658 utils.go:162] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC call: /csi.v1.Node/NodeStageVolume
I1010 08:14:48.586289    1658 utils.go:166] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4387e5d6-808a-4ab4-8f36-4862963406c8/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["_netdev"]}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"228f86e3-0da9-4e52-986c-45f2fd7834a7","csi.storage.k8s.io/pv/name":"pvc-4387e5d6-808a-4ab4-8f36-4862963406c8","csi.storage.k8s.io/pvc/name":"runofpipeline-teste1929lm9qc-1-3116537416-pipeline-pvc","csi.storage.k8s.io/pvc/namespace":"aiflash","imageFeatures":"layering","imageName":"csi-vol-f18a2ed0-486f-11ed-a3cb-dee5209d4233","journalPool":"turing005","pool":"turing005","storage.kubernetes.io/csiProvisionerIdentity":"1664378323785-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233"}
I1010 08:14:48.587225    1658 rbd_util.go:977] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 setting disableInUseChecks: false image features: [layering] mounter: rbd
I1010 08:14:48.593626    1658 omap.go:84] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 got omap values: (pool="turing005", namespace="", name="csi.volume.f18a2ed0-486f-11ed-a3cb-dee5209d4233"): map[csi.imageid:c11489c3f0b63d csi.imagename:csi-vol-f18a2ed0-486f-11ed-a3cb-dee5209d4233 csi.volname:pvc-4387e5d6-808a-4ab4-8f36-4862963406c8 csi.volume.owner:aiflash]
E1010 08:14:48.593929    1658 util.go:233] kernel 4.15.0-161-generic does not support required features
I1010 08:15:15.528742    1658 utils.go:162] ID: 556213 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1010 08:15:15.528849    1658 utils.go:166] ID: 556213 GRPC request: {}
I1010 08:15:15.528939    1658 utils.go:173] ID: 556213 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I1010 08:15:15.529838    1658 utils.go:162] ID: 556214 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I1010 08:15:15.529896    1658 utils.go:166] ID: 556214 GRPC request: {"volume_id":"0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-a43376cc-d68a-11ec-a3cb-dee5209d4233","volume_path":"/var/lib/kubelet/pods/01d3bdf8-b504-4d84-95d6-82e8dfc74771/volumes/kubernetes.io~csi/pvc-ff77e6d2-5468-48a3-adc4-622b20af6c8b/mount"}

After that first call, every subsequent NodeStageVolume call fails with the error “an operation with the given Volume ID already exists”. We have to restart the csi-rbdplugin pod to make the volume work.

"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I1010 08:16:49.153389    1658 utils.go:162] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC call: /csi.v1.Node/NodeStageVolume
I1010 08:16:49.153524    1658 utils.go:166] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4387e5d6-808a-4ab4-8f36-4862963406c8/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["_netdev"]}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"228f86e3-0da9-4e52-986c-45f2fd7834a7","csi.storage.k8s.io/pv/name":"pvc-4387e5d6-808a-4ab4-8f36-4862963406c8","csi.storage.k8s.io/pvc/name":"runofpipeline-teste1929lm9qc-1-3116537416-pipeline-pvc","csi.storage.k8s.io/pvc/namespace":"aiflash","imageFeatures":"layering","imageName":"csi-vol-f18a2ed0-486f-11ed-a3cb-dee5209d4233","journalPool":"turing005","pool":"turing005","storage.kubernetes.io/csiProvisionerIdentity":"1664378323785-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233"}
E1010 08:16:49.153644    1658 nodeserver.go:141] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 an operation with the given Volume ID 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 already exists
E1010 08:16:49.153679    1658 utils.go:171] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 already exists
I1010 08:16:50.249749    1658 utils.go:162] ID: 556223 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1010 08:16:50.249815    1658 utils.go:166] ID: 556223 GRPC request: {}
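One possible reading of these logs is that the first NodeStageVolume call (ID 556210) never returned and is still holding a per-volume lock, so every retry is rejected until the plugin restarts. The sketch below only illustrates that pattern. The names volumeLocks, tryAcquire, release and stageVolume are hypothetical and simplified, not the actual ceph-csi implementation, and the hang itself is an assumption.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeLocks is a hypothetical, simplified per-volume operation lock.
type volumeLocks struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{inUse: make(map[string]struct{})}
}

// tryAcquire returns false if an operation on id is already in flight.
func (l *volumeLocks) tryAcquire(id string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, busy := l.inUse[id]; busy {
		return false
	}
	l.inUse[id] = struct{}{}
	return true
}

// release marks the operation on id as finished.
func (l *volumeLocks) release(id string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.inUse, id)
}

// stageVolume models a staging handler: it fails fast when the lock is held
// and releases the lock only after work returns.
func stageVolume(l *volumeLocks, id string, work func()) error {
	if !l.tryAcquire(id) {
		return fmt.Errorf("an operation with the given Volume ID %s already exists", id)
	}
	defer l.release(id)
	work()
	return nil
}

func main() {
	locks := newVolumeLocks()
	const id = "example-volume-id" // placeholder, not a real volume ID

	started := make(chan struct{})
	block := make(chan struct{}) // stays open to model a call that hangs

	// First call: acquires the lock, then blocks inside its work function.
	go stageVolume(locks, id, func() { close(started); <-block })
	<-started

	// Retries: each one is rejected because the lock was never released.
	for i := 0; i < 2; i++ {
		fmt.Println(stageVolume(locks, id, func() {}))
	}
	close(block)
}
```

If this is what is happening, restarting the csi-rbdplugin pod clears the in-memory lock, which would match the workaround described above.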

Expected behavior

The volume should be attached and mounted, or an error should be reported describing what failed.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17

Most upvoted comments

@Madhu-1, could you please reopen this issue? My colleague has some new findings and needs your help. Many thanks!