ceph-csi: rbd volume failed to mount
Describe the bug
rbd volume failed to mount.
Environment details
- Image/version of Ceph CSI driver : v3.3.1
- Helm chart version :
- Kernel version : 4.15.0-161-generic
- Mounter used for mounting PVC (for cephFS it's fuse or kernel; for rbd it's krbd or rbd-nbd) : krbd
- Kubernetes cluster version : v1.20.11
- Ceph cluster version : Octopus
Steps to reproduce
We have hit this several times: an rbd volume fails to attach to the node, and the plugin repeatedly reports the error “an operation with the given Volume ID already exists”. The first time rbdplugin handles NodeStageVolume, however, no error is reported for the request, and the attachRBDImage function does not seem to be called at https://github.com/ceph/ceph-csi/blob/83df1eae53a0e0e2b3b8ff0972f32ca110baf862/internal/rbd/nodeserver.go#L408
Below is the log from a node running kernel 4.15.0-161-generic. The same issue has also been observed occasionally on a cluster running kernel 5.4.0-80-generic. The only difference is that the log line “kernel 4.15.0-161-generic does not support required features” does not appear on the newer kernel.
I1010 08:14:41.840459 1658 utils.go:162] ID: 556208 GRPC call: /csi.v1.Identity/Probe
I1010 08:14:41.840597 1658 utils.go:166] ID: 556208 GRPC request: {}
I1010 08:14:41.840631 1658 utils.go:173] ID: 556208 GRPC response: {}
I1010 08:14:48.574124 1658 utils.go:162] ID: 556209 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1010 08:14:48.574183 1658 utils.go:166] ID: 556209 GRPC request: {}
I1010 08:14:48.574283 1658 utils.go:173] ID: 556209 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I1010 08:14:48.585958 1658 utils.go:162] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC call: /csi.v1.Node/NodeStageVolume
I1010 08:14:48.586289 1658 utils.go:166] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4387e5d6-808a-4ab4-8f36-4862963406c8/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["_netdev"]}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"228f86e3-0da9-4e52-986c-45f2fd7834a7","csi.storage.k8s.io/pv/name":"pvc-4387e5d6-808a-4ab4-8f36-4862963406c8","csi.storage.k8s.io/pvc/name":"runofpipeline-teste1929lm9qc-1-3116537416-pipeline-pvc","csi.storage.k8s.io/pvc/namespace":"aiflash","imageFeatures":"layering","imageName":"csi-vol-f18a2ed0-486f-11ed-a3cb-dee5209d4233","journalPool":"turing005","pool":"turing005","storage.kubernetes.io/csiProvisionerIdentity":"1664378323785-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233"}
I1010 08:14:48.587225 1658 rbd_util.go:977] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 setting disableInUseChecks: false image features: [layering] mounter: rbd
I1010 08:14:48.593626 1658 omap.go:84] ID: 556210 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 got omap values: (pool="turing005", namespace="", name="csi.volume.f18a2ed0-486f-11ed-a3cb-dee5209d4233"): map[csi.imageid:c11489c3f0b63d csi.imagename:csi-vol-f18a2ed0-486f-11ed-a3cb-dee5209d4233 csi.volname:pvc-4387e5d6-808a-4ab4-8f36-4862963406c8 csi.volume.owner:aiflash]
E1010 08:14:48.593929 1658 util.go:233] kernel 4.15.0-161-generic does not support required features
I1010 08:15:15.528742 1658 utils.go:162] ID: 556213 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1010 08:15:15.528849 1658 utils.go:166] ID: 556213 GRPC request: {}
I1010 08:15:15.528939 1658 utils.go:173] ID: 556213 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I1010 08:15:15.529838 1658 utils.go:162] ID: 556214 GRPC call: /csi.v1.Node/NodeGetVolumeStats
I1010 08:15:15.529896 1658 utils.go:166] ID: 556214 GRPC request: {"volume_id":"0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-a43376cc-d68a-11ec-a3cb-dee5209d4233","volume_path":"/var/lib/kubelet/pods/01d3bdf8-b504-4d84-95d6-82e8dfc74771/volumes/kubernetes.io~csi/pvc-ff77e6d2-5468-48a3-adc4-622b20af6c8b/mount"}
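The E1010 line from util.go:233 in the log above suggests that the plugin compares the running kernel release against some minimum version. The following is only a minimal sketch of such a comparison, not the ceph-csi implementation: parseKernelRelease, atLeast, and the 5.1.0 threshold are hypothetical names and values chosen for illustration.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseKernelRelease extracts up to three numeric components from a release
// string such as "4.15.0-161-generic". Hypothetical helper, not ceph-csi code.
func parseKernelRelease(release string) ([3]int, error) {
	var out [3]int
	// Drop the distribution suffix that follows the first '-'.
	base := strings.SplitN(release, "-", 2)[0]
	parts := strings.Split(base, ".")
	if len(parts) < 2 {
		return out, fmt.Errorf("unexpected kernel release %q", release)
	}
	for i := 0; i < len(parts) && i < 3; i++ {
		n, err := strconv.Atoi(parts[i])
		if err != nil {
			return out, fmt.Errorf("unexpected kernel release %q: %w", release, err)
		}
		out[i] = n
	}
	return out, nil
}

// atLeast reports whether have is greater than or equal to want,
// comparing component by component from most to least significant.
func atLeast(have, want [3]int) bool {
	for i := range have {
		if have[i] != want[i] {
			return have[i] > want[i]
		}
	}
	return true
}

func main() {
	// Assumed threshold, for illustration only.
	minimum := [3]int{5, 1, 0}
	for _, release := range []string{"4.15.0-161-generic", "5.4.0-80-generic"} {
		v, err := parseKernelRelease(release)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Printf("%s supported=%v\n", release, atLeast(v, minimum))
	}
}
```

Under this assumed threshold, the 4.15 kernel would fail the check and the 5.4 kernel would pass, which would match the message appearing only on the older kernel.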
After the first attempt, every subsequent NodeStageVolume call fails with the error “an operation with the given Volume ID already exists”. We have to restart the csi-rbdplugin pod to make the volume work.
"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I1010 08:16:49.153389 1658 utils.go:162] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC call: /csi.v1.Node/NodeStageVolume
I1010 08:16:49.153524 1658 utils.go:166] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-4387e5d6-808a-4ab4-8f36-4862963406c8/globalmount","volume_capability":{"AccessType":{"Mount":{"fs_type":"ext4","mount_flags":["_netdev"]}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"228f86e3-0da9-4e52-986c-45f2fd7834a7","csi.storage.k8s.io/pv/name":"pvc-4387e5d6-808a-4ab4-8f36-4862963406c8","csi.storage.k8s.io/pvc/name":"runofpipeline-teste1929lm9qc-1-3116537416-pipeline-pvc","csi.storage.k8s.io/pvc/namespace":"aiflash","imageFeatures":"layering","imageName":"csi-vol-f18a2ed0-486f-11ed-a3cb-dee5209d4233","journalPool":"turing005","pool":"turing005","storage.kubernetes.io/csiProvisionerIdentity":"1664378323785-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233"}
E1010 08:16:49.153644 1658 nodeserver.go:141] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 an operation with the given Volume ID 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 already exists
E1010 08:16:49.153679 1658 utils.go:171] ID: 556222 Req-ID: 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 GRPC error: rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0024-228f86e3-0da9-4e52-986c-45f2fd7834a7-000000000000025a-f18a2ed0-486f-11ed-a3cb-dee5209d4233 already exists
I1010 08:16:50.249749 1658 utils.go:162] ID: 556223 GRPC call: /csi.v1.Node/NodeGetCapabilities
I1010 08:16:50.249815 1658 utils.go:166] ID: 556223 GRPC request: {}
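The repeated Aborted error in the log above is consistent with a per-volume lock that the first NodeStageVolume call took and never released, for example because that call is still blocked somewhere. The following is a minimal sketch of that pattern, assuming a simple in-memory lock set. The volumeLocks type and the stageVolume function are hypothetical and are not the ceph-csi code.

```go
package main

import (
	"fmt"
	"sync"
)

// volumeLocks tracks volume IDs that have an operation in flight.
// Hypothetical stand-in for a per-volume lock set.
type volumeLocks struct {
	mu    sync.Mutex
	inUse map[string]struct{}
}

func newVolumeLocks() *volumeLocks {
	return &volumeLocks{inUse: make(map[string]struct{})}
}

// tryAcquire returns false if an operation for volumeID is already in flight.
func (l *volumeLocks) tryAcquire(volumeID string) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if _, busy := l.inUse[volumeID]; busy {
		return false
	}
	l.inUse[volumeID] = struct{}{}
	return true
}

// release marks the operation for volumeID as finished.
func (l *volumeLocks) release(volumeID string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	delete(l.inUse, volumeID)
}

// stageVolume fails fast while the lock is held; otherwise it runs work
// and releases the lock only after work returns.
func stageVolume(l *volumeLocks, volumeID string, work func()) error {
	if !l.tryAcquire(volumeID) {
		return fmt.Errorf("an operation with the given Volume ID %s already exists", volumeID)
	}
	defer l.release(volumeID)
	work()
	return nil
}

func main() {
	locks := newVolumeLocks()
	started := make(chan struct{})
	unblock := make(chan struct{})
	done := make(chan error, 1)

	// First call: blocks inside work, so the lock stays held.
	go func() {
		done <- stageVolume(locks, "vol-1", func() {
			close(started)
			<-unblock
		})
	}()
	<-started

	// Retry while the first call is stuck: rejected immediately.
	fmt.Println(stageVolume(locks, "vol-1", func() {}))

	// Once the first call finishes, a retry succeeds.
	close(unblock)
	fmt.Println(<-done)
	fmt.Println(stageVolume(locks, "vol-1", func() {}))
}
```

In this sketch, retries keep failing for as long as the first call is blocked, and restarting the process clears the in-memory lock set. That would match the observation that restarting the csi-rbdplugin pod makes the volume work again.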
Expected behavior
The volume should be attached and mounted, or an error should be reported stating what failed.
About this issue
- State: closed
- Created 2 years ago
- Comments: 17
@Madhu-1, could you please reopen this issue? My colleague has some new findings and needs your help. Many thanks!