ceph-csi: Failed to mount RBD PVC with error "Internal desc = exit status 2"
Describe the bug
I deployed a Ceph cluster with cephadm successfully. I can create a StorageClass and PVC with CephFS, but the same fails with RBD.
Environment details
- Image/version of Ceph CSI driver : v1.3.0-0
- Helm chart version : NO
- Kernel version : 4.18.0-147.8.1.el8_1.x86_64
- Mounter used for mounting PVC (for cephfs it's `fuse` or `kernel`; for rbd it's `krbd` or `rbd-nbd`) : krbd (not sure)
- Kubernetes cluster version : 1.18.4
- Ceph cluster version : 15.2.4 octopus (stable)
Steps to reproduce
Steps to reproduce the behavior:
- Setup details: deploy a Kubernetes cluster, install cephadm and deploy a 3-node Ceph cluster, then try to use RBD following the guide.
- Deployment to trigger the issue: a PVC on the RBD StorageClass and a pod consuming it as a raw block volume (a sketch is shown after this list); the pod fails to start with:
  Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition
- See error
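A minimal sketch of what that deployment could look like (only the pod name `pod-with-raw-block-volume` and the volume name `data` come from the logs below; the PVC name, StorageClass name `csi-rbd-sc`, size, and container image are assumptions):

```yaml
# Sketch only: a raw-block PVC on the RBD StorageClass plus a pod that attaches
# it as a block device; names marked below are assumptions, not from this report.
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: raw-block-pvc            # assumed name
spec:
  accessModes:
    - ReadWriteOnce
  volumeMode: Block              # raw block volume, matching AccessType Block in the CSI log
  resources:
    requests:
      storage: 2Gi               # dmesg below reports a 2 GiB image capacity
  storageClassName: csi-rbd-sc   # assumed StorageClass name
---
apiVersion: v1
kind: Pod
metadata:
  name: pod-with-raw-block-volume
spec:
  containers:
    - name: app                  # assumed container name and image
      image: busybox
      command: ["sleep", "3600"]
      volumeDevices:
        - name: data
          devicePath: /dev/xvda  # illustrative device path inside the container
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: raw-block-pvc
```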
Actual results
The pod stays in ContainerCreating.
Expected behavior
Pod can mount the PVC.
Logs
Pod event:
Normal Scheduled 18m default-scheduler Successfully assigned mars/pod-with-raw-block-volume to 10.0.0.202
Normal SuccessfulAttachVolume 18m attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4"
Warning FailedMount 9m30s (x2 over 11m) kubelet, 10.0.0.202 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data default-token-jn7tw]: timed out waiting for the condition
Warning FailedMapVolume 100s (x16 over 18m) kubelet, 10.0.0.202 MapVolume.SetUpDevice failed for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4" : rpc error: code = Internal desc = exit status 2
Warning FailedMount 26s (x6 over 16m) kubelet, 10.0.0.202 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition
Kubelet log:
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733346 3119 clientconn.go:106] parsed scheme: ""
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733358 3119 clientconn.go:106] scheme "" not registered, fallback to default scheme
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733374 3119 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock <nil> 0 <nil>}] <nil> <nil>}
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733382 3119 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733423 3119 clientconn.go:882] blockingPicker: the picked transport is not ready, loop back to repick
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736950 3119 clientconn.go:106] parsed scheme: ""
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736960 3119 clientconn.go:106] scheme "" not registered, fallback to default scheme
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736973 3119 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock <nil> 0 <nil>}] <nil> <nil>}
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736980 3119 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.737004 3119 clientconn.go:882] blockingPicker: the picked transport is not ready, loop back to repick
Jul 09 10:33:10 imagego-test-0002 kubelet[3119]: E0709 10:33:10.086394 3119 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d podName: nodeName:}" failed. No retries permitted until 2020-07-09 10:35:12.086366695 +0800 CST m=+167228.245629512 (durationBeforeRetry 2m2s). Error: "MapVolume.SetUpDevice failed for volume \"pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d\") pod \"pod-with-raw-block-volume\" (UID: \"2b7a5741-2fee-4bc9-a440-c218112b686a\") : rpc error: code = Internal desc = exit status 2"
Jul 09 10:33:49 imagego-test-0002 kubelet[3119]: E0709 10:33:49.563137 3119 kubelet.go:1681] Unable to attach or mount volumes for pod "pod-with-raw-block-volume_mars(2b7a5741-2fee-4bc9-a440-c218112b686a)": unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition; skipping pod
Jul 09 10:33:49 imagego-test-0002 kubelet[3119]: E0709 10:33:49.563166 3119 pod_workers.go:191] Error syncing pod 2b7a5741-2fee-4bc9-a440-c218112b686a ("pod-with-raw-block-volume_mars(2b7a5741-2fee-4bc9-a440-c218112b686a)"), skipping: unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition
Jul 09 10:35:12 imagego-test-0002 kubelet[3119]: I0709 10:35:12.108777 3119 operation_generator.go:950] MapVolume.WaitForAttach entering for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d") pod "pod-with-raw-block-volume" (UID: "2b7a5741-2fee-4bc9-a440-c218112b686a") DevicePath ""
Jul 09 10:35:12 imagego-test-0002 kubelet[3119]: I0709 10:35:12.110941 3119 operation_generator.go:959] MapVolume.WaitForAttach succeeded for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d") pod "pod-with-raw-block-volume" (UID: "2b7a5741-2fee-4bc9-a440-c218112b686a") DevicePath "csi-fa9d3f5524a7cf5f62e8f3df5a8710426944870b866c57fd437624f1f2480c89"
csi-rbdplugin log:
I0709 02:06:23.305910 21535 utils.go:159] ID: 25 GRPC call: /csi.v1.Identity/Probe
I0709 02:06:23.305922 21535 utils.go:160] ID: 25 GRPC request: {}
I0709 02:06:23.306322 21535 utils.go:165] ID: 25 GRPC response: {}
I0709 02:06:38.310790 21535 utils.go:159] ID: 26 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0709 02:06:38.310803 21535 utils.go:160] ID: 26 GRPC request: {}
I0709 02:06:38.311247 21535 utils.go:165] ID: 26 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I0709 02:06:38.314646 21535 utils.go:159] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d GRPC call: /csi.v1.Node/NodeStageVolume
I0709 02:06:38.314655 21535 utils.go:160] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4","volume_capability":{"AccessType":{"Block":{}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"1e883026-c03d-11ea-9dd5-00163e101e9f","imageName":"csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d","journalPool":"rbd","pool":"rbd","storage.kubernetes.io/csiProvisionerIdentity":"1594195443622-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d"}
I0709 02:06:38.315790 21535 rbd_util.go:806] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d setting disableInUseChecks on rbd volume to: false
I0709 02:06:38.316244 21535 omap.go:74] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d got omap values: (pool="rbd", namespace="", name="csi.volume.360233ac-c184-11ea-aa46-d6eec0877e2d"): map[csi.imageid:5ee19ac08555 csi.imagename:csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d csi.volname:pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4]
E0709 02:06:38.316348 21535 util.go:250] kernel 4.18.0-147.8.1.el8_1.x86_64 does not support required features
I0709 02:06:38.358623 21535 rbd_util.go:322] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d executing [rbd task add flatten rbd/csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d --id kubernetes --keyfile=/tmp/csi/keys/keyfile-260375200 -m 10.0.0.201:6789,10.0.0.202:6789,10.0.0.203:6789] for image (csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d) using mon 10.0.0.201:6789,10.0.0.202:6789,10.0.0.203:6789, pool rbd
W0709 02:06:38.668018 21535 rbd_util.go:336] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d uncaught error while scheduling a task: exit status 2
E0709 02:06:38.668037 21535 rbd_util.go:464] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d failed to add task flatten for rbd/csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d : exit status 2
E0709 02:06:38.668142 21535 utils.go:163] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d GRPC error: rpc error: code = Internal desc = exit status 2
dmesg logs:
[26687.317120] rbd: loaded (major 251)
[76054.217590] IPv6: ADDRCONF(NETDEV_UP): cali13b71e8c17d: link is not ready
[76054.218040] IPv6: ADDRCONF(NETDEV_CHANGE): cali13b71e8c17d: link becomes ready
[85851.683017] libceph: mon2 10.0.0.203:6789 session established
[85851.683289] libceph: client34217 fsid 1e883026-c03d-11ea-9dd5-00163e101e9f
[85851.688494] rbd: rbd0: capacity 2147483648 features 0x1
[85852.512339] EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
[85852.797739] IPv6: ADDRCONF(NETDEV_UP): cali4f1f3127bcd: link is not ready
[85852.798450] IPv6: ADDRCONF(NETDEV_CHANGE): cali4f1f3127bcd: link becomes ready
[91817.223254] libceph: mon1 10.0.0.202:6789 session established
[91817.223540] libceph: client24244 fsid 1e883026-c03d-11ea-9dd5-00163e101e9f
Additional context
There was another error yesterday: `driver name rbd.csi.ceph.com not found in the list of registered CSI drivers`. After reading issue #1105 I recreated the rbd-plugin pod and that error went away.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Comments: 42
This issue can be resolved by adding `imageFeatures: layering` to the StorageClass.
The `driver name rbd.csi.ceph.com not found` issue only occurs on some of the nodes; only one node can run the pod successfully. The driver-registrar logs look normal, but `driver name rbd.csi.ceph.com not found in the list of registered CSI drivers` still occurs.
Found it! Plugins not being registered properly with kubelet was my assumption as well, but I didn't know what to look for. It turns out we had two inconsistent deployments between our clusters, so both /var/lib/kubelet and another persistent location were being deployed as the kubelet root dir, but only one was actually used and I checked the wrong one, so everything looked fine when in fact it wasn't. Thanks for the hint, now it's working!
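For context, an abbreviated sketch of the csi-rbdplugin node DaemonSet fields that have to track the kubelet root dir (paths shown for the default /var/lib/kubelet; the rbdplugin container itself is omitted and the registrar image tag is illustrative):

```yaml
# Abbreviated sketch, not the full ceph-csi node DaemonSet. If kubelet runs with
# a non-default --root-dir, every /var/lib/kubelet path below (and the
# --kubelet-registration-path argument) must be changed together.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: csi-rbdplugin
spec:
  selector:
    matchLabels:
      app: csi-rbdplugin
  template:
    metadata:
      labels:
        app: csi-rbdplugin
    spec:
      containers:
        - name: driver-registrar
          image: quay.io/k8scsi/csi-node-driver-registrar:v1.3.0  # illustrative tag
          args:
            - "--csi-address=/csi/csi.sock"
            - "--kubelet-registration-path=/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock"
          volumeMounts:
            - name: plugin-dir
              mountPath: /csi
            - name: registration-dir
              mountPath: /registration
        # ... the csi-rbdplugin container is omitted here for brevity ...
      volumes:
        - name: plugin-dir
          hostPath:
            path: /var/lib/kubelet/plugins/rbd.csi.ceph.com
            type: DirectoryOrCreate
        - name: registration-dir
          hostPath:
            path: /var/lib/kubelet/plugins_registry/
            type: Directory
```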
Yes, it really looks like what you said: the PVC has been bound to a PV successfully. We should update the documentation about this to avoid confusing other users: https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/
Thank you, much appreciated. BTW, Merry Christmas!
Here is the example https://github.com/ceph/ceph-csi/blob/4e82384b43478b4f034b41e90760ed1a9d962166/examples/rbd/storageclass.yaml#L30
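For reference, a minimal sketch of such a StorageClass with `imageFeatures: layering` set (the clusterID and pool come from the logs above; the StorageClass name and secret names are assumptions, and the linked file is the authoritative example):

```yaml
# Sketch only: the line relevant to this issue is imageFeatures: layering, which
# keeps provisioned images mappable by krbd on older kernels.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc               # assumed name
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: 1e883026-c03d-11ea-9dd5-00163e101e9f   # from volume_context in the CSI log
  pool: rbd
  imageFeatures: layering                           # the fix discussed in this issue
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret        # assumed secret names
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Delete
```

Without imageFeatures set, the image can be created with features the node kernel cannot map, which matches the `kernel 4.18.0-147.8.1.el8_1.x86_64 does not support required features` line in the csi-rbdplugin log.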