ceph-csi: Fail to mount RBD PVC with error Internal desc = exit status 2

Describe the bug

I deployed a Ceph cluster with cephadm successfully. I can create a StorageClass and PVC with CephFS, but the same steps fail with RBD.

Environment details

  • Image/version of Ceph CSI driver : v1.3.0-0
  • Helm chart version : N/A (not deployed via Helm)
  • Kernel version : 4.18.0-147.8.1.el8_1.x86_64
  • Mounter used for mounting PVC (for cephfs its fuse or kernel. for rbd its krbd or rbd-nbd) : krbd (not certain)
  • Kubernetes cluster version : 1.18.4
  • Ceph cluster version : 15.2.4 octopus (stable)

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: deploy a Kubernetes cluster, install cephadm and deploy the Ceph cluster (3 nodes), then try to use RBD following the guide
  2. Deployment to trigger the issue: the pod reports Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition
  3. See error

Actual results

The pod stays in ContainerCreating.

Expected behavior

Pod can mount the PVC.

Logs

Pod event:

  Normal   Scheduled               18m                  default-scheduler        Successfully assigned mars/pod-with-raw-block-volume to 10.0.0.202
  Normal   SuccessfulAttachVolume  18m                  attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4"
  Warning  FailedMount             9m30s (x2 over 11m)  kubelet, 10.0.0.202      Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data default-token-jn7tw]: timed out waiting for the condition
  Warning  FailedMapVolume         100s (x16 over 18m)  kubelet, 10.0.0.202      MapVolume.SetUpDevice failed for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4" : rpc error: code = Internal desc = exit status 2
  Warning  FailedMount             26s (x6 over 16m)    kubelet, 10.0.0.202      Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition

Kubelet log:

Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733346    3119 clientconn.go:106] parsed scheme: ""
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733358    3119 clientconn.go:106] scheme "" not registered, fallback to default scheme
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733374    3119 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock  <nil> 0 <nil>}] <nil> <nil>}
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733382    3119 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.733423    3119 clientconn.go:882] blockingPicker: the picked transport is not ready, loop back to repick
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736950    3119 clientconn.go:106] parsed scheme: ""
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736960    3119 clientconn.go:106] scheme "" not registered, fallback to default scheme
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736973    3119 passthrough.go:48] ccResolverWrapper: sending update to cc: {[{/var/lib/kubelet/plugins/rbd.csi.ceph.com/csi.sock  <nil> 0 <nil>}] <nil> <nil>}
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.736980    3119 clientconn.go:933] ClientConn switching balancer to "pick_first"
Jul 09 10:33:09 imagego-test-0002 kubelet[3119]: I0709 10:33:09.737004    3119 clientconn.go:882] blockingPicker: the picked transport is not ready, loop back to repick
Jul 09 10:33:10 imagego-test-0002 kubelet[3119]: E0709 10:33:10.086394    3119 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d podName: nodeName:}" failed. No retries permitted until 2020-07-09 10:35:12.086366695 +0800 CST m=+167228.245629512 (durationBeforeRetry 2m2s). Error: "MapVolume.SetUpDevice failed for volume \"pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d\") pod \"pod-with-raw-block-volume\" (UID: \"2b7a5741-2fee-4bc9-a440-c218112b686a\") : rpc error: code = Internal desc = exit status 2"
Jul 09 10:33:49 imagego-test-0002 kubelet[3119]: E0709 10:33:49.563137    3119 kubelet.go:1681] Unable to attach or mount volumes for pod "pod-with-raw-block-volume_mars(2b7a5741-2fee-4bc9-a440-c218112b686a)": unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition; skipping pod
Jul 09 10:33:49 imagego-test-0002 kubelet[3119]: E0709 10:33:49.563166    3119 pod_workers.go:191] Error syncing pod 2b7a5741-2fee-4bc9-a440-c218112b686a ("pod-with-raw-block-volume_mars(2b7a5741-2fee-4bc9-a440-c218112b686a)"), skipping: unmounted volumes=[data], unattached volumes=[default-token-jn7tw data]: timed out waiting for the condition
Jul 09 10:35:12 imagego-test-0002 kubelet[3119]: I0709 10:35:12.108777    3119 operation_generator.go:950] MapVolume.WaitForAttach entering for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d") pod "pod-with-raw-block-volume" (UID: "2b7a5741-2fee-4bc9-a440-c218112b686a") DevicePath ""
Jul 09 10:35:12 imagego-test-0002 kubelet[3119]: I0709 10:35:12.110941    3119 operation_generator.go:959] MapVolume.WaitForAttach succeeded for volume "pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d") pod "pod-with-raw-block-volume" (UID: "2b7a5741-2fee-4bc9-a440-c218112b686a") DevicePath "csi-fa9d3f5524a7cf5f62e8f3df5a8710426944870b866c57fd437624f1f2480c89"

csi-rbdplugin log:

I0709 02:06:23.305910   21535 utils.go:159] ID: 25 GRPC call: /csi.v1.Identity/Probe
I0709 02:06:23.305922   21535 utils.go:160] ID: 25 GRPC request: {}
I0709 02:06:23.306322   21535 utils.go:165] ID: 25 GRPC response: {}
I0709 02:06:38.310790   21535 utils.go:159] ID: 26 GRPC call: /csi.v1.Node/NodeGetCapabilities
I0709 02:06:38.310803   21535 utils.go:160] ID: 26 GRPC request: {}
I0709 02:06:38.311247   21535 utils.go:165] ID: 26 GRPC response: {"capabilities":[{"Type":{"Rpc":{"type":1}}},{"Type":{"Rpc":{"type":2}}},{"Type":{"Rpc":{"type":3}}}]}
I0709 02:06:38.314646   21535 utils.go:159] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d GRPC call: /csi.v1.Node/NodeStageVolume
I0709 02:06:38.314655   21535 utils.go:160] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d GRPC request: {"secrets":"***stripped***","staging_target_path":"/var/lib/kubelet/plugins/kubernetes.io/csi/volumeDevices/staging/pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4","volume_capability":{"AccessType":{"Block":{}},"access_mode":{"mode":1}},"volume_context":{"clusterID":"1e883026-c03d-11ea-9dd5-00163e101e9f","imageName":"csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d","journalPool":"rbd","pool":"rbd","storage.kubernetes.io/csiProvisionerIdentity":"1594195443622-8081-rbd.csi.ceph.com"},"volume_id":"0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d"}
I0709 02:06:38.315790   21535 rbd_util.go:806] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d setting disableInUseChecks on rbd volume to: false
I0709 02:06:38.316244   21535 omap.go:74] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d got omap values: (pool="rbd", namespace="", name="csi.volume.360233ac-c184-11ea-aa46-d6eec0877e2d"): map[csi.imageid:5ee19ac08555 csi.imagename:csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d csi.volname:pvc-72e30530-4472-451d-b7bb-cdf2d1cd9ac4]
E0709 02:06:38.316348   21535 util.go:250] kernel 4.18.0-147.8.1.el8_1.x86_64 does not support required features
I0709 02:06:38.358623   21535 rbd_util.go:322] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d executing [rbd task add flatten rbd/csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d --id kubernetes --keyfile=/tmp/csi/keys/keyfile-260375200 -m 10.0.0.201:6789,10.0.0.202:6789,10.0.0.203:6789] for image (csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d) using mon 10.0.0.201:6789,10.0.0.202:6789,10.0.0.203:6789, pool rbd
W0709 02:06:38.668018   21535 rbd_util.go:336] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d uncaught error while scheduling a task: exit status 2
E0709 02:06:38.668037   21535 rbd_util.go:464] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d failed to add task flatten for rbd/csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d : exit status 2
E0709 02:06:38.668142   21535 utils.go:163] ID: 27 Req-ID: 0001-0024-1e883026-c03d-11ea-9dd5-00163e101e9f-0000000000000002-360233ac-c184-11ea-aa46-d6eec0877e2d GRPC error: rpc error: code = Internal desc = exit status 2
dmesg log:
[26687.317120] rbd: loaded (major 251)
[76054.217590] IPv6: ADDRCONF(NETDEV_UP): cali13b71e8c17d: link is not ready
[76054.218040] IPv6: ADDRCONF(NETDEV_CHANGE): cali13b71e8c17d: link becomes ready
[85851.683017] libceph: mon2 10.0.0.203:6789 session established
[85851.683289] libceph: client34217 fsid 1e883026-c03d-11ea-9dd5-00163e101e9f
[85851.688494] rbd: rbd0: capacity 2147483648 features 0x1
[85852.512339] EXT4-fs (rbd0): mounted filesystem with ordered data mode. Opts: (null)
[85852.797739] IPv6: ADDRCONF(NETDEV_UP): cali4f1f3127bcd: link is not ready
[85852.798450] IPv6: ADDRCONF(NETDEV_CHANGE): cali4f1f3127bcd: link becomes ready
[91817.223254] libceph: mon1 10.0.0.202:6789 session established
[91817.223540] libceph: client24244 fsid 1e883026-c03d-11ea-9dd5-00163e101e9f
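Given the `does not support required features` error in the csi-rbdplugin log, one way to investigate is to check which features are enabled on the RBD image with the `rbd` CLI. This is a diagnostic sketch, not part of the original report; the pool and image name are taken from the logs above:

```shell
# Inspect the enabled image features
rbd info rbd/csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d | grep features

# If features beyond what this kernel's krbd supports are enabled, they can
# be disabled (fast-diff depends on object-map, so it is listed first):
rbd feature disable rbd/csi-vol-360233ac-c184-11ea-aa46-d6eec0877e2d fast-diff object-map deep-flatten
```

Disabling features on existing images is only a workaround; the proper fix is to restrict the features requested at provisioning time via the StorageClass.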

Additional context

There was another issue yesterday: `driver name rbd.csi.ceph.com not found in the list of registered CSI drivers`. After reading issue #1105 I recreated the rbd-plugin pod, and that part works fine now.


About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 42

Most upvoted comments

This issue can be resolved by adding `imageFeatures: layering` to the StorageClass. The `rbd.csi.ceph.com not found` issue only occurs on some of the nodes; only one node can run the pod successfully. **driver-registrar logs**

I0709 03:19:46.396924   27642 main.go:110] Version: v1.3.0-0-g6e9fff3e
I0709 03:19:46.396985   27642 main.go:120] Attempting to open a gRPC connection with: "/csi/csi.sock"
I0709 03:19:46.397005   27642 connection.go:151] Connecting to unix:///csi/csi.sock
I0709 03:19:47.397487   27642 main.go:127] Calling CSI driver to discover driver name
I0709 03:19:47.397511   27642 connection.go:180] GRPC call: /csi.v1.Identity/GetPluginInfo
I0709 03:19:47.397518   27642 connection.go:181] GRPC request: {}
I0709 03:19:47.400319   27642 connection.go:183] GRPC response: {"name":"rbd.csi.ceph.com","vendor_version":"canary"}
I0709 03:19:47.400617   27642 connection.go:184] GRPC error: <nil>
I0709 03:19:47.400625   27642 main.go:137] CSI driver name: "rbd.csi.ceph.com"
I0709 03:19:47.400676   27642 node_register.go:51] Starting Registration Server at: /registration/rbd.csi.ceph.com-reg.sock
I0709 03:19:47.400762   27642 node_register.go:60] Registration Server started at: /registration/rbd.csi.ceph.com-reg.sock

The log looks normal, but `driver name rbd.csi.ceph.com not found in the list of registered CSI drivers` still occurs.
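For reference, a minimal StorageClass carrying the `imageFeatures: layering` fix might look like the following sketch. The clusterID and pool are taken from the logs above; the secret names and namespace are assumptions and must match your own deployment:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-rbd-sc
provisioner: rbd.csi.ceph.com
parameters:
  clusterID: 1e883026-c03d-11ea-9dd5-00163e101e9f  # from the logs above
  pool: rbd
  imageFeatures: layering  # the fix: request only features krbd supports
  # Secret names/namespace below are placeholders for this example
  csi.storage.k8s.io/provisioner-secret-name: csi-rbd-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-rbd-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Delete
```

With `imageFeatures` unset, newly provisioned images may get features the node's krbd cannot map, which triggers the flatten task and the `exit status 2` failure seen above.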

Found it! Plugins not being registered properly with kubelet was my assumption as well, but I didn’t know what to look for. Turns out, we had two inconsistent deployments between our clusters, so both /var/lib/kubelet and another persistent location were being deployed as root dir, but only one was actually used and I checked the wrong one, so everything looked fine when in fact it wasn’t. Thanks for the hint, now it’s working!

Are you specifying the imageFeatures in the StorageClass?

Yes, it really looks like what you said: the PVC has been bound to a PV successfully. We should update the documentation about this issue to avoid confusing other users: https://docs.ceph.com/en/latest/rbd/rbd-kubernetes/

Thank you, much appreciated. BTW, Merry Christmas!