ceph-csi: rbd command stuck inside the csi-nodeplugin/csi-rbdplugin container

Describe the bug

When using a Multus network for the rook-ceph cluster, we run the CSI daemonset pods on the Multus network instead of the host network. We are able to create the PV and PVC, and both reach the Bound state, but the rbd command that maps the device gets stuck inside the csi-rbdplugin container, so the pod consuming the PVC cannot mount the volume and fails to get created.

Environment details

  • Image/version of Ceph CSI driver : v3.0.0
  • Helm chart version :
  • Kernel version : 4.18.0-193.14.3.el8_2.x86_64
  • Mounter used for mounting PVC (for cephfs its fuse or kernel. for rbd its krbd or rbd-nbd) : krbd
  • Kubernetes cluster version : v1.18.3 (OCP 4.5.5)
  • Ceph cluster version : v15.2.4

Steps to reproduce

Steps to reproduce the behavior:

  1. Setup details: Create Network Attachment Definitions as follows.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: public
  namespace: rook-ceph
spec:
  config: '{
                  "cniVersion": "0.3.1",
                  "type": "macvlan",
                  "master": "ens192",
                  "mode": "bridge",
                  "ipam": {
                      "type": "whereabouts",
                      "range": "192.168.231.0/24"
                  }
                }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: cluster
  namespace: rook-ceph
spec:
  config: '{
                  "cniVersion": "0.3.1",
                  "type": "macvlan",
                  "master": "ens192",
                  "mode": "bridge",
                  "ipam": {
                      "type": "whereabouts",
                      "range": "192.168.232.0/24"
                  }
                }'
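It may help to confirm that both definitions exist before deploying the cluster; for example (illustrative, output omitted):

$ oc -n rook-ceph get network-attachment-definitions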
  2. Follow the Rook steps to create a rook-ceph cluster that uses the Multus network; the relevant CephCluster network stanza is sketched below.
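For reference, the part of the CephCluster spec that ties the cluster to the two NetworkAttachmentDefinitions above looks roughly like this (a sketch; the rest of the CephCluster spec is omitted and unchanged from the Rook examples):

network:
  provider: multus
  selectors:
    public: public
    cluster: cluster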
  3. Create a CephBlockPool and a storage class, for example as sketched below.
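A minimal sketch of step 3, following the stock Rook examples; the pool name replicapool matches the image path seen in the stuck rbd map command later, but the secret names and other parameters may differ in your deployment:

---
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: rook-ceph.rbd.csi.ceph.com
parameters:
  clusterID: rook-ceph
  pool: replicapool
  imageFormat: "2"
  imageFeatures: layering
  csi.storage.k8s.io/provisioner-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/provisioner-secret-namespace: rook-ceph
  csi.storage.k8s.io/controller-expand-secret-name: rook-csi-rbd-provisioner
  csi.storage.k8s.io/controller-expand-secret-namespace: rook-ceph
  csi.storage.k8s.io/node-stage-secret-name: rook-csi-rbd-node
  csi.storage.k8s.io/node-stage-secret-namespace: rook-ceph
  csi.storage.k8s.io/fstype: ext4
reclaimPolicy: Delete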
  4. Create a PVC:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
  - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
  5. Create a pod consuming the above PVC:
---
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: public
spec:
  containers:
   - name: web-server
     image: nginx
     volumeMounts:
       - name: mypvc
         mountPath: /var/lib/www/html
  volumes:
   - name: mypvc
     persistentVolumeClaim:
       claimName: rbd-pvc
       readOnly: false
  6. Check the PVC:
$ oc get pvc rbd-pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-88ddf238-4add-4164-8bea-60601b89f146   1Gi        RWO            rook-ceph-block   43s
  7. Check the pod events:
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               <unknown>           default-scheduler        Successfully assigned rook-ceph/csirbd-demo-pod to compute-0
  Normal   SuccessfulAttachVolume  4m3s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-88ddf238-4add-4164-8bea-60601b89f146"
  Warning  FailedMount             2m1s                kubelet, compute-0       MountVolume.MountDevice failed for volume "pvc-88ddf238-4add-4164-8bea-60601b89f146" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             2m                  kubelet, compute-0       Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-mnp2d]: timed out waiting for the condition
  Warning  FailedMount             57s (x7 over 2m1s)  kubelet, compute-0       MountVolume.MountDevice failed for volume "pvc-88ddf238-4add-4164-8bea-60601b89f146" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-dfb6f67c-d728-11ea-8e7d-0a580a81020d already exists
  8. Exec into the csi-rbdplugin container on the node where the pod failed and check for stuck rbd commands:
# ps aux | grep " rbd "
root      114328  0.0  0.1 335948 22876 ?        Sl   14:35   0:00 rbd --id csi-rbd-node -m 172.30.125.128:6789,172.30.2.12:6789,172.30.200.109:6789 --keyfile=/tmp/csi/keys/keyfile-206266584 map replicapool/csi-vol-dfb6f67c-d728-11ea-8e7d-0a580a81020d --device-type krbd

Actual results

The rbd map command got stuck inside the csi-rbdplugin container and the pod failed to get created.

Expected behavior

The pod should mount the PVC and be created successfully.

Logs

If the issue is in PVC mounting, please attach complete logs of the below containers.

  • csi-rbdplugin container log
  • driver-registrar container log
oc logs csi-rbdplugin-grdqk driver-registrar
I0805 14:18:11.323819   54428 main.go:110] Version: v1.2.0-0-g6ef000ae
I0805 14:18:11.324089   54428 connection.go:151] Connecting to unix:///csi/csi.sock
I0805 14:18:18.631197   54428 node_register.go:58] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0805 14:18:18.631374   54428 node_register.go:67] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0805 14:18:19.521439   54428 main.go:77] Received GetInfo call: &InfoRequest{}
I0805 14:18:19.576220   54428 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}

Additional context

While the rbd command is stuck, the device is already mapped.

# lsblk | grep rbd
rbd0                         252:0    0    1G  0 disk

After we kill the command, the pod is able to mount the volume and gets created.
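For completeness, the manual unblock looks roughly like this (a sketch, run inside the csi-rbdplugin container on the affected node; the PID is the one from the ps output above):

# find the stuck map process and confirm the device is already there
ps aux | grep " rbd "
lsblk | grep rbd
# kill the stuck rbd map; the next kubelet retry then succeeds
kill 114328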

Most upvoted comments

I’ve filed https://tracker.ceph.com/issues/47128. I’ll try to post a PR this week.

I’ll work on a change to rbd map to optionally skip waiting for udev events. Should be very simple to do, but I’m not sure how well it would work yet.

This is the expected behaviour and has to do with how udev works. rbd map and rbd unmap listen for udev events and block until a specific udev event is seen, even when the actual block device is already there. Unfortunately, udev events don’t cross network namespace boundaries, so --net=host is more or less required here.

We can probably work around this either by ditching udev for the CSI use case or by manually dispatching udev events into the pod network namespace, but neither option is appealing. Given that this pod will always require at least CAP_SYS_ADMIN for creating block devices, is --net=host really a big deal?
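For anyone who wants to see the udev asymmetry directly, it can be observed with udevadm; a sketch, assuming shell access to the node and the PID of any process in the plugin pod's network namespace (the PID below is a placeholder):

# in the host network namespace, mapping an rbd image produces block uevents
udevadm monitor --kernel --udev --subsystem-match=block

# from inside the plugin pod's (non-host) network namespace, the same mapping
# shows nothing, which is why rbd map keeps waiting
nsenter --net=/proc/<plugin-pod-pid>/ns/net udevadm monitor --kernel --udev --subsystem-match=block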