ceph-csi: rbd command stuck inside the csi-nodeplugin/csi-rbdplugin container
Describe the bug
While using a Multus network for the rook-ceph cluster, we use Multus instead of the host network for the CSI DaemonSet pods. We are able to create the PV and PVC, which reach the Bound state, but the rbd command that maps the device gets stuck inside the csi-rbdplugin container, so the pod consuming the PVC cannot mount it and fails to start.
Environment details
- Image/version of Ceph CSI driver : v3.0.0
- Helm chart version :
- Kernel version : 4.18.0-193.14.3.el8_2.x86_64
- Mounter used for mounting PVC (for cephfs it's fuse or kernel; for rbd it's krbd or rbd-nbd) : krbd
- Kubernetes cluster version : v1.18.3 (OCP 4.5.5)
- Ceph cluster version : v15.2.4
Steps to reproduce
Steps to reproduce the behavior:
- Setup details: Create Network Attachment Definitions as follows.
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: public
  namespace: rook-ceph
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens192",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.231.0/24"
      }
    }'
---
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: cluster
  namespace: rook-ceph
spec:
  config: '{
      "cniVersion": "0.3.1",
      "type": "macvlan",
      "master": "ens192",
      "mode": "bridge",
      "ipam": {
        "type": "whereabouts",
        "range": "192.168.232.0/24"
      }
    }'
- Then follow the steps to create a rook-ceph cluster that uses the Multus networks.
- Create a CephBlockPool and a StorageClass.
- Create a PVC
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: rbd-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
  storageClassName: rook-ceph-block
- Create a pod consuming the above PVC
---
apiVersion: v1
kind: Pod
metadata:
  name: csirbd-demo-pod
  annotations:
    k8s.v1.cni.cncf.io/networks: public
spec:
  containers:
    - name: web-server
      image: nginx
      volumeMounts:
        - name: mypvc
          mountPath: /var/lib/www/html
  volumes:
    - name: mypvc
      persistentVolumeClaim:
        claimName: rbd-pvc
        readOnly: false
- Check the PVC
$ oc get pvc rbd-pvc
NAME      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
rbd-pvc   Bound    pvc-88ddf238-4add-4164-8bea-60601b89f146   1Gi        RWO            rook-ceph-block   43s
- Check the pod events
Events:
  Type     Reason                  Age                 From                     Message
  ----     ------                  ----                ----                     -------
  Normal   Scheduled               <unknown>           default-scheduler        Successfully assigned rook-ceph/csirbd-demo-pod to compute-0
  Normal   SuccessfulAttachVolume  4m3s                attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-88ddf238-4add-4164-8bea-60601b89f146"
  Warning  FailedMount             2m1s                kubelet, compute-0       MountVolume.MountDevice failed for volume "pvc-88ddf238-4add-4164-8bea-60601b89f146" : rpc error: code = DeadlineExceeded desc = context deadline exceeded
  Warning  FailedMount             2m                  kubelet, compute-0       Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc default-token-mnp2d]: timed out waiting for the condition
  Warning  FailedMount             57s (x7 over 2m1s)  kubelet, compute-0       MountVolume.MountDevice failed for volume "pvc-88ddf238-4add-4164-8bea-60601b89f146" : rpc error: code = Aborted desc = an operation with the given Volume ID 0001-0009-rook-ceph-0000000000000002-dfb6f67c-d728-11ea-8e7d-0a580a81020d already exists
- Exec into the csi-rbdplugin container on the node where the pod failed and check for stuck rbd commands.
# ps aux | grep " rbd "
root 114328 0.0 0.1 335948 22876 ? Sl 14:35 0:00 rbd --id csi-rbd-node -m 172.30.125.128:6789,172.30.2.12:6789,172.30.200.109:6789 --keyfile=/tmp/csi/keys/keyfile-206266584 map replicapool/csi-vol-dfb6f67c-d728-11ea-8e7d-0a580a81020d --device-type krbd
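The Aborted event and the stuck process refer to the same volume in two forms: the event carries the CSI volume ID, while rbd map is given replicapool/csi-vol-<UUID>. The POSIX sh sketch below splits such an ID so the two can be matched. The field layout and the csi-vol- image prefix are assumptions inferred from the values in this report (they appear to match ceph-csi's default naming, but check your version), and decode_volume_id is a hypothetical helper, not part of ceph-csi.

```shell
#!/bin/sh
# Sketch: split a ceph-csi RBD volume ID into its parts (POSIX sh).
# ASSUMED layout: <version:4 hex>-<clusterID length:4 hex>-<clusterID>-<pool ID:16 hex>-<UUID:36 chars>
# ASSUMED image name: "csi-vol-" + UUID (default prefix; may be configured differently).
# Variables are global because POSIX sh has no "local".

decode_volume_id() {
    id=$1
    # Characters 6-9 hold the clusterID length in hex, e.g. "0009" for "rook-ceph".
    len_hex=$(printf '%s' "$id" | cut -c6-9)
    cluster_len=$(printf '%d' "0x$len_hex")
    # The clusterID starts at character 11 and may itself contain dashes.
    cluster_id=$(printf '%s' "$id" | cut -c11-$((10 + cluster_len)))
    # Skip the clusterID and the following dash to reach the 16-hex-digit pool ID.
    pool_start=$((12 + cluster_len))
    pool_hex=$(printf '%s' "$id" | cut -c${pool_start}-$((pool_start + 15)))
    # The UUID follows the pool ID and one more dash.
    uuid=$(printf '%s' "$id" | cut -c$((pool_start + 17))-)
    if [ ${#uuid} -ne 36 ]; then
        echo "unexpected volume ID: $id" >&2
        return 1
    fi
    printf 'cluster=%s pool_id=%d image=csi-vol-%s\n' \
        "$cluster_id" "0x$pool_hex" "$uuid"
}
```

For the ID in the events above, decode_volume_id 0001-0009-rook-ceph-0000000000000002-dfb6f67c-d728-11ea-8e7d-0a580a81020d prints cluster=rook-ceph pool_id=2 image=csi-vol-dfb6f67c-d728-11ea-8e7d-0a580a81020d, which is the image the stuck rbd map process is mapping. The volume ID itself can be read from the PV with oc get pv pvc-88ddf238-4add-4164-8bea-60601b89f146 -o jsonpath='{.spec.csi.volumeHandle}'.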
Actual results
The rbd map command got stuck inside the csi-rbdplugin container and the pod failed to get created.
Expected behavior
The pod should mount the PVC and be created successfully.
Logs
If the issue is in PVC mounting, please attach complete logs of the containers below.
- csi-rbdplugin container log
- driver-registrar container log
$ oc logs csi-rbdplugin-grdqk driver-registrar
I0805 14:18:11.323819 54428 main.go:110] Version: v1.2.0-0-g6ef000ae
I0805 14:18:11.324089 54428 connection.go:151] Connecting to unix:///csi/csi.sock
I0805 14:18:18.631197 54428 node_register.go:58] Starting Registration Server at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0805 14:18:18.631374 54428 node_register.go:67] Registration Server started at: /registration/rook-ceph.rbd.csi.ceph.com-reg.sock
I0805 14:18:19.521439 54428 main.go:77] Received GetInfo call: &InfoRequest{}
I0805 14:18:19.576220 54428 main.go:87] Received NotifyRegistrationStatus call: &RegistrationStatus{PluginRegistered:true,Error:,}
Additional context
While the rbd map command is stuck, the device is already mapped.
# lsblk | grep rbd
rbd0 252:0 0 1G 0 disk
After we kill the command, the pod is able to mount the volume and gets created.
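lsblk only shows that some rbd device exists. A more targeted check reads the krbd sysfs attributes, where each mapped device has a directory under /sys/bus/rbd/devices containing pool and name files. The sketch below prints the /dev/rbdN node for a given pool and image, which confirms that the map itself succeeded while rbd map is still blocked. find_mapped_rbd is a hypothetical helper, and its optional third argument exists only so it can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: find the /dev/rbdN device that krbd already created for POOL/IMAGE.
# Reads the per-device "pool" and "name" attributes under /sys/bus/rbd/devices.
# The optional third argument overrides that root (for testing on a fake tree).

find_mapped_rbd() {
    pool=$1
    image=$2
    root=${3:-/sys/bus/rbd/devices}
    for dev in "$root"/*; do
        # Skip entries that are not readable krbd device directories
        # (this also covers an unexpanded glob when nothing is mapped).
        [ -r "$dev/pool" ] && [ -r "$dev/name" ] || continue
        if [ "$(cat "$dev/pool")" = "$pool" ] && [ "$(cat "$dev/name")" = "$image" ]; then
            # The directory name is the device id, e.g. "0" -> /dev/rbd0.
            echo "/dev/rbd${dev##*/}"
            return 0
        fi
    done
    return 1
}
```

For example, find_mapped_rbd replicapool csi-vol-dfb6f67c-d728-11ea-8e7d-0a580a81020d run on compute-0 would be expected to print /dev/rbd0 in the situation described above.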
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (6 by maintainers)
Commits related to this issue
- rbd: support for network namespaces (Multus CNI) Make rbdplugin pod work in a non-initial network namespace (i.e. with "hostNetwork: false") by skipping waiting for udev events when mapping and unmap... — committed to idryomov/ceph-csi by idryomov 4 years ago
- rbd: add support for network namespaces (Multus CNI) Make rbdplugin pod work in a non-initial network namespace (i.e. with "hostNetwork: false") by skipping waiting for udev events when mapping and u... — committed to idryomov/ceph-csi by idryomov 4 years ago
- rbd: enable mapping and unmapping from a network namespace Make rbdplugin pod work in a non-initial network namespace (i.e. with "hostNetwork: false") by skipping waiting for udev events when mapping... — committed to idryomov/ceph-csi by idryomov 4 years ago
- rbd: enable mapping and unmapping from a network namespace Make rbdplugin pod work in a non-initial network namespace (i.e. with "hostNetwork: false") by skipping waiting for udev events when mapping... — committed to ceph/ceph-csi by idryomov 4 years ago
- rbd: enable mapping and unmapping from a network namespace Make rbdplugin pod work in a non-initial network namespace (i.e. with "hostNetwork: false") by skipping waiting for udev events when mapping... — committed to nixpanic/ceph-csi by idryomov 4 years ago
I’ve filed https://tracker.ceph.com/issues/47128. I’ll try to post a PR this week.
I’ll work on a change to rbd map to optionally skip waiting for udev events. Should be very simple to do, but I’m not sure how well it would work yet.
This is the expected behaviour and has to do with how udev works. rbd map and rbd unmap listen for udev events and block until a specific udev event is seen, even when the actual block device is already there. Unfortunately, udev events don’t cross network namespace boundaries, so --net=host is more or less required here.
We can probably work around this either by ditching udev for the CSI use case or by manually dispatching udev events into the pod network namespace, but neither option is appealing. Given that this pod will always require at least CAP_SYS_ADMIN for creating block devices, is --net=host really a big deal?