kubernetes: Unmounting an in-tree RBD volume fails when CSIMigrationRBD is enabled
What happened?
An in-tree RBD volume could not be unmounted successfully after CSI migration was enabled.
With kubelet log level 4, the logs show the CSI plugin volume path being unmounted:
Jan 29 07:41:36 node-10-200-112-221 kubelet[75663]: I0129 07:41:36.986617 75663 csi_mounter.go:463] kubernetes.io/csi: deleting volume path [/var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/kubernetes.io~csi/pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2]
Jan 29 07:41:36 node-10-200-112-221 kubelet[75663]: I0129 07:41:36.986649 75663 csi_mounter.go:392] kubernetes.io/csi: Unmounter.TearDownAt successfully unmounted dir [/var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/kubernetes.io~csi/pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2/mount]
Jan 29 07:41:36 node-10-200-112-221 kubelet[75663]: I0129 07:41:36.986658 75663 operation_generator.go:910] UnmountVolume.TearDown succeeded for volume "kubernetes.io/csi/rbd.csi.ceph.com^mig_mons-8b203a321c465f2b57b3c11c38e4644c_image-997efca5-80cd-11ec-b4c5-0a7342eb7f81_7475303239" (OuterVolumeSpecName: "www") pod "c6aaef09-ce5b-44b2-944c-7a8a72bafb26" (UID: "c6aaef09-ce5b-44b2-944c-7a8a72bafb26"). InnerVolumeSpecName "pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2". PluginName "kubernetes.io/csi", VolumeGidValue ""
Jan 29 07:41:37 node-10-200-112-221 kubelet[75663]: I0129 07:41:37.083403 75663 reconciler.go:293] "operationExecutor.UnmountDevice started for volume \"pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2\" (UniqueName: \"kubernetes.io/csi/rbd.csi.ceph.com^mig_mons-8b203a321c465f2b57b3c11c38e4644c_image-997efca5-80cd-11ec-b4c5-0a7342eb7f81_7475303239\") on node \"node-10-200-112-221\" "
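For context, these entries come from a kubelet running with verbosity 4. A minimal sketch of how to raise verbosity and follow the teardown on a systemd-managed node; the /etc/default/kubelet location is an assumption (kubeadm-style install) and varies by setup:
# Assumption: kubeadm-style node where extra kubelet flags are read from /etc/default/kubelet.
echo 'KUBELET_EXTRA_ARGS=--v=4' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet
# Follow the volume teardown for this pod (UID taken from the logs above).
journalctl -u kubelet -f | grep c6aaef09-ce5b-44b2-944c-7a8a72bafb26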
However, the PV is still mounted at the in-tree plugin path (the rbd2 line below), /var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/kubernetes.io~rbd:
# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
rbd0 252:0 0 1G 0 disk /var/lib/kubelet/pods/b5545cd8-8b4b-43b8-80f6-0af1b3bbb189/volumes/kubernetes.io~csi/pvc-eec34c
rbd1 252:16 0 1G 0 disk /var/lib/kubelet/pods/88af1bae-7111-472f-b462-e8240d2a03d2/volumes/kubernetes.io~csi/pvc-7ec9ab
rbd2 252:32 0 1G 0 disk /var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/kubernetes.io~rbd/pvc-a4b929
nvme0n1 259:0 0 64G 0 disk
├─nvme0n1p1 259:1 0 1M 0 part
├─nvme0n1p2 259:2 0 1G 0 part /boot
└─nvme0n1p3 259:3 0 63G 0 part
└─ubuntu--vg-ubuntu--lv
253:0 0 63G 0 lvm /
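The same stale mount can be confirmed without lsblk; a small sketch, assuming the rbd CLI is installed on the node:
# Any mount still owned by the in-tree rbd plugin shows up under kubernetes.io~rbd.
findmnt -rn -o TARGET,SOURCE | grep 'kubernetes.io~rbd'
# List the rbd devices the kernel still has mapped (rbd CLI assumed to be present on the node).
rbd showmapped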
Because the volume is still mounted at the in-tree plugin path, the subsequent rbd unmap fails:
Jan 29 07:41:37 node-10-200-112-221 kubelet[75663]: I0129 07:41:37.890779 75663 csi_client.go:458] kubernetes.io/csi: calling NodeUnstageVolume rpc [volid=mig_mons-8b203a321c465f2b57b3c11c38e4644c_image-997efca5-80cd-11ec-b4c5-0a7342eb7f81_7475303239,staging_target_path=/var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2/globalmount]
Jan 29 07:41:38 node-10-200-112-221 kubelet[75663]: E0129 07:41:38.171151 75663 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/csi/rbd.csi.ceph.com^mig_mons-8b203a321c465f2b57b3c11c38e4644c_image-997efca5-80cd-11ec-b4c5-0a7342eb7f81_7475303239 podName: nodeName:}" failed. No retries permitted until 2022-01-29 07:41:39.171129913 +0100 CET m=+331.218776784 (durationBeforeRetry 1s). Error: UnmountDevice failed for volume "pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2" (UniqueName: "kubernetes.io/csi/rbd.csi.ceph.com^mig_mons-8b203a321c465f2b57b3c11c38e4644c_image-997efca5-80cd-11ec-b4c5-0a7342eb7f81_7475303239") on node "node-10-200-112-221" : kubernetes.io/csi: attacher.UnmountDevice failed: rpc error: code = Internal desc = rbd: unmap for spec (tu029/kubernetes-dynamic-pvc-997efca5-80cd-11ec-b4c5-0a7342eb7f81) failed (an error (exit status 16) occurred while running rbd args: [unmap tu029/kubernetes-dynamic-pvc-997efca5-80cd-11ec-b4c5-0a7342eb7f81 --device-type krbd --options noudev]): (rbd: sysfs write failed
Jan 29 07:41:38 node-10-200-112-221 kubelet[75663]: I0129 07:41:38.281225 75663 kubelet_volumes.go:92] "Pod found, but volumes are still mounted on disk" podUID=c6aaef09-ce5b-44b2-944c-7a8a72bafb26 paths=[/var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/kubernetes.io~rbd/pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2
Manually running "umount /dev/rbd0" worked.
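As a manual workaround only (a sketch based on the observation above, not a fix): unmount the leftover in-tree path so the kubelet's next UnmountDevice retry can unmap the device.
# Leftover in-tree mountpoint for this pod, as seen in the lsblk output above.
umount /var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/kubernetes.io~rbd/pvc-a4b929be-9114-4d26-a15c-3a4fa40711b2
# The kubelet retries UnmountDevice about every second; the rbd unmap should now succeed.
# Confirm the image is no longer mapped (rbd CLI assumed on the node):
rbd showmapped | grep 997efca5-80cd-11ec-b4c5-0a7342eb7f81 || echo unmapped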
What did you expect to happen?
A volume mounted by the in-tree plugin should be unmounted successfully by the CSI driver once migration is enabled.
How can we reproduce it (as minimally and precisely as possible)?
- Create a StatefulSet that uses the in-tree RBD plugin.
- Enable the CSIMigrationRBD feature gate and deploy Ceph CSI 3.5.1.
- Scale down the StatefulSet; the pod gets stuck in Terminating because the volume fails to detach (see the reproduction sketch after this list).
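A condensed reproduction sketch. The StatefulSet name web and the /etc/default/kubelet location are assumptions, and on 1.23 the CSIMigrationRBD gate also has to be enabled on the control-plane components; only the kubelet side is shown here:
# 1. A StatefulSet whose PVCs come from a StorageClass using the in-tree provisioner kubernetes.io/rbd.
kubectl get sc                      # verify: provisioner kubernetes.io/rbd
kubectl get statefulset web         # assumed name; pods Running, volumes mounted under kubernetes.io~rbd

# 2. Enable migration on the node and deploy Ceph CSI 3.5.1 (rbd.csi.ceph.com).
echo 'KUBELET_EXTRA_ARGS=--feature-gates=CSIMigrationRBD=true' | sudo tee /etc/default/kubelet
sudo systemctl restart kubelet

# 3. Scale down; the pod hangs in Terminating because UnmountDevice keeps failing.
kubectl scale statefulset web --replicas=0
kubectl get pods -w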
Anything else we need to know?
No response
Kubernetes version
# kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:25:17Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.3", GitCommit:"816c97ab8cff8a1c72eccca1026f7820e93e0d25", GitTreeState:"clean", BuildDate:"2022-01-25T21:19:12Z", GoVersion:"go1.17.6", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
Private cluster
OS version
# On Linux:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="18.04.5 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.5 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic
$ uname -a
Linux node-10-210-171-221 5.4.0-70-generic #78~18.04.1-Ubuntu SMP Sat Mar 20 14:10:07 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux
Install tools
Container runtime (CRI) and version (if applicable)
Containerd
Related plugins (CNI, CSI, …) and versions (if applicable)
Ceph CSI RBD driver (rbd.csi.ceph.com) v3.5.1
About this issue
- State: closed
- Created 2 years ago
- Comments: 30 (16 by maintainers)
Hi @Jiawei0227, the rolling upgrade you suggested did work. Thank you. @humblec, thanks a lot for your time following up on this issue.
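For anyone landing here in the same state: the rolling upgrade simply recreates the pods so their volumes are mounted through the CSI path. A minimal sketch, assuming the StatefulSet is named web:
# Recreate the pods one by one; new mounts go through rbd.csi.ceph.com instead of the in-tree plugin.
kubectl rollout restart statefulset web
kubectl rollout status statefulset web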
kubelet-pod-before-migration.log kubelet-pod-after-migration.log
Hi @humblec, I collected kubelet logs for a pod created before migration was enabled and for one created after.
For the pod created after migration, scaling down the StatefulSet detached the volume successfully; log as below:
For the pod created before migration, the volume failed to unmount/detach; log as below:
The difference appears to be after NodeUnstageVolume.
This PVC is not shared; it is used only by this pod.
I retested, so the pod UID has changed.
Before I scale down the StatefulSet, the PV mountpoints are as below.
Then I delete the pod; the CSI mountpoint is removed and the in-tree rbd mountpoint is left behind.
If I manually umount that mountpoint, the pod is then deleted successfully.
The steps I took are as below:
I didn’t drain the pods before enabling the migration.
Yes, I had tested creating a PVC before migration was enabled and attaching it to a pod after enabling the migration, and that worked well.
@humblec, I found that the difference from the demo is that the pod in my scenario was created before CSIMigrationRBD was enabled. I just tested it: if I create the StatefulSet after enabling CSIMigrationRBD, the unmount also succeeds.
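A quick way to tell which path set up a given pod's volume; a sketch using the pod UID from this report:
# kubernetes.io~rbd => mounted by the in-tree plugin; kubernetes.io~csi => mounted via the CSI driver.
ls /var/lib/kubelet/pods/c6aaef09-ce5b-44b2-944c-7a8a72bafb26/volumes/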
Thanks @humblec, here are the sidecar images.
Yes, I have the CSIMigrationRBD feature gate enabled. Actually, I raised this ticket, 107488 😃
Will collect and upload the log soon~