kubernetes: the volume is not detached after the pod and PVC objects are deleted
What happened?
Hello, I’ve recently been testing storage reliability on k8s v1.25.0. A strange phenomenon occasionally occurs during the test: the volume cannot be detached after the PVC and pod are deleted (the reclaimPolicy of the PV is Delete).
I first checked the corresponding PV and VolumeAttachment objects and found that they both still existed. Then I checked the kube-controller-manager and kubelet logs and found that the kubelet kept printing:
I1125 10:28:42.931263 3908451 reconciler.go:389] "Starting operationExecutor.UnmountDevice for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.x.x\" "
But I found that earlier in the log, UnmountDevice for this volume had already succeeded, as shown in the following kubelet log:
I1125 10:20:43.948800 3908451 reconciler.go:389] "Starting operationExecutor.UnmountDevice for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.x.x\" "
I1125 10:20:43.948819 3908451 reconciler.go:396] "operationExecutor.UnmountDevice started for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.x.x\" "
I1125 10:20:44.356818 3908451 operation_generator.go:1013] UnmountDevice succeeded for volume "pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db" (UniqueName: "kubernetes.io/csi/xxx^f2ee34c4-f297-49cb-8730-4442ceb1df4f") on node "192.168.x.x"
It is so weird!
Then I added some debug logging to the unmountDetachDevices and MarkDeviceAsUnmounted methods:
func (rc *reconciler) unmountDetachDevices() {
	for _, attachedVolume := range rc.actualStateOfWorld.GetUnmountedVolumes() {
		// Check IsOperationPending to avoid marking a volume as detached if it's in the process of mounting.
		klog.InfoS("===================condition1=================", !rc.desiredStateOfWorld.VolumeExists(attachedVolume.VolumeName, attachedVolume.SELinuxMountContext), "volumeName", attachedVolume.VolumeName)
		klog.InfoS("=====================condition2===============", !rc.operationExecutor.IsOperationPending(attachedVolume.VolumeName, nestedpendingoperations.EmptyUniquePodName, nestedpendingoperations.EmptyNodeName), "volumeName", attachedVolume.VolumeName)
		if !rc.desiredStateOfWorld.VolumeExists(attachedVolume.VolumeName, attachedVolume.SELinuxMountContext) &&
			!rc.operationExecutor.IsOperationPending(attachedVolume.VolumeName, nestedpendingoperations.EmptyUniquePodName, nestedpendingoperations.EmptyNodeName) {
			klog.InfoS("attachedVolume.DeviceMountState", attachedVolume.DeviceMountState, "volumeName", attachedVolume.VolumeName)
			if attachedVolume.DeviceMayBeMounted() {
				...
			}
		}
	}
}
func (asw *actualStateOfWorld) MarkDeviceAsUnmounted(
	volumeName v1.UniqueVolumeName) error {
	klog.InfoS("==================MarkDeviceAsUnmounted volumeName===================", volumeName)
	err := asw.SetDeviceMountState(volumeName, operationexecutor.DeviceNotMounted, "", "", "")
	if err != nil {
		klog.InfoS("==================unmount err appear===================", volumeName)
	}
	return err
}
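A side note on the output below: klog.InfoS expects a message string followed by alternating key/value pairs, so passing a bare bool right after the message is what produces the %!s(bool=true)= and (MISSING) artifacts in these debug lines. A small standalone sketch (not part of the patch above) showing both forms:

// Standalone sketch of the klog.InfoS calling convention; the first call
// mirrors the debug calls in the patch above (the bool is treated as a key
// whose value becomes "volumeName", and the trailing volume name is left
// without a value, hence "(MISSING)"), while the second logs cleanly.
package main

import "k8s.io/klog/v2"

func main() {
	conditionMet := true
	volumeName := "kubernetes.io/csi/xxx^f2ee34c4-f297-49cb-8730-4442ceb1df4f"

	// Misuse: message followed by an odd, unkeyed argument list.
	klog.InfoS("===================condition1=================", conditionMet, "volumeName", volumeName)

	// Key/value form.
	klog.InfoS("condition1", "conditionMet", conditionMet, "volumeName", volumeName)

	klog.Flush()
}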
The resulting log output shows:
I1125 10:20:43.948776 3908451 reconciler.go:381] "===================condition1=================" %!s(bool=true)="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:43.948783 3908451 reconciler.go:382] "=====================condition2===============" %!s(bool=true)="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:43.948789 3908451 reconciler.go:386] "attachedVolume.DeviceMountState" DeviceGloballyMounted="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:43.948800 3908451 reconciler.go:389] "Starting operationExecutor.UnmountDevice for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.3.181\" "
I1125 10:20:43.948819 3908451 reconciler.go:396] "operationExecutor.UnmountDevice started for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.3.181\" "
I1125 10:20:44.356818 3908451 operation_generator.go:1013] UnmountDevice succeeded for volume "pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db" (UniqueName: "kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f") on node "192.168.3.181"
I1125 10:20:44.356824 3908451 actual_state_of_world.go:510] "==================MarkDeviceAsUnmounted volumeName===================" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
################################### pay attention to this log timestamp ###################################
I1125 10:20:44.237720 3908451 reconciler.go:381] "===================condition1=================" %!s(bool=true)="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:44.356906 3908451 reconciler.go:382] "=====================condition2===============" %!s(bool=true)="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
################################### pay attention to this DeviceMountState variable ###################################
I1125 10:20:44.356914 3908451 reconciler.go:386] "attachedVolume.DeviceMountState" DeviceGloballyMounted="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:44.356926 3908451 reconciler.go:389] "Starting operationExecutor.UnmountDevice for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.3.181\" "
I1125 10:20:44.357045 3908451 reconciler.go:396] "operationExecutor.UnmountDevice started for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.3.181\" "
################################### Error ###################################
################################### pay attention to persistentvolumes property ###################################
E1125 10:20:44.902890 3908451 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f podName: nodeName:}" failed. No retries permitted until 2022-11-25 10:20:45.402877857 +0800 CST m=+777.855686189 (durationBeforeRetry 500ms). Error: UnmountDevice failed for volume "pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db" (UniqueName: "kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f") on node "192.168.3.181" : persistentvolumes "e6104b10fea107e004e3307b0444fd7b775fedd804ab47aa997118b7927c32e0" is forbidden: User "system:node:192.168.3.181" cannot get resource "persistentvolumes" in API group "" at the cluster scope: no relationship found between node '192.168.3.181' and this object
I1125 10:20:44.961066 3908451 reconciler.go:381] "===================condition1=================" %!s(bool=true)="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:44.961083 3908451 reconciler.go:382] "=====================condition2===============" %!s(bool=true)="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:44.961090 3908451 reconciler.go:386] "attachedVolume.DeviceMountState" DeviceMountUncertain="volumeName" kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f="(MISSING)"
I1125 10:20:44.961104 3908451 reconciler.go:389] "Starting operationExecutor.UnmountDevice for volume \"pvc-1a4bcff1-f11d-456c-b91a-1aaa5c3ea5db\" (UniqueName: \"kubernetes.io/csi/xxx.csi.xxx.io^f2ee34c4-f297-49cb-8730-4442ceb1df4f\") on node \"192.168.3.181\" "
I have also drawn up a timeline chart of the log contents above.

Let’s go back to the unmountDetachDevices method.
func (rc *reconciler) unmountDetachDevices() {
	for _, attachedVolume := range rc.actualStateOfWorld.GetUnmountedVolumes() {
		// Check IsOperationPending to avoid marking a volume as detached if it's in the process of mounting.
		if !rc.desiredStateOfWorld.VolumeExists(attachedVolume.VolumeName, attachedVolume.SELinuxMountContext) &&
			!rc.operationExecutor.IsOperationPending(attachedVolume.VolumeName, nestedpendingoperations.EmptyUniquePodName, nestedpendingoperations.EmptyNodeName) {
			if attachedVolume.DeviceMayBeMounted() {
				...
			}
		}
	}
}
According to the log content above, the first unmountDetachDevices pass successfully changes the DeviceMountState of the corresponding attachedVolume to DeviceNotMounted, but attachedVolume.DeviceMayBeMounted() in the subsequent unmountDetachDevices pass still sees the stale DeviceMountState, so a second, redundant UnmountDevice is started and then fails with the error shown above.
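A minimal sketch of this ordering, using made-up types rather than the real kubelet code: GetUnmountedVolumes hands the reconciler value copies taken at the start of a pass, so a MarkDeviceAsUnmounted that completes after that snapshot is invisible to the pass, DeviceMayBeMounted() still reports true, and a second UnmountDevice is started.

// Minimal sketch (hypothetical types, not the real kubelet code) of why a
// reconciler pass can act on a stale DeviceMountState.
package main

import (
	"fmt"
	"sync"
)

type deviceMountState string

const (
	deviceGloballyMounted deviceMountState = "DeviceGloballyMounted"
	deviceNotMounted      deviceMountState = "DeviceNotMounted"
)

type attachedVolume struct {
	Name       string
	MountState deviceMountState
}

type actualStateOfWorld struct {
	mu      sync.Mutex
	volumes map[string]attachedVolume
}

// getUnmountedVolumes returns value copies of the current entries; later
// state changes are not reflected in the returned slice.
func (asw *actualStateOfWorld) getUnmountedVolumes() []attachedVolume {
	asw.mu.Lock()
	defer asw.mu.Unlock()
	out := make([]attachedVolume, 0, len(asw.volumes))
	for _, v := range asw.volumes {
		out = append(out, v)
	}
	return out
}

func (asw *actualStateOfWorld) markDeviceAsUnmounted(name string) {
	asw.mu.Lock()
	defer asw.mu.Unlock()
	v := asw.volumes[name]
	v.MountState = deviceNotMounted
	asw.volumes[name] = v
}

func main() {
	asw := &actualStateOfWorld{volumes: map[string]attachedVolume{
		"pvc-1a4b": {Name: "pvc-1a4b", MountState: deviceGloballyMounted},
	}}

	// Reconciler pass N+1 takes its snapshot first (before the 10:20:44.237720
	// condition1 log above) ...
	snapshot := asw.getUnmountedVolumes()

	// ... then the UnmountDevice goroutine from pass N finishes and updates the
	// actual state of world (the 10:20:44.356824 MarkDeviceAsUnmounted log above).
	asw.markDeviceAsUnmounted("pvc-1a4b")

	// The snapshot still says DeviceGloballyMounted, so DeviceMayBeMounted()
	// would return true and a second, redundant UnmountDevice is started.
	fmt.Println("snapshot state:", snapshot[0].MountState)
	fmt.Println("actual state:  ", asw.volumes["pvc-1a4b"].MountState)
}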
In my opinion, either:
- Ensure that a second unmountDevice operation is not started after the first unmountDevice operation has already succeeded; or
- If multiple unmountDevice operations are allowed, ensure that the correct PV name can be obtained in the getDriverAndVolNameFromDeviceMountPath method (see the sketch after this list).
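On that second point: the forbidden error above asks the API server for a PersistentVolume named "e6104b10fea107e0...", which looks like a hash rather than the real PV name, so the name recovered from the device mount path appears to be wrong. A rough, hypothetical sketch of the kind of path-based lookup I mean (the directory layout and helper names here are assumptions for illustration, not the exact kubelet code):

// Hypothetical sketch: recover the driver and "PV name" from a device mount
// path by splitting it. The layout assumed here is
//   .../plugins/kubernetes.io/csi/<driver>/<volume-dir>/globalmount
// If <volume-dir> is a hash rather than the PV name, a follow-up
// "get persistentvolumes <volume-dir>" will fail, matching the forbidden
// error above.
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

func parseDeviceMountPath(deviceMountPath string) (driver, volName string, err error) {
	if filepath.Base(deviceMountPath) != "globalmount" {
		return "", "", fmt.Errorf("unexpected device mount path %q", deviceMountPath)
	}
	dir := filepath.Dir(deviceMountPath) // .../csi/<driver>/<volume-dir>
	parts := strings.Split(dir, string(filepath.Separator))
	if len(parts) < 2 {
		return "", "", fmt.Errorf("unexpected device mount path %q", deviceMountPath)
	}
	return parts[len(parts)-2], parts[len(parts)-1], nil
}

func main() {
	driver, volName, err := parseDeviceMountPath(
		"/var/lib/kubelet/plugins/kubernetes.io/csi/xxx.csi.xxx.io/e6104b10fea107e004e3307b0444fd7b775fedd804ab47aa997118b7927c32e0/globalmount")
	// volName here is the hash, not "pvc-1a4bcff1-...".
	fmt.Println(driver, volName, err)
}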
What did you expect to happen?
The volume should be detached after the pod and PVC objects are deleted.
How can we reproduce it (as minimally and precisely as possible)?
This problem can be reproduced by repeatedly deleting and recreating pods that use a PVC in a k8s 1.25.0 cluster.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
# paste output here
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
# On Windows:
C:\> wmic os get Caption, Version, BuildNumber, OSArchitecture
# paste output here
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 28 (15 by maintainers)
FYI: this issue is fixed in 1.24.14, 1.25.10, and 1.26.5, and has also been fixed in v1.27.0.
@AbeOwlu I think they are the same issue and I have left some comments in #116847
I also hit this error after the PVC and pod were deleted on k8s 1.25:
After that, these two volumes could not be unmounted on the node:
I tried to delete the PV, but it is stuck in the Terminating state forever.