vsphere-csi-driver: Volume can't be detached from deleted Node/VM

Is this a BUG REPORT or FEATURE REQUEST?: /kind bug

What happened: After draining and deleting a Node, a pod is stuck in the ContainerCreating phase because its volume can't be mounted (event: "Unable to attach or mount volumes"). The controller logs show that it can't detach the volume from the deleted VM (kubedev-worker-2bcf0b684616):

kubectl.exe logs -f vsphere-csi-controller-0 --all-containers --tail 30
I0515 11:18:14.322186       1 controller.go:198] Started VA processing "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8"
I0515 11:18:14.322258       1 csi_handler.go:209] CSIHandler: processing VA "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8"
I0515 11:18:14.322267       1 csi_handler.go:260] Starting detach operation for "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8"
I0515 11:18:14.322336       1 csi_handler.go:267] Detaching "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8"
I0515 11:18:14.322365       1 csi_handler.go:699] Can't get CSINode kubedev-worker-2bcf0b684616.kubernetes.mycorp.test: csinode.storage.k8s.io "kubedev-worker-2bcf0b684616.kubernetes.mycorp.test" not found
I0515 11:18:14.323158       1 csi_handler.go:575] Saving detach error to "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8"
I0515 11:18:14.329068       1 controller.go:158] Ignoring VolumeAttachment "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8" change
I0515 11:18:14.329599       1 csi_handler.go:586] Saved detach error to "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8"
I0515 11:18:14.329626       1 csi_handler.go:219] Error processing "csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8": failed to detach: rpc error: code = Internal desc = Failed to find VirtualMachine for node:"kubedev-worker-2bcf0b684616.kubernetes.mycorp.test". Error: node wasn't found
E0515 11:18:14.322833       1 manager.go:129] Node not found with nodeName kubedev-worker-2bcf0b684616.kubernetes.mycorp.test
E0515 11:18:14.322963       1 controller.go:312] Failed to find VirtualMachine for node:"kubedev-worker-2bcf0b684616.kubernetes.mycorp.test". Error: node wasn't found

There is no Node, CSINode, or VM with that name anymore. In the details of that PVC in the vSphere Container Volumes view, no VM is shown.
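
For reference, a minimal sketch of the checks involved, using the names from this report (adjust for your cluster); the stuck attachment shows up as a VolumeAttachment object:

kubectl get node kubedev-worker-2bcf0b684616.kubernetes.mycorp.test       # NotFound: Node object is gone
kubectl get csinode kubedev-worker-2bcf0b684616.kubernetes.mycorp.test    # NotFound: CSINode object is gone
kubectl get volumeattachments                                             # the stuck VolumeAttachment still references the deleted node
kubectl describe volumeattachment csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8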

What you expected to happen: The volume is detached from the deleted VM so that it can be attached again.

How to reproduce it (as minimally and precisely as possible): Not sure; maybe the app didn't shut down correctly and the volume couldn't be removed before the node and VM were deleted.

Anything else we need to know?: Where is the attach state stored, and can the PV be detached manually somehow? I tried recreating the PVC to reattach the same PV and restarting the controller pod, but it didn't help. Is there something I can do during the VM drain/shutdown to avoid this?
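
For reference, the cluster-side attach state lives in VolumeAttachment objects (the vSphere side is the disk's attachment to the VM). A minimal, at-your-own-risk sketch of a manual cleanup that is sometimes used when the backing VM no longer exists is to remove the external-attacher finalizer so Kubernetes stops waiting for a detach that can never succeed (the VolumeAttachment name is taken from the logs above):

kubectl patch volumeattachment csi-ad8ca603308ed422a89d11ed4c09a448fc724c3998b0a134fe9a17100a5045c8 \
  --type=merge -p '{"metadata":{"finalizers":null}}'

During a planned drain/shutdown, waiting until all VolumeAttachments referencing the node have been deleted before removing the Node object and the VM should avoid this, since the controller still needs the node/VM in order to perform the detach.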

Environment:

  • csi-vsphere version: 1.0.2
  • vsphere-cloud-controller-manager version: 1.0.2
  • Kubernetes version: v1.17.5
  • vSphere version: 6.7u3
  • OS (e.g. from /etc/os-release): Ubuntu 18.04.4
  • Kernel (e.g. uname -a): 5.3.0-51-generic
  • Install tools: -
  • Others: -

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21 (4 by maintainers)

Most upvoted comments

@marratj in our case the problem was that Changed Block Tracking (CBT) had been enabled on the disk by our backup tool in the background, but CBT was not enabled on the VM itself, so the disk could not be attached again. To enable CBT, see: https://kb.vmware.com/s/article/1031873
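
For reference, a minimal sketch of enabling CBT on the VM via its advanced settings, following the KB article above (the use of govc and the VM/disk names are assumptions, not from the original comment; per the KB, the VM typically needs a power cycle or a snapshot create/delete for the change to take effect):

# placeholder VM name and disk key; see KB 1031873 for the exact keys for your disks
govc vm.change -vm kubedev-worker-2bcf0b684616 -e ctkEnabled=TRUE
govc vm.change -vm kubedev-worker-2bcf0b684616 -e scsi0:0.ctkEnabled=TRUE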