kubernetes: NodeUnstageVolume not called because unmounter fails when vol_data.json is deleted
What happened:
The upcoming csi-driver-host-path release v1.7.0 will have a check in DeleteVolume that returns an error when the volume is still attached, staged, or published (https://github.com/kubernetes-csi/csi-driver-host-path/pull/260).
Some of the jobs in the csi-driver-host-path repo with Kubernetes 1.21.0 are failing because NodeUnstageVolume is not called, causing DeleteVolume to fail repeatedly until the test times out.
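To illustrate why a skipped NodeUnstageVolume blocks deletion, here is a minimal Python sketch of a "still in use" check. It is not the driver's actual code (the driver is written in Go); the `Volume` record, its field names, and `VolumeInUseError` are hypothetical and only model the attached/staged/published states described above.

```python
from dataclasses import dataclass, field


class VolumeInUseError(Exception):
    """Raised when a volume is deleted while a node still uses it."""


@dataclass
class Volume:
    # Hypothetical state record; field names are illustrative only.
    vol_id: str
    attached: bool = False
    staged_paths: set = field(default_factory=set)
    published_paths: set = field(default_factory=set)


def delete_volume(volumes: dict, vol_id: str) -> None:
    """Delete a volume, refusing while it is attached, staged, or published."""
    vol = volumes.get(vol_id)
    if vol is None:
        # Deleting an unknown volume is treated as success (idempotent).
        return
    if vol.attached or vol.staged_paths or vol.published_paths:
        raise VolumeInUseError(f"volume {vol_id} is still in use")
    del volumes[vol_id]
```

In this model, a volume whose staging path is never cleared (because NodeUnstageVolume was never called) makes every `delete_volume` retry fail the same way, which matches the repeated DeleteVolume failures seen in the jobs.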
One example:
volume ID 3f3a132b-b25c-11eb-9aba-a6b5a7ade690 in:
Corresponds to pvc-c0971c32-17c3-427f-a606-1bbf893caf89 in:
There’s one kubelet error that seems relevant:
May 11 13:30:23 csi-prow-worker2 kubelet[247]: E0511 13:30:23.786580 247 reconciler.go:193] "operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume \"test-volume\" (UniqueName: \"kubernetes.io/csi/hostpath.csi.k8s.io^ca29f6e4-b25c-11eb-8da9-1e598a3983d2\") pod \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\" (UID: \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\") : UnmountVolume.NewUnmounter failed for volume \"test-volume\" (UniqueName: \"kubernetes.io/csi/hostpath.csi.k8s.io^ca29f6e4-b25c-11eb-8da9-1e598a3983d2\") pod \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\" (UID: \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json]: open /var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json: no such file or directory" err="UnmountVolume.NewUnmounter failed for volume \"test-volume\" (UniqueName: \"kubernetes.io/csi/hostpath.csi.k8s.io^ca29f6e4-b25c-11eb-8da9-1e598a3983d2\") pod \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\" (UID: \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json]: open /var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json: no such file or directory"
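The failure chain in this log can be modeled roughly as follows. This is a deliberately simplified Python sketch, not kubelet code: `load_volume_data`, `unmount_volume`, the `calls` list, and the `volumeHandle` key are assumptions made for illustration. As far as I understand, the real kubelet issues NodeUnstageVolume from a separate device-unmount step that only runs once the per-pod teardown has succeeded; the sketch collapses both steps into one function.

```python
import json
from pathlib import Path


def load_volume_data(pod_volume_dir: Path) -> dict:
    """Read vol_data.json from the pod's volume directory."""
    with open(pod_volume_dir / "vol_data.json") as f:
        return json.load(f)


def unmount_volume(pod_volume_dir: Path, calls: list) -> None:
    """Simplified teardown: metadata must load before any node call is made."""
    # Raises FileNotFoundError if vol_data.json has already been deleted,
    # so none of the calls below are recorded.
    data = load_volume_data(pod_volume_dir)
    handle = data["volumeHandle"]  # assumed key name for this sketch
    calls.append(("NodeUnpublishVolume", handle))
    calls.append(("NodeUnstageVolume", handle))
```

With the file missing, `unmount_volume` raises before recording any call, so in this model the volume stays staged indefinitely, no matter how often the teardown is retried.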
What you expected to happen:
NodeUnstageVolume should be called.
How to reproduce it (as minimally and precisely as possible):
CSI_PROW_KUBERNETES_VERSION=1.21.0 CSI_PROW_TESTS=parallel CSI_SNAPSHOTTER_VERSION=v4.0.0 ./.prow.sh
in a csi-driver-host-path checkout. The failure is intermittent, so it may take several runs.
Alternatively, retest in https://github.com/kubernetes-csi/csi-driver-host-path/pull/289
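Because the failure does not show up on every run, a small retry loop can save some manual re-running. The following is a sketch only; `run_until_failure` and its parameters are hypothetical helpers, not part of the csi-driver-host-path tooling.

```python
import os
import subprocess


def run_until_failure(cmd, extra_env=None, max_attempts=10):
    """Run cmd repeatedly; return the 1-based attempt that failed, or None."""
    env = dict(os.environ, **(extra_env or {}))
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(cmd, env=env)
        if result.returncode != 0:
            return attempt
    return None


# Example invocation from a csi-driver-host-path checkout (not executed here):
# run_until_failure(
#     ["./.prow.sh"],
#     extra_env={
#         "CSI_PROW_KUBERNETES_VERSION": "1.21.0",
#         "CSI_PROW_TESTS": "parallel",
#         "CSI_SNAPSHOTTER_VERSION": "v4.0.0",
#     },
# )
```

A non-zero exit only means that some test failed; the kubelet log still has to be checked for the "unmounter failed to load volume data file" message to confirm that it is this bug.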
Anything else we need to know?:
Random observation: in the two cases that I looked at, the affected volume was published twice for different pods.
About this issue
- State: closed
- Created 3 years ago
- Comments: 22 (21 by maintainers)
Commits related to this issue
- relax volume lifecycle checks by default The recently introduced "still in use" check revealed a bug in Kubernetes (https://github.com/kubernetes/kubernetes/issues/101911). While the check itself is ... — committed to pohly/csi-driver-host-path by pohly 3 years ago
So two possible fixes to investigate: