kubernetes: NodeUnstageVolume not called because unmounter fails when vol_data.json is deleted

What happened:

The upcoming csi-driver-host-path release v1.7.0 will have a check in DeleteVolume that returns an error when the volume is still attached, staged, or published (https://github.com/kubernetes-csi/csi-driver-host-path/pull/260).
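For context, the guard looks roughly like the following Go sketch (hypothetical in-memory state tracking; this is not the actual code from that PR):

```go
package hostpath

import (
	"context"

	"github.com/container-storage-interface/spec/lib/go/csi"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// volumeState is hypothetical bookkeeping; the real driver tracks
// attachment, staging and publishing differently.
type volumeState struct {
	Attached  bool
	Staged    map[string]bool // staging target path -> staged
	Published map[string]bool // publish target path -> published
}

type hostPathDriver struct {
	volumes map[string]*volumeState
}

// DeleteVolume refuses to delete a volume that is still in use, so a
// missing NodeUnstageVolume surfaces as a controller-side error instead
// of the volume being deleted out from under the node.
func (d *hostPathDriver) DeleteVolume(ctx context.Context, req *csi.DeleteVolumeRequest) (*csi.DeleteVolumeResponse, error) {
	vol, ok := d.volumes[req.GetVolumeId()]
	if !ok {
		// Idempotent: deleting a volume that no longer exists succeeds.
		return &csi.DeleteVolumeResponse{}, nil
	}
	if vol.Attached || len(vol.Staged) > 0 || len(vol.Published) > 0 {
		return nil, status.Errorf(codes.FailedPrecondition,
			"volume %q is still in use: attached=%v staged=%d published=%d",
			req.GetVolumeId(), vol.Attached, len(vol.Staged), len(vol.Published))
	}
	delete(d.volumes, req.GetVolumeId())
	return &csi.DeleteVolumeResponse{}, nil
}
```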

Some of the CI jobs in the csi-driver-host-path repo that run against Kubernetes 1.21.0 are failing because NodeUnstageVolume is never called, which causes DeleteVolume to fail repeatedly until the test times out.

One example:

volume ID 3f3a132b-b25c-11eb-9aba-a6b5a7ade690 in:

Corresponds to pvc-c0971c32-17c3-427f-a606-1bbf893caf89 in:

There’s one kubelet error that seems relevant:

May 11 13:30:23 csi-prow-worker2 kubelet[247]: E0511 13:30:23.786580     247 reconciler.go:193] "operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume \"test-volume\" (UniqueName: \"kubernetes.io/csi/hostpath.csi.k8s.io^ca29f6e4-b25c-11eb-8da9-1e598a3983d2\") pod \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\" (UID: \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\") : UnmountVolume.NewUnmounter failed for volume \"test-volume\" (UniqueName: \"kubernetes.io/csi/hostpath.csi.k8s.io^ca29f6e4-b25c-11eb-8da9-1e598a3983d2\") pod \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\" (UID: \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json]: open /var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json: no such file or directory" err="UnmountVolume.NewUnmounter failed for volume \"test-volume\" (UniqueName: \"kubernetes.io/csi/hostpath.csi.k8s.io^ca29f6e4-b25c-11eb-8da9-1e598a3983d2\") pod \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\" (UID: \"a6441c77-7c03-4d33-8a80-0a37d895f8a9\") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json]: open /var/lib/kubelet/pods/a6441c77-7c03-4d33-8a80-0a37d895f8a9/volumes/kubernetes.io~csi/pvc-c0971c32-17c3-427f-a606-1bbf893caf89/vol_data.json: no such file or directory"
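The error boils down to the kubelet's CSI plugin needing vol_data.json to reconstruct an unmounter for the volume; once that file is gone, no unmounter can be built, so NodeUnpublishVolume/NodeUnstageVolume never get issued. A simplified Go sketch of that dependency (illustrative field and function names, not the actual kubernetes/kubernetes code):

```go
package csivolume

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// volData mirrors the kind of metadata the kubelet persists in
// vol_data.json (field names here are illustrative, not the exact
// upstream schema).
type volData struct {
	VolumeHandle string `json:"volumeHandle"`
	DriverName   string `json:"driverName"`
}

// loadVolumeData is a simplified stand-in for what NewUnmounter does:
// it must read vol_data.json in the pod's volume directory to know which
// CSI driver and volume handle to use for the unmount/unstage calls.
func loadVolumeData(volumeDir string) (*volData, error) {
	dataPath := filepath.Join(volumeDir, "vol_data.json")
	f, err := os.Open(dataPath)
	if err != nil {
		// This is the failure in the kubelet log above: with
		// vol_data.json deleted, the unmounter cannot be constructed
		// and the volume is never unpublished or unstaged.
		return nil, fmt.Errorf("failed to open volume data file [%s]: %w", dataPath, err)
	}
	defer f.Close()

	var d volData
	if err := json.NewDecoder(f).Decode(&d); err != nil {
		return nil, fmt.Errorf("failed to parse volume data file [%s]: %w", dataPath, err)
	}
	return &d, nil
}
```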

What you expected to happen:

NodeUnstageVolume should be called.

How to reproduce it (as minimally and precisely as possible):

Run CSI_PROW_KUBERNETES_VERSION=1.21.0 CSI_PROW_TESTS=parallel CSI_SNAPSHOTTER_VERSION=v4.0.0 ./.prow.sh in the csi-driver-host-path repo and hope that it fails.

Alternatively, retest in https://github.com/kubernetes-csi/csi-driver-host-path/pull/289

Anything else we need to know?:

Random observation: in the two cases that I looked at, the affected volume was published twice for different pods.

Most upvoted comments

So there are two possible fixes to investigate:

  1. Is it possible to not add volumes to the desired state of world (DSW) if the pod is terminating? cc @jingxu97
  2. Get rid of that RemoveAll call in the orphaned volume cleanup (see the sketch below this list).
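For fix 2, a minimal sketch of the direction, assuming a hypothetical cleanup helper (this is not the kubelet's actual orphaned-volume code):

```go
package cleanup

import "os"

// cleanupOrphanedPodVolumeDir sketches the idea: replace the blanket
// os.RemoveAll with os.Remove, which refuses to delete a non-empty
// directory. If vol_data.json (or the mount subdirectory) is still
// present, the directory is left intact, the unmounter can still be
// constructed on the next reconcile, and NodeUnstageVolume gets its
// chance to run.
func cleanupOrphanedPodVolumeDir(volumeDir string) error {
	// os.RemoveAll(volumeDir) // <- the problematic call: wipes vol_data.json
	return os.Remove(volumeDir) // fails with ENOTEMPTY while teardown is pending
}
```

The point of the change is that cleanup then only succeeds once the unmount/unstage path has actually emptied the directory, instead of deleting the metadata that path depends on.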