kubernetes: Kubelet VolumeManager incorrectly marks volume as not "in use" (i.e. safe to detach) while a mount or format is pending

What happened: While debugging a volume that was detached prematurely, @jingxu97 discovered that the kubelet VolumeManager incorrectly marks a volume as not "in use" (i.e. safe to detach) while a mount or format is pending.

For context, the attach/detach controller will not proceed with a detach (after the pod referencing a volume is deleted) while the volume is still marked as "in use" by the kubelet. The kubelet marks a volume as "in use" before it starts mounting or formatting the volume.
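A minimal sketch of that handshake, using the real types from k8s.io/api/core/v1 (safeToDetach is a hypothetical helper, not the actual controller code): the controller only detaches once the volume disappears from node.Status.VolumesInUse.

```go
package sketch

import (
	v1 "k8s.io/api/core/v1"
)

// safeToDetach sketches the attach/detach controller's safety check: a
// detach proceeds only once the kubelet stops reporting the volume in
// node.Status.VolumesInUse.
func safeToDetach(node *v1.Node, vol v1.UniqueVolumeName) bool {
	for _, inUse := range node.Status.VolumesInUse {
		if inUse == vol {
			return false // kubelet still claims the volume; don't detach
		}
	}
	return true
}
```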

The problem is:

  • If a pod referencing a volume is deleted while a mount (or format) is pending, then kubelet goes ahead and removes that pod from its "desired state of the world" cache. (Normally this waits for the pod's containers to terminate, but in this case the containers never even started, because the mount, i.e. the format, is still pending.)
  • Kubelet volume_manager's GetVolumesInUse() method (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/volumemanager/volume_manager.go#L293) reports the volume as no longer in use because it is in neither the desired nor the actual state of the world; it does not check pending operations <-- this is the bug.
  • The attach/detach controller sees the pod deletion, checks that the volume is no longer marked as "in use", and incorrectly proceeds with the detach, even though a mount (or format) operation may still be pending (see the sketch after this list).
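To make the faulty check concrete, here is a toy model of GetVolumesInUse under the stated assumption that it consults only the desired and actual state caches; the type and field names are illustrative, not the real kubelet types.

```go
package main

import "fmt"

// volumeManager is a toy stand-in for the kubelet VolumeManager: two state
// caches plus the pending-operation tracker that the buggy check ignores.
type volumeManager struct {
	desired map[string]bool // volumes some pod still references
	actual  map[string]bool // volumes already mounted
	pending map[string]bool // mount/format operations still in flight
}

// inUseBuggy mirrors the reported behavior: only desired and actual state
// are consulted, so an in-flight mount/format is invisible.
func (vm *volumeManager) inUseBuggy(vol string) bool {
	return vm.desired[vol] || vm.actual[vol]
}

func main() {
	vm := &volumeManager{
		desired: map[string]bool{},              // pod deleted mid-format
		actual:  map[string]bool{},              // mount never completed
		pending: map[string]bool{"vol-1": true}, // format still running
	}
	// Prints false: the volume looks safe to detach while its format runs.
	fmt.Println(vm.inUseBuggy("vol-1"))
}
```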

What you expected to happen:

Kubelet volume_manager's GetVolumesInUse() method (https://github.com/kubernetes/kubernetes/blob/master/pkg/kubelet/volumemanager/volume_manager.go#L293) should count volumes that have pending operations (mount or otherwise) as "in use", even if they are not in the actual or desired state.
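A minimal sketch of that expectation, extending the toy volumeManager model above (the real fix would consult the kubelet's pending-operations tracker, not a plain map):

```go
// inUseFixed also treats a volume with a pending mount/format operation as
// "in use", so the attach/detach controller will not detach it prematurely.
func (vm *volumeManager) inUseFixed(vol string) bool {
	return vm.desired[vol] || vm.actual[vol] || vm.pending[vol]
}
```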

How to reproduce it (as minimally and precisely as possible):

While a mount is pending (because a format is taking a long time, for example), delete the pod referencing the attachable volume.

Anything else we need to know?:

/sig storage
/milestone 1.13

Environment:

  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • OS (e.g. from /etc/os-release):
  • Kernel (e.g. uname -a):
  • Install tools:
  • Others:

/kind bug

Most upvoted comments

@mlmhl yeah you are correct. This is basically the same as the issue @jingxu97 logged a while back: https://github.com/kubernetes/kubernetes/issues/40887.

I was able to reproduce this issue by introducing a time.Sleep in the Format step.
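A toy sketch of that reproduction idea, under the assumption that the delay is injected into the format path (in the real code this would mean patching the formatting step, e.g. in SafeFormatAndMount):

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	formatDone := make(chan struct{})

	// Stand-in for the mount operation whose format step is artificially slow.
	go func() {
		time.Sleep(5 * time.Second) // the injected delay in the format step
		close(formatDone)
	}()

	// The pod is deleted while the format sleeps: desired state is emptied,
	// and actual state was never populated because the mount never finished.
	desired, actual := false, false

	// The buggy in-use check runs inside this window and cannot see the
	// pending operation, so the volume is reported as safe to detach.
	fmt.Println("in use?", desired || actual) // false -> premature detach

	<-formatDone // the format completes on an already-detached volume
}
```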

The volume is still in actual_state_of_world and hence should be reported as in-use by the node, because that function considers both the desired and actual state of the world.

Currently VolumeManager only considers globally mounted volumes as in-use: https://github.com/kubernetes/kubernetes/blob/a856c7ab1de1bed266cbeab5dec703105d1bf2d0/pkg/kubelet/volumemanager/volume_manager.go#L298. However, VerifyControllerAttachedVolume sets globallyMounted to false when adding the volume to the ASW: https://github.com/kubernetes/kubernetes/blob/a856c7ab1de1bed266cbeab5dec703105d1bf2d0/pkg/kubelet/volumemanager/cache/actual_state_of_world.go#L404. So after VerifyControllerAttachedVolume finishes, the volume is still considered not in-use.

In my previous comments, I suggested considering all attached volumes in the ASW as in-use, not only globally mounted ones, but I'm not sure whether that covers all situations (see the sketch below).
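A toy sketch contrasting the two filters; the field name globallyMounted loosely follows the real actual_state_of_world code, but this is not the actual implementation.

```go
package sketch

// attachedVolume models an ASW entry. globallyMounted only becomes true
// after MountDevice succeeds, yet the volume is added to the ASW (by
// VerifyControllerAttachedVolume) before that point.
type attachedVolume struct {
	name            string
	globallyMounted bool
}

// inUseCurrent is the existing filter: freshly attached volumes with
// globallyMounted=false are silently skipped.
func inUseCurrent(asw []attachedVolume) []string {
	var out []string
	for _, v := range asw {
		if v.globallyMounted {
			out = append(out, v.name)
		}
	}
	return out
}

// inUseSuggested is the proposed alternative: every attached volume in the
// ASW counts as in-use.
func inUseSuggested(asw []attachedVolume) []string {
	out := make([]string, 0, len(asw))
	for _, v := range asw {
		out = append(out, v.name)
	}
	return out
}
```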

Fixing this in a patch release is fine. This is an issue that has existed in kubernetes for many releases now.