external-attacher: VolumeAttachment not marked as detached causes problems when the Node is deleted.

In https://github.com/kubernetes-csi/external-attacher/pull/184, we decided that instead of marking the VolumeAttachment as detached, we would just requeue the volume so that the workqueue processes it again.

However, this doesn’t work in the case where the Node is deleted. In that scenario (see the sketch after this list):

  1. ListVolumes() shows that the volume is no longer attached to the node
  2. ReconcileVA() only sets a force sync
  3. syncAttach() just tries to re-attach the volume and fails because the node is gone
  4. In the k/k attach/detach (AD) controller, we try to attach to the new node, but the attach fails the multi-attach check because the volume is still marked as attached in the ASW (actual state of the world)
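
To make the failure concrete, here is a minimal sketch of the requeue-only behavior described above. The handler type, its fields, and the map keying are illustrative assumptions, not the actual external-attacher code:

```go
// Illustrative sketch only: type, field, and helper names are hypothetical
// and do not match the real external-attacher source.
package sketch

import (
	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/client-go/util/workqueue"
)

type handler struct {
	vaQueue   workqueue.RateLimitingInterface // queue of VolumeAttachment names
	forceSync map[string]bool                 // VAs marked for a forced re-sync
}

// reconcileVA compares what the CSI driver reports via ListVolumes()
// (attachedOnStorage, keyed by VolumeAttachment name here for simplicity)
// with VolumeAttachments that claim to be attached.
func (h *handler) reconcileVA(attachedOnStorage map[string]bool, vas []*storagev1.VolumeAttachment) {
	for _, va := range vas {
		if va.Status.Attached && !attachedOnStorage[va.Name] {
			// Current approach (after PR #184): do not touch va.Status here.
			// Only mark the VA for a force sync and requeue it. syncAttach()
			// then tries the attach again, which fails when the node no
			// longer exists, so status.attached never becomes false.
			h.forceSync[va.Name] = true
			h.vaQueue.Add(va.Name)
		}
	}
}
```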

What should happen is:

  1. ListVolumes() shows that the volume is no longer attached to the node
  2. We actually mark the VolumeAttachment as detached by setting VolumeAttachment.status.attached to false (see the API-level sketch after this list)
  3. In the k/k AD controller, VerifyVolumesAttached() sees that the VolumeAttachment is detached and updates the ASW
  4. The AD reconciler allows a new Attach on the new node to proceed.
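
For reference, here is a minimal sketch of what step 2 means at the Kubernetes API level, assuming a client-go clientset is available. The real markAsDetached in the external-attacher does more bookkeeping (error handling, its own update/patch plumbing), so this only shows the essential status change:

```go
package sketch

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// markAsDetached is a minimal sketch of "mark the VolumeAttachment as
// detached" at the API level: flip status.attached to false and drop the
// attachment metadata.
func markAsDetached(ctx context.Context, client kubernetes.Interface, vaName string) error {
	va, err := client.StorageV1().VolumeAttachments().Get(ctx, vaName, metav1.GetOptions{})
	if err != nil {
		return err
	}
	va.Status.Attached = false
	va.Status.AttachmentMetadata = nil
	// VolumeAttachment has a status subresource, so the change goes
	// through UpdateStatus.
	_, err = client.StorageV1().VolumeAttachments().UpdateStatus(ctx, va, metav1.UpdateOptions{})
	return err
}
```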

I’m not sure of the best way to implement step 2. Some suggestions, in order of preference:

  1. We go back to actually updating the VolumeAttachment in ReconcileVA(), like the original PR did, but we call markAsDetached() to make sure everything is updated properly (see the sketch after this list).
  2. We pass some additional state to syncVA() so that it can call markAsDetached() if csiAttach failed during the force sync.
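
A rough sketch of what option 1 could look like, building on the illustrative reconcileVA and markAsDetached sketches above; again, the names and structure are assumptions rather than the actual external-attacher code:

```go
package sketch

import (
	"context"
	"log"

	storagev1 "k8s.io/api/storage/v1"
	"k8s.io/client-go/kubernetes"
)

// reconcileVADetach sketches option 1: when ListVolumes() says the volume is
// no longer attached, update the VolumeAttachment status directly (via the
// markAsDetached sketch above) instead of only forcing a re-sync.
func reconcileVADetach(ctx context.Context, client kubernetes.Interface,
	attachedOnStorage map[string]bool, vas []*storagev1.VolumeAttachment) {
	for _, va := range vas {
		if va.Status.Attached && !attachedOnStorage[va.Name] {
			// Mark the VA as detached right here so that the k/k AD
			// controller's VerifyVolumesAttached() can update its ASW and
			// let the attach to the new node proceed.
			if err := markAsDetached(ctx, client, va.Name); err != nil {
				log.Printf("failed to mark %s as detached: %v", va.Name, err)
			}
		}
	}
}
```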

Most upvoted comments

Any news on this? It is really annoying, as there are so many situations in which nodes get deleted:

  • by the cluster autoscaler
  • during cluster upgrades
  • …

And then we always have to manually fix dozens of pods that cannot start because they cannot attach their volumes.

Facing the same issue with the latest EKS and EBS CSI controller.

@msau42 this happens every now and then in our clusters, and 6 minutes is a long delay in a world with 99.95%+ uptime SLAs. Also note that we already drain the nodes, and still it can happen that a node terminates immediately without proper draining. That’s reality, not just theory, and we have to deal with it. That’s why we need resilient and self-healing controllers, which make sure the system recovers from such errors automatically within a reasonable time. After all, this is the reason why everybody is moving to Kubernetes. If we still want to live on assumptions and fix everything manually whenever our overly optimistic assumptions fail, then we don’t need to maintain complex Kubernetes systems 😁.

Still ran into this running Kubernetes v1.21.3. The node was deleted, and the VolumeAttachment was still around, referencing the old node name.

Facing the issue with Kubernetes v1.24 as well.

@jsafrane what do you think about this issue? My organization can help with the implementation.