kubernetes: Unable to restore volume specs in ASW when attach/detach controller starts
What happened?
Normally, the DSW and ASW are repopulated after the attach/detach controller starts. The controller first reads the volumes from node.Status.VolumesAttached and adds them to the ASW; at that point their VolumeSpec is nil. It then walks all pods, adds the volumes of pods that need to be attached to the DSW, and replaces the nil volume specs in the ASW with the correct specs found on the pods. See https://github.com/kubernetes/kubernetes/blob/master/pkg/controller/volume/attachdetach/attach_detach_controller.go:
attachState := adc.actualStateOfWorld.GetAttachState(volumeName, nodeName)
if attachState == cache.AttachStateAttached {
	klog.V(10).Infof("Volume %q is attached to node %q. Marking as attached in ActualStateOfWorld",
		volumeName,
		nodeName,
	)
	devicePath, err := adc.getNodeVolumeDevicePath(volumeName, nodeName)
	if err != nil {
		klog.Errorf("Failed to find device path: %v", err)
		continue
	}
	err = adc.actualStateOfWorld.MarkVolumeAsAttached(volumeName, volumeSpec, nodeName, devicePath)
	if err != nil {
		klog.Errorf("Failed to update volume spec for node %s: %v", nodeName, err)
	}
}
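For context, the first half of that startup sequence (populateActualStateOfWorld in the same file) is what seeds the ASW with nil specs. Roughly condensed (paraphrased, not quoted verbatim):

// Condensed paraphrase of populateActualStateOfWorld: every volume listed in
// node.Status.VolumesAttached is marked attached with a nil VolumeSpec; the
// spec is expected to be filled in later by the pod scan shown above.
nodes, err := adc.nodeLister.List(labels.Everything())
if err != nil {
	return err
}
for _, node := range nodes {
	nodeName := types.NodeName(node.Name)
	for _, attachedVolume := range node.Status.VolumesAttached {
		// nil VolumeSpec here: only safe if the pod scan later replaces it.
		if err := adc.actualStateOfWorld.MarkVolumeAsAttached(
			attachedVolume.Name, nil /* VolumeSpec */, nodeName, attachedVolume.DevicePath); err != nil {
			klog.Errorf("Failed to mark the volume as attached: %v", err)
			continue
		}
	}
}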
But I have a problem:
All nodes of the k8s cluster are restarted and pods are evicted. When the attach/detach controller starts, the pod's volume has not yet been attached on the new node, so the node recorded for that volume in the ASW is still the old node, which does not match the pod's nodeName. Given the code logic above, the volume spec in the ASW is never replaced and stays nil.
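To illustrate the mismatch, here is a toy, self-contained model of that startup check. The real GetAttachState/MarkVolumeAsAttached live in pkg/controller/volume/attachdetach/cache; the map below is only a stand-in to show why the lookup misses, not the real implementation:

package main

import "fmt"

// aswEntry is a toy stand-in for one ActualStateOfWorld record: the node the
// volume is believed to be attached to, and whether a real spec is present.
type aswEntry struct {
	node    string
	hasSpec bool
}

func main() {
	asw := map[string]aswEntry{
		// Restored at controller startup from node.Status.VolumesAttached:
		// volume1 is still recorded against the pre-restart node, spec is nil.
		"volume1": {node: "old-node", hasSpec: false},
	}

	// populateDesiredStateOfWorld asks "is volume1 attached to the pod's
	// *current* node?" before replacing the nil spec.
	podNode := "new-node" // pod was evicted and re-created elsewhere
	entry := asw["volume1"]
	if entry.node == podNode {
		entry.hasSpec = true // spec would be replaced here
		asw["volume1"] = entry
	}

	// The check misses the stale entry, so the spec is never filled in.
	fmt.Printf("volume1 recorded on %q, pod on %q, spec restored: %v\n",
		entry.node, podNode, asw["volume1"].hasSpec)
}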
A full description of the problem I’m having is below. There is a k8s cluster with 3 masters and 2 nodes. A running pod1 (ftpv3-5b688be6-a73c-488d-846a-155a3a32fedd-0) on node 193.168.0.5 uses an attachable volume1 (131ab496-40a5-47a0-8011-4554779c5688). After all nodes of the k8s cluster are restarted, pod1 no longer runs because its mount fails and never recovers: the mount fails because the VolumeAttachment cannot be obtained during WaitForAttach.
Analyzing the problem through the logs:
- At the beginning, pod1 is running, and the volume1 used by pod1 is attached to node5.
- Restart all nodes of the k8s cluster. When node5 restarts, pod1 is evicted, so volume1 starts to detach from node5, but detaching volume1 fails: csi-attacher does not receive the VolumeAttachment event after waiting 2m and returns a detach failure. Because of the detach failure, volume1/node5 is not removed from the ASW and the volume is not removed from node.Status.VolumesAttached, but the VolumeAttachment itself was removed by csi-attacher during the detach.
I0216 08:29:31.474989 1 event.go:291] "Event occurred" object="opcs/ftpv3-5b688be6-a73c-488d-846a-155a3a32fedd-0" kind="Pod" apiVersion="" type="Normal" reason="TaintManagerEviction" message="Marking for deletion Pod opcs/ftpv3-5b688be6-a73c-488d-846a-155a3a32fedd-0"
I0216 08:39:52.455014 1 reconciler.go:203] attacherDetacher.DetachVolume started for volume "nil" (UniqueName: "kubernetes.io/csi/opdisk.csi.openpalette.org^131ab496-40a5-47a0-8011-4554779c5688") on node "193.168.0.5"
E0216 08:41:52.497915 1 csi_attacher.go:500] kubernetes.io/csi: attachdetacher.WaitForDetach timeout after 2m0s [volume=131ab496-40a5-47a0-8011-4554779c5688; attachment.ID=csi-841b09548dbe5e57220ffa7ab34398a71821b25459d6725e3d3afe87e5148f24]
E0216 08:41:52.498031 1 nestedpendingoperations.go:319] Operation for "{volumeName:kubernetes.io/csi/opdisk.csi.openpalette.org^131ab496-40a5-47a0-8011-4554779c5688 podName: nodeName:193.168.0.5}" failed. No retries permitted until 2022-02-16 08:41:52.997986724 +0800 CST m=+321.870058193 (durationBeforeRetry 500ms). Error: "DetachVolume.Detach failed for volume \"nil\" (UniqueName: \"kubernetes.io/csi/opdisk.csi.openpalette.org^131ab496-40a5-47a0-8011-4554779c5688\") on node \"193.168.0.5\" : attachdetachment timeout for volume 131ab496-40a5-47a0-8011-4554779c5688"
- Since the nodes are restarted in sequence, during the detach process pod1 is evicted to another node and then scheduled back onto node5. When pod1 is re-created on node5, volume1/node5 is added to the DSW. Because volume1/node5 is now present in both the DSW and the ASW, neither a detach nor an attach is executed any more.
- Additionally, the attach/detach controller periodically calls rc.attacherDetacher.VerifyVolumesAreAttached to check the attached status of the volumes and fix exceptions. https://github.com/kubernetes/kubernetes/blob/8dd52e6c5dcd8068ee1080a4b7dcabd0a140a5ee/pkg/controller/volume/attachdetach/reconciler/reconciler.go
func (rc *reconciler) syncStates() {
	volumesPerNode := rc.actualStateOfWorld.GetAttachedVolumesPerNode()
	rc.attacherDetacher.VerifyVolumesAreAttached(volumesPerNode, rc.actualStateOfWorld)
}
However, when the attach/detach controller starts, pod1 has already been evicted and re-created on another node but has not yet been attached there, so attachedVolume.nodesAttachedTo in the ASW contains only node5 and no new node. Checking the attach state of the volume against pod.Spec.NodeName in the ASW therefore returns false, the volume spec update is never executed, volume.spec stays nil, and rc.attacherDetacher.VerifyVolumesAreAttached fails:
E0216 08:38:20.647821 1 operation_executor.go:711] VerifyVolumesAreAttached: nil spec for volume kubernetes.io/csi/opdisk.csi.openpalette.org^131ab496-40a5-47a0-8011-4554779c5688
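That error comes from a guard in operation_executor.go that skips, rather than repairs, volumes whose spec is nil. Roughly paraphrased (not quoted verbatim):

// Paraphrased: VerifyVolumesAreAttached iterates the ASW's attached volumes
// per node and skips any whose spec is nil, so an entry restored with a nil
// spec is never verified or corrected on later syncs.
for _, nodeAttachedVolumes := range attachedVolumes {
	for _, volumeAttached := range nodeAttachedVolumes {
		if volumeAttached.VolumeSpec == nil {
			klog.Errorf("VerifyVolumesAreAttached: nil spec for volume %s", volumeAttached.VolumeName)
			continue
		}
		// ... non-nil specs are grouped by volume plugin and verified ...
	}
}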
What did you expect to happen?
Volume specs in the ASW are replaced with the correct ones found on pods after the attach/detach controller is started.
How can we reproduce it (as minimally and precisely as possible)?
All nodes of the k8s cluster are restarted and pods are evicted.
Anything else we need to know?
No response
Kubernetes version
$ kubectl version
v1.19.4
Cloud provider
OS version
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 18 (15 by maintainers)
Here, before the volume spec is updated, I think it is only necessary to check whether the volumeName is in the ASW and whether that volume has any attached node; there is no need to check that pod.nodeName is among the attached nodes. Am I right? Any other ideas? @NikhilSharmaWe @jsafrane @msau42
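A rough sketch of that idea, assuming the ASW cache exposes a per-volume node lookup such as GetNodesForAttachedVolume (treat it as a hypothetical helper if your tree does not have it). This is only a sketch of the suggestion, not a tested patch:

// In populateDesiredStateOfWorld, instead of requiring the volume to be
// attached to the pod's current node, update the spec for every node the
// ASW already believes the volume is attached to.
for _, attachedNode := range adc.actualStateOfWorld.GetNodesForAttachedVolume(volumeName) {
	devicePath, err := adc.getNodeVolumeDevicePath(volumeName, attachedNode)
	if err != nil {
		klog.Errorf("Failed to find device path: %v", err)
		continue
	}
	if err := adc.actualStateOfWorld.MarkVolumeAsAttached(volumeName, volumeSpec, attachedNode, devicePath); err != nil {
		klog.Errorf("Failed to update volume spec for node %s: %v", attachedNode, err)
	}
}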