gcp-compute-persistent-disk-csi-driver: Disk failed to relink with udevadm
I’ve been running a database cluster with a StatefulSet on GKE for a number of months. Yesterday I upgraded to regular channel version 1.16.13-gke.401, and today I found that the StatefulSet’s first pod had this in describe pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 29m (x19 over 97m) kubelet, gke-cluster-f51460e-nodes-preemptible-n-593b5f7a-rej6 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[data stolon-secrets stolon-token-xxxxx]: timed out waiting for the condition
Warning FailedMount 9m15s (x45 over 98m) kubelet, gke-cluster-f51460e-nodes-preemptible-n-593b5f7a-rej6 MountVolume.MountDevice failed for volume "pvc-2e9a43cc-3655-4d64-af9e-9b4643b629a2" : rpc error: code = Internal desc = Error when getting device path: error verifying GCE PD ("pvc-2e9a43cc-3655-4d64-af9e-9b4643b629a2") is attached: failed to find and re-link disk pvc-2e9a43cc-3655-4d64-af9e-9b4643b629a2 with udevadm after retrying for 3s: failed to trigger udevadm fix: udevadm --trigger requested to fix disk pvc-2e9a43cc-3655-4d64-af9e-9b4643b629a2 but no such disk was found
Warning FailedMount 4m10s (x9 over 90m) kubelet, gke-cluster-f51460e-nodes-preemptible-n-593b5f7a-rej6 Unable to attach or mount volumes: unmounted volumes=[data], unattached volumes=[stolon-secrets stolon-token-xxxxx data]: timed out waiting for the condition
About this issue
- State: closed
- Created 4 years ago
- Reactions: 14
- Comments: 49 (25 by maintainers)
Commits related to this issue
- [CI] Add HPP for HyperShift/Kubevirt deployments This is being done as a temporary workaround for https://github.com/kubernetes-sigs/gcp-compute-persistent-disk-csi-driver/issues/608 that we're const... — committed to orenc1/hypershift by orenc1 2 years ago
Have same issue on pre-emptible node pool GKE 1.22.8-gke.200.
I have the same problem on GKE. It usually happens on a preempted node with a StatefulSet PVC template. It seems to be related to the node, and I can always fix it by draining the node the pod is scheduled on.
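For anyone else hitting this, a minimal sketch of that drain workaround, assuming the stuck pod is called my-db-0 (all names are placeholders):

    # Find the node the stuck pod is scheduled on
    kubectl get pod my-db-0 -o jsonpath='{.spec.nodeName}'

    # Cordon and drain that node so the pod gets rescheduled elsewhere
    # (older kubectl versions use --delete-local-data instead)
    kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

    # Once the pod is running again on another node, uncordon or recreate the node
    kubectl uncordon <node-name>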
Same issue on 1.22.8-gke.200.
Edit: same on 1.23.5-gke.1501
I’m facing the same behavior starting from v1.22 using preemptible nodes. PVCs are bound, but no pod can start due to:
MountVolume.MountDevice failed for volume "pvc-bcaacf4b-426b-48a9-9b98-82e9e3c9ad69" : rpc error: code = Internal desc = Error when getting device path: rpc error: code = Internal desc = error verifying GCE PD ("gke-dev-env-e02550df-d-pvc-bcaacf4b-426b-48a9-9b98-82e9e3c9ad69") is attached: failed to find and re-link disk gke-dev-env-e02550df-d-pvc-bcaacf4b-426b-48a9-9b98-82e9e3c9ad69 with udevadm after retrying for 3s: failed to trigger udevadm fix: udevadm --trigger requested to fix disk gke-dev-env-e02550df-d-pvc-bcaacf4b-426b-48a9-9b98-82e9e3c9ad69 but no such disk was found
All PVs have the annotation pv.kubernetes.io/migrated-to: pd.csi.storage.gke.io, but the storageClass is the old one, standard, with provisioner kubernetes.io/gce-pd.
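A quick way to check this on your own volumes (the PV name here is just an example taken from the events above):

    # Show the CSI-migration annotation and the storage class of a PV
    kubectl get pv pvc-2e9a43cc-3655-4d64-af9e-9b4643b629a2 -o yaml | grep -E 'migrated-to|storageClassName'

    # And the provisioner of the storage class it references
    kubectl get storageclass standard -o jsonpath='{.provisioner}{"\n"}'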
The solution is to scale the cluster down and back up again.
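Roughly what that looks like with gcloud, assuming a zonal cluster named dev-env with a node pool called default-pool (names, zone and sizes are placeholders):

    # Scale the affected node pool down to zero...
    gcloud container clusters resize dev-env --node-pool default-pool --num-nodes 0 --zone europe-west1-b

    # ...then back up to its previous size so fresh nodes are created
    gcloud container clusters resize dev-env --node-pool default-pool --num-nodes 3 --zone europe-west1-b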
The Compute Engine persistent disk CSI Driver feature is enabled and has the following versions:
- gce-pd-driver = gke.gcr.io/gcp-compute-persistent-disk-csi-driver:v1.4.0-gke.0
- csi-driver-registrar = gke.gcr.io/csi-node-driver-registrar:v2.5.0-gke.0
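If you want to confirm which images the managed driver is actually running in your cluster, something like this should work (pdcsi-node is the daemonset name GKE uses in my clusters; adjust if yours differs):

    # List the container images used by the PD CSI driver daemonset
    kubectl -n kube-system get daemonset pdcsi-node \
      -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'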
We are also hitting this with pre-emptible node pools, post pre-emption, on GKE 1.22.
Edit: Deleting the pods got them stuck in Terminating; deleting with --force removed them, but on recreation they failed the same way. The only way to fix this for us was to manually delete the instances and scale the node pool back up to recreate the nodes.
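For reference, roughly the commands we used (pod, instance and zone names are placeholders):

    # Force-delete the pod stuck in Terminating
    kubectl delete pod my-db-0 --force --grace-period=0

    # Delete the underlying VM; the node pool's managed instance group recreates it
    gcloud compute instances delete gke-cluster-nodes-preemptible-n-593b5f7a-rej6 --zone europe-west1-b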
Just had the same issue today on GKE, also with preemptible nodes and Kubernetes v1.22. Like @shandyDev, when I look at the PVC there’s a volume.kubernetes.io/selected-node annotation that is still pointing to the deleted node. The disk itself is not attached to any node.

I’m facing this issue on a pre-emptible node pool too: whenever a node is preempted, the pod status changes to ContainerCreating and it gets stuck mounting the device.
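When this happens you can check both halves of the picture from the command line (PVC name, disk name and zone below are placeholders based on the errors above):

    # Does the PVC still carry a scheduling hint pointing at the dead node?
    kubectl get pvc data-my-db-0 -o yaml | grep selected-node

    # Is the underlying PD actually attached to any instance?
    # An empty "users" field means it is not attached anywhere.
    gcloud compute disks describe gke-dev-env-e02550df-d-pvc-bcaacf4b-426b-48a9-9b98-82e9e3c9ad69 \
      --zone europe-west1-b --format='value(users)'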
any update yet on this issue?
You should only be running into this problem if you’re detaching disks while IO is in flight. K8s doesn’t do that normally; it waits for a clean unmount, so it only happens in a very particular situation (queuing or other behavior on the kubelet leading to the 6m force-detach flow). I’m trying to be precise here about the distinction between detach and unmount. Unmounting while IO is in flight won’t cause this problem; it may cause problems for the filesystem, but at the block-device level, once the system unmounts there won’t be IO, and so detach is safe. Hence simply terminating pods doesn’t cause this error.
Attaching and detaching PVCs to existing nodes will not normally cause this behavior. So that may mean something else is going on—in which case this fix isn’t going to help you.
To reiterate, you’re only hitting this problem if you’re detaching disks (either manually, or because of nodes getting into an unusually overloaded state) while there is active IO on the device. Normal PVC workflows do not cause this behavior as far as we’ve seen.
Of course you may have found a new corner case, so please post details of your problems. But I suspect that this is not your issue.
Thanks, that confirms what we have found internally. GCE is pushing out a fix for a problem where, if a PD is detached while there is in-flight IO, the node becomes poisoned and can no longer attach any PDs. The only solution is to delete the node.
The fix for this is being rolled out, but the process is slow and will not be across the whole fleet until the beginning of next year if I understand correctly.
It hasn’t shown up in the kubelet-monitored bits. The thing to look for is in the attach/detach controller logs in the kube-controller-manager: lots of attach/detach operations, possibly taking a long time (> 30s), and the attach/detach controller doing a force detach.
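If you run your own control plane, a rough way to scan for that (the log path and exact message wording are assumptions; on GKE the controller-manager logs are only available through Cloud Logging):

    # Count attach/detach operations logged by the controller manager
    grep -ciE 'attachvolume|detachvolume' /var/log/kube-controller-manager.log

    # Look for force-detach events specifically
    grep -iE 'force.*detach' /var/log/kube-controller-manager.log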
The other half of the issue is why the PD was not able to be cleanly unmounted before detach. https://github.com/kubernetes/kubernetes/pull/110721 may help avoid that, but it seems like there may still be an issue of container teardown or unmount taking too long.
We have seen some cases where the disk is attached to the host machine, but a deadlock is happening in the guest OS that is preventing the device from appearing in sysfs (which then explains the error messages above, where the device can’t be found).
However we have not been able to reproduce this in order to figure out the root cause. So any further clues you can give would be helpful.
The first thing is to identify if this is the same problem: does the PD fail to appear in /dev/disk/by-id? Is the device missing from lsscsi or /sys/class/scsi_disk? (It’s hard to positively identify a missing disk; all I’ve been able to do is count the number of SCSI devices and see if it matches what is supposed to be attached to the VM, e.g. from gcloud compute instances describe.)
If so, look in dmesg for something like “virtio_scsi: SCSI device 0 0 176 0 not found” and for hung-task messages like “INFO: task kworker/6:9:2470111 blocked for more than 327 seconds”. Look for stuck kworker threads (i.e., kworker threads in state “D” in ps). If there’s such a stuck thread, look in /proc/<pid>/stack for long stack traces including __scsi_add_disk or __scsi_remove_disk.
If so, it’s a similar problem, and we’d appreciate more information on how this is happening, including your machine shape, number of attached devices, number of pods on the node, etc.
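To make that checklist concrete, here is a rough node-side diagnostic sketch along the lines described above (the instance name, zone and gcloud format string are assumptions; adapt to your setup):

    # 1. Are the Google PD symlinks present?
    ls -l /dev/disk/by-id/ | grep google- || echo "no google-* symlinks found"

    # 2. How many SCSI disks does the guest see, versus what GCE thinks is attached?
    echo "SCSI disks seen by the guest: $(ls /sys/class/scsi_disk | wc -l)"
    gcloud compute instances describe gke-node-name --zone europe-west1-b \
      --format='value(disks[].deviceName)'

    # 3. Any SCSI or hung-task errors in the kernel log?
    dmesg | grep -iE 'virtio_scsi|blocked for more than'

    # 4. Any kworker threads stuck in uninterruptible sleep (state D)? Dump their stacks.
    ps -eo pid,stat,comm | awk '$2 ~ /D/ && $3 ~ /kworker/'
    for pid in $(ps -eo pid,stat,comm | awk '$2 ~ /D/ && $3 ~ /kworker/ {print $1}'); do
      echo "=== $pid ==="; cat /proc/$pid/stack
    done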
By the time we get to the MountDevice error, I believe the Attach call should have succeeded.
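One way to sanity-check that from the Kubernetes side is to look at the VolumeAttachment objects, which record whether the CSI attach call was reported as successful (the object name below is a placeholder):

    # List CSI attachments and whether the API server considers them attached
    kubectl get volumeattachments

    # Inspect one in detail
    kubectl describe volumeattachment csi-0123456789abcdef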
We have seen some strange udev problems in the past where udev missed an event or didn’t update its device cache. We’ve tried to work around it by doing a “udevadm trigger” when things seem strange, but there are potentially still some scenarios where that won’t help.
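For what it’s worth, a manual version of that workaround from an affected node looks roughly like this (just a sketch of what the driver’s fix path attempts, not an official procedure):

    # Ask udev to replay add events for block devices and wait for it to settle
    udevadm trigger --action=add --subsystem-match=block
    udevadm settle --timeout=10

    # Check whether the expected google-* symlink showed up afterwards
    ls -l /dev/disk/by-id/ | grep google-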