kubernetes: Unable to attach or mount volumes: timed out waiting for the condition

What happened?

PVC mounts time out once the number of parallel pod requests using the same PVC goes past roughly 400. This causes repeated PVC mount retries (several per pod in some cases) and delays pod startup.

3m1s Warning FailedMount pod/nginx-deployment-7c54456f-2sk89 Unable to attach or mount volumes: unmounted volumes=[stresstest-pvc kube-api-access-f2nnp], unattached volumes=[stresstest-pvc kube-api-access-f2nnp]: timed out waiting for the condition

What did you expect to happen?

The PVC should be mounted on the pods without any FailedMount events.

How can we reproduce it (as minimally and precisely as possible)?

The issue can be easily reproduced on a 10-node cluster with a Kubernetes Deployment of 800 pod replicas that all mount the same PVC, roughly as in the sketch below.
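
A minimal reproduction sketch under my own assumptions: the PVC and Deployment names are taken from the event above, but the StorageClass, image, and mount path are placeholders, and the PVC must be ReadWriteMany since the 800 pods land on all 10 nodes.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stresstest-pvc
spec:
  accessModes:
    - ReadWriteMany                # shared by pods across all 10 nodes
  storageClassName: standard-rwx   # placeholder: any RWX-capable StorageClass
  resources:
    requests:
      storage: 10Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 800                    # FailedMount events start past ~400 parallel mounts
  selector:
    matchLabels:
      app: pvc-stresstest
  template:
    metadata:
      labels:
        app: pvc-stresstest
    spec:
      containers:
        - name: nginx
          image: nginx:1.21        # placeholder image
          volumeMounts:
            - name: stresstest-pvc
              mountPath: /data     # placeholder mount path
      volumes:
        - name: stresstest-pvc
          persistentVolumeClaim:
            claimName: stresstest-pvc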

Anything else we need to know?

This issue is related to the old bug #84169, which was closed with the note that it was fixed in Kubernetes 1.17+, but the issue appears to persist.

Kubernetes version

$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:41:01Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8", GitCommit:"7061dbbf75f9f82e8ab21f9be7e8ffcaae8e0d44", GitTreeState:"clean", BuildDate:"2022-03-16T14:04:34Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}

Cloud provider

On-prem installation with Kubespray

OS version

Ubuntu 20.04

Install tools

Container runtime (CRI) and version (if applicable)

Related plugins (CNI, CSI, …) and versions (if applicable)

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 18
  • Comments: 45 (3 by maintainers)

Most upvoted comments

We performed the fix from this blog post, which worked for us: https://blog.devgenius.io/when-k8s-pods-are-stuck-mounting-large-volumes-2915e6656cb8
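
For context, if the fix in that post is the fsGroup recursive-ownership change that commonly stalls mounts of large volumes (an assumption on my part, not confirmed from the post itself), the relevant setting is fsGroupChangePolicy in the pod security context, roughly:

# Sketch only: assumes the blog's fix is to skip the recursive chown/chmod
# the kubelet performs on large volumes when fsGroup is set.
apiVersion: v1
kind: Pod
metadata:
  name: example-pod                          # placeholder name
spec:
  securityContext:
    fsGroup: 2000                            # placeholder group ID
    fsGroupChangePolicy: "OnRootMismatch"    # only change ownership when the root of the volume doesn't already match
  containers:
    - name: app
      image: nginx:1.21                      # placeholder image
      volumeMounts:
        - name: data
          mountPath: /data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: stresstest-pvc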

We started to encounter this after upgrading to EKS 1.23, using a gp2 StorageClass to host Victoria Metrics. We tried @timvandruenen's fix, which didn't work, and also upgraded the add-ons to the latest version, with the same result.

Resolved: in our case we were missing the IAM policy arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy.
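
For anyone hitting the same thing on EKS, one way to attach that policy is via IRSA for the EBS CSI controller. This is only a sketch under my assumptions: the cluster name and region are placeholders, and ebs-csi-controller-sa is the EBS CSI driver's default controller service account (attaching the policy to the node instance role also works).

apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig
metadata:
  name: my-cluster                     # placeholder cluster name
  region: eu-west-1                    # placeholder region
iam:
  withOIDC: true
  serviceAccounts:
    - metadata:
        name: ebs-csi-controller-sa    # default SA used by the EBS CSI controller
        namespace: kube-system
      attachPolicyARNs:
        - arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy

Applied with eksctl create iamserviceaccount --config-file=<file> --approve, assuming IRSA is the setup in use.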

I found the same error while upscaling nodes for airflow processing 😢

@Jeaniowang Why post "报告收到" ("Report received") so many times? It has flooded my mailing list.

报告收到 ("Report received")

I found the same error while upscaling nodes for airflow processing 😢

Yes, we are running k8s for Airflow too. The problem slows down Airflow task init so much 😦


I think I have the same issue here in my AKS cluster running Kubernetes v1.21.7 (I know it's already EOL, I still have to schedule an update)

      volumes:
        - name: secrets
          secret:
            secretName: secret-appsettings

Running kubectl get events -n my-namespace --sort-by='.metadata.creationTimestamp'

I get this output

 LAST SEEN   TYPE      REASON        OBJECT                                                     MESSAGE
7m18s       Warning   FailedMount   pod/my-namespace-engine-api-deployment-595bf79b79-h27xj   MountVolume.SetUp failed for volume "secrets" : secret "secret-appsettings" not found
11m         Warning   FailedMount   pod/my-namespace-engine-api-deployment-595bf79b79-h27xj   Unable to attach or mount volumes: unmounted volumes=[secrets], unattached volumes=[secrets kube-api-access-f2mf8]: timed out waiting for the condition

The first message makes me think I've made a mistake somewhere, and I'm still figuring it out, but the second is identical to the one reported by the OP. I'll be following this thread and giving you updates on whether I can solve the issue and whether it is related to this at all 👍

EDIT: it's happening on far fewer pods than for the OP, so I'm pretty sure mine is an unrelated problem at this point 🤔

EDIT2: Solved! The Secret was created in the default namespace instead of the one I needed it in.
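
For reference, a minimal sketch of creating the Secret in the correct namespace; only secret-appsettings and my-namespace come from the snippets above, the key and content are illustrative placeholders:

apiVersion: v1
kind: Secret
metadata:
  name: secret-appsettings
  namespace: my-namespace        # must match the namespace of the pod mounting it
type: Opaque
stringData:
  appsettings.json: |            # placeholder key and content
    { "Logging": { "LogLevel": { "Default": "Information" } } }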

We performed the fix from this blog post, which worked for us: https://blog.devgenius.io/when-k8s-pods-are-stuck-mounting-large-volumes-2915e6656cb8

Adding another data point here. From the article, it seems the solution applies to containers running as non-root users; the issue exists even when the container runs as root.