kubernetes: Unable to attach or mount volumes: timed out waiting for the condition
What happened?
PVC mounts time out when the number of parallel pod requests using the same PVC goes past 400. This causes repeated PVC mount retries (several of them in some cases) and delays pod startup.
3m1s Warning FailedMount pod/nginx-deployment-7c54456f-2sk89 Unable to attach or mount volumes: unmounted volumes=[stresstest-pvc kube-api-access-f2nnp], unattached volumes=[stresstest-pvc kube-api-access-f2nnp]: timed out waiting for the condition
What did you expect to happen?
The PVC should get mounted on the pods without any FailedMount events.
How can we reproduce it (as minimally and precisely as possible)?
The issue can be easily reproduced on a 10-node cluster with a Kubernetes Deployment of 800 pod replicas all mounting the same PVC (a sketch follows below).
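A minimal sketch of such a reproduction, assuming your cluster has a CSI driver and StorageClass that can serve a single PVC to pods on all 10 nodes; the image, storage size and access mode are illustrative placeholders, and only the 800 replicas sharing one claim are the point:

# Minimal reproduction sketch -- names, size and access mode are placeholders.
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: stresstest-pvc
spec:
  accessModes: ["ReadWriteMany"]   # pick a mode your CSI driver supports across nodes
  resources:
    requests:
      storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 800
  selector:
    matchLabels:
      app: pvc-stress
  template:
    metadata:
      labels:
        app: pvc-stress
    spec:
      containers:
      - name: nginx
        image: nginx
        volumeMounts:
        - name: stresstest-pvc
          mountPath: /data
      volumes:
      - name: stresstest-pvc
        persistentVolumeClaim:
          claimName: stresstest-pvc
EOF
# Watch for the FailedMount warnings as the pods come up:
kubectl get events --field-selector reason=FailedMount --watch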
Anything else we need to know?
This issue is related to an old bug, #84169, which was closed on the basis that the problem was fixed in Kubernetes 1.17+, but it seems the issue still persists.
Kubernetes version
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"23", GitVersion:"v1.23.1", GitCommit:"86ec240af8cbd1b60bcc4c03c20da9b98005b92e", GitTreeState:"clean", BuildDate:"2021-12-16T11:41:01Z", GoVersion:"go1.17.5", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.8", GitCommit:"7061dbbf75f9f82e8ab21f9be7e8ffcaae8e0d44", GitTreeState:"clean", BuildDate:"2022-03-16T14:04:34Z", GoVersion:"go1.16.15", Compiler:"gc", Platform:"linux/amd64"}
Cloud provider
OS version
# On Linux:
$ cat /etc/os-release
# paste output here
$ uname -a
# paste output here
Ubuntu 20.04
Install tools
Container runtime (CRI) and version (if applicable)
Related plugins (CNI, CSI, …) and versions (if applicable)
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 18
- Comments: 45 (3 by maintainers)
We performed the fix from this blog post, which worked for us: https://blog.devgenius.io/when-k8s-pods-are-stuck-mounting-large-volumes-2915e6656cb8
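For context, and as my own assumption rather than a summary of the post: the fix described there appears to be the fsGroupChangePolicy setting (Kubernetes 1.20+), which stops the kubelet from recursively changing ownership of the whole volume on every mount. A pod-spec fragment would look roughly like this:

# Assumption: the linked post's fix is the fsGroupChangePolicy knob; values are illustrative.
spec:
  securityContext:
    fsGroup: 1000                          # the group your application expects on the volume
    fsGroupChangePolicy: "OnRootMismatch"  # skip the recursive chown when the root already matches

As later comments in this thread note, this only helps when the fsGroup ownership change is what is slow, which typically applies to non-root containers.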
We started to encounter this after upgrading to EKS 1.23, using a gp2 StorageClass to host Victoria Metrics. We tried @timvandruenen's fix without success, and also tried upgrading the add-ons to the latest version, with the same result.
Resolved: in our case we lacked an IAM policy: arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy.

I found the same error while upscaling nodes for Airflow processing 😢
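Regarding the missing AmazonEBSCSIDriverPolicy mentioned above: on EKS it generally needs to be attached to the node group's instance role, or to the IAM role used by the EBS CSI driver's service account if you use IRSA. The role name below is a placeholder, not something from this thread:

# Placeholder role name -- substitute your node instance role or the EBS CSI
# driver's service-account role.
aws iam attach-role-policy \
  --role-name <node-or-ebs-csi-driver-role> \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEBSCSIDriverPolicy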
@Jeaniowang Why post "report received" so many times? This has flooded my mailing list.
Report received.
Yes, we are running k8s for Airflow too. The problem really slows down Airflow task init 😦
I think I have the same issue here in my AKS cluster running Kubernetes v1.21.7 (I know it's already EOL; I still have to schedule an update).
Running
kubectl get events -n my-namespace --sort-by='.metadata.creationTimestamp'
I get this output:
The first message makes me think I've made a mistake somewhere and I'm still figuring it out, but the second is identical to the one stated by the OP. I'll be following this thread and will post updates on whether I can solve the issue, or whether it's related to this at all 👍
EDIT: it's happening with far fewer pods than the OP reports; I'm pretty sure mine is an unrelated problem at this point 🤔
EDIT2: Solved! Secret was created in the default namespace instead of the one I needed it in.
Adding another data point here. From the article, it seems like the solution is applicable to containers running as non-root users. The issue exists even if the container is run as a root user.