aws-ebs-csi-driver: Max number of volumes calculation is incorrect
/kind bug
What happened? Unable to start a pod which uses a volume:
Warning FailedAttachVolume 22m (x3 over 70m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-???" : timed out waiting for the condition
Warning FailedMount 119s (x41 over 92m) kubelet, ip-???.ap-southeast-2.compute.internal Unable to mount volumes for pod "mysql-0_example(???)": timeout expired waiting for volumes to attach or mount for pod "example"/"mysql-0". list of unmounted volumes=[mysql-data]. list of unattached volumes=[mysql-data default-token-???]
There are over 20 volumes mounted on the instance, and no more can be mounted.
What you expected to happen?
On nodes that use multiple ENIs, the maximum number of attachable volumes is calculated incorrectly, because ENIs consume some of the same attachment slots (this is described very well in https://github.com/kubernetes/kubernetes/issues/80967).
The driver should check how many ENIs are in use and reduce the number of volumes accordingly. Another option is to let an admin limit the number of volumes through an external parameter.
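To make the requested calculation concrete, here is a minimal sketch of the arithmetic. It is not the driver's code: `availableVolumeSlots` and `nitroSharedAttachmentLimit` are hypothetical names, and the limit of 28 is an assumption based on the shared attachment limit that applies to most Nitro instance types (ENIs, EBS volumes and NVMe instance store volumes all count against it). The real limit depends on the instance type.

```go
package main

import "fmt"

// nitroSharedAttachmentLimit is an ASSUMPTION for illustration: most Nitro
// instance types share 28 attachment slots between ENIs, EBS volumes and
// NVMe instance store volumes. Check the EC2 docs for your instance type.
const nitroSharedAttachmentLimit = 28

// availableVolumeSlots is a hypothetical helper, not part of the driver.
// It returns how many more EBS volumes could be attached once the slots
// already used by ENIs, instance store volumes and EBS volumes are
// subtracted from the shared limit. The result is never negative.
func availableVolumeSlots(sharedLimit, attachedENIs, instanceStoreVolumes, attachedEBSVolumes int) int {
	remaining := sharedLimit - attachedENIs - instanceStoreVolumes - attachedEBSVolumes
	if remaining < 0 {
		return 0
	}
	return remaining
}

func main() {
	// Example: 4 ENIs (e.g. from the VPC CNI), no instance store volumes,
	// and only the root EBS volume attached.
	fmt.Println(availableVolumeSlots(nitroSharedAttachmentLimit, 4, 0, 1)) // prints 23
}
```

With a fixed limit that ignores ENIs, the scheduler could place more volumes on such a node than it can accept, which would match the attach timeouts shown above.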
Environment
- Kubernetes version (use kubectl version):
Client Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.2", GitCommit:"c97fe5036ef3df2967d086711e6c0c405941e14b", GitTreeState:"clean", BuildDate:"2019-10-15T23:42:50Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"14+", GitVersion:"v1.14.8-eks-b7174d", GitCommit:"b7174db5ee0e30c94a0b9899c20ac980c0850fc8", GitTreeState:"clean", BuildDate:"2019-10-18T17:56:01Z", GoVersion:"go1.12.10", Compiler:"gc", Platform:"linux/amd64"}
- Driver version:
I'm not sure which specific driver version it is. The cluster was created using the terraform-aws-eks v6.0.2 module (https://github.com/terraform-aws-modules/terraform-aws-eks?ref=v6.0.2).
About this issue
- State: closed
- Created 5 years ago
- Reactions: 7
- Comments: 26 (9 by maintainers)
This issue forces cluster admins to set a static, low value for volume-attach-limit to avoid attachment failures on nodes using the AWS VPC CNI, which leads to premature node scale-out and inefficient use of resources. It would be great if the driver became aware of the current attachments (ENI and EBS) on its EC2 instance and calculated the maximum number accordingly.