amazon-eks-ami: Raise docker default ulimit for nofile to 65535
In the latest AMI version, v20190327, the default nofile ulimit in /etc/sysconfig/docker is set to a soft limit of 1024 and a hard limit of 4096:
OPTIONS="--default-ulimit nofile=1024:4096"
We’ve already hit this limit with some Java applications and have raised it to 65535 in user-data:
sed -i 's/^OPTIONS=.*/OPTIONS="--default-ulimit nofile=65535:65535"/' /etc/sysconfig/docker && systemctl restart docker
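A quick way to confirm the new default took effect (an illustrative check, not part of the original report) is to ask a throwaway container for its soft limit:
docker run --rm busybox sh -c 'ulimit -n'
After the sed above and the daemon restart, this should print 65535 instead of 1024.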
Question: Isn’t 4096 a little conservative for an EKS node? Is there anything wrong with just setting this to 65535 by default in the AMI?
About this issue
- State: closed
- Created 5 years ago
- Reactions: 7
- Comments: 18 (7 by maintainers)
@max-rocket-internet it is supposed to be 65535, and originally was, but a series of PR mishaps left several AMI releases with it reduced to 4096 or 8192.
The fun started in #186, where someone thought the setting was lower and opened a PR to ‘raise’ it to 8192. That actually reduced it from 65535 to 8192, which immediately caused problems (#193). People tried to revert the change in #206 (a separate fix in #205 was closed in favor of it), but #206 had no effect because the latest commits weren’t being included in the AMI builds. Fresh builds in #233 then tried to restore the #206 reversion of #186, with the ongoing problem tracked in #234.
In theory the current latest AMIs should be back to 65535. Any fixed versions are dated 31 March or later, as the problem still wasn’t fixed as of 29 March. And even after that, I heard GPU AMIs still had the issue. https://github.com/awslabs/amazon-eks-ami/issues/233#issuecomment-478392268
@max-rocket-internet sorry for reviving this discussion, but I am confused:
From reading the discussion history here, I think those two different values were also mixed up. As far as I understand, the host limits are ultimately the ones that count, even if the Docker process defines higher limits.
So this is not yet fixed, is it?
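For what it’s worth, the two values being mixed up here are separate knobs: the dockerd process has its own host-level limit, while --default-ulimit only sets what containers receive. The container side can be checked with the docker run one-liner shown earlier; the daemon’s own limit is visible with (an illustrative command, not quoted from the thread):
grep 'Max open files' /proc/$(pidof dockerd)/limits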
We have been running into this error on EKS nodes:
The fix for us was to actually apply these changes in our userdata scripts where we bootstrap our EKS nodes in Terraform:
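The exact changes weren’t captured above, but a minimal sketch of the kind of override described, assuming the file lives at /etc/security/limits.d/99-amazon.conf and using 65535 as an example value, would be:
#!/bin/bash
# Illustrative only: replace the shipped limits file with higher
# host-level nofile limits (path and values are assumptions).
cat <<'LIMITS' > /etc/security/limits.d/99-amazon.conf
*  soft  nofile  65535
*  hard  nofile  65535
LIMITS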
This overrides the 99-amazon.conf and, after applying, resolved our issue immediately. I think this needs to be fixed in the AMI as well.
Hehe, thanks @whereisaaron for the comprehensive history write-up of this issue! 👍
I’m hitting this too trying to deploy TiKV.
ulimit -n in a pod in EKS reports 65536, however TiKV won’t start, saying it expects >= 82920. For comparison, a local kind cluster AND an Azure k8s cluster have it set to 1048576.
Looks resolved to me:
Let’s hope this issue doesn’t come back again 🙏
Is this still an issue? The latest AMI version, currently v20190614, doesn’t have any additional ulimit configuration. The OPTIONS line is no longer in /etc/sysconfig/docker and I see no other ulimit tweaks in this repo.
The default systemd unit file for dockerd in /usr/lib/systemd/system/docker.service (elided for brevity) sets the defaults to infinity:
Validated by looking at /proc/$(pidof dockerd)/limits:
I plan to resolve this issue around July 10 if there is no confirmation that this is still an issue.
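For reference, the limit directives in Docker’s stock unit file of that era looked like this (an illustrative excerpt, not the elided text above):
[Service]
LimitNOFILE=infinity
LimitNPROC=infinity
LimitCORE=infinity
With LimitNOFILE=infinity, systemd raises dockerd’s limit as high as the kernel allows rather than imposing a fixed cap, which is consistent with the validation above.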
@echoboomer the inotify limit seems like it’s independent of this issue. Would you mind opening that in a new issue so we can track a fix for it outside of this one? Thanks.