mage-ai: Too many open files even with large ulimits
Describe the bug
We are trying to deploy Mage in production on an ECS cluster set up as recommended for AWS (using Terraform), but after a few minutes of uptime the dashboard becomes inaccessible. We initially suspected a networking or connectivity issue, but the CloudWatch logs narrowed it down to a ulimit (open-file limit) issue:
OSError: [Errno 24] Too many open files
and zmq.error.ZMQError: Too many open files
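Both errors mean the process hit its per-process open-file-descriptor limit (on Linux this is errno 24, `EMFILE`). A minimal, Unix-only sketch that reproduces the same `OSError` locally by temporarily lowering the soft `RLIMIT_NOFILE` with Python's standard `resource` module and opening `/dev/null` until the OS refuses; the helper name `open_until_emfile` and the limit of 64 are arbitrary choices for illustration, not anything from Mage:

```python
import errno
import os
import resource


def open_until_emfile(soft_limit: int = 64) -> int:
    """Lower the soft RLIMIT_NOFILE, open descriptors until the OS refuses,
    then close them and restore the original limit. Returns the errno seen
    (0 if the limit was never reached)."""
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may be set freely as long as it stays <= the hard limit.
    if hard == resource.RLIM_INFINITY:
        new_soft = soft_limit
    else:
        new_soft = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    fds = []
    try:
        # One more open than the limit allows guarantees the failure.
        for _ in range(new_soft + 1):
            fds.append(os.open(os.devnull, os.O_RDONLY))
        return 0
    except OSError as exc:
        return exc.errno
    finally:
        for fd in fds:
            os.close(fd)
        # Raising the soft limit back up to its old value is allowed (<= hard).
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))


if __name__ == "__main__":
    code = open_until_emfile()
    print(code, errno.errorcode.get(code, "no error"), os.strerror(code) if code else "")
```

Because every descriptor is closed in `finally`, the sketch only shows the symptom; in a long-running server the same error appears when descriptors (files, sockets, ZMQ sockets) are opened faster than they are closed.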
We have increased the size of the task and the ulimits, but still no luck. Our current settings are 512 CPU units and 4 GB of memory, with the `nofile` ulimits defined in the task definition as follows:
```json
"ulimits": [
  {
    "name": "nofile",
    "softLimit": 32768,
    "hardLimit": 65536
  }
]
```
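To confirm that these task-definition values actually reach the Mage process, the limits can be read from inside the running container. A minimal sketch using Python's standard `resource` module (Unix only); the helper names are hypothetical, and counting entries in `/proc/self/fd` assumes Linux (with `/dev/fd` as a fallback elsewhere). The soft limit is the one the kernel enforces, and an unprivileged process may raise its own soft limit up to the hard limit (here from 32768 toward 65536); whether Mage does this is not stated in this thread.

```python
import os
import resource


def describe_nofile() -> dict:
    """Report the soft/hard open-file limits and the current descriptor count."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    fd_dir = "/proc/self/fd" if os.path.isdir("/proc/self/fd") else "/dev/fd"
    # Note: the listing itself briefly opens one descriptor, so this is +1.
    open_fds = len(os.listdir(fd_dir))
    return {"soft": soft, "hard": hard, "open": open_fds}


def raise_soft_to_hard() -> int:
    """Raise the soft limit to the hard limit (allowed without privileges).
    Leaves the limit unchanged if the hard limit is unlimited, since an
    unlimited soft RLIMIT_NOFILE is typically rejected by the kernel."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft < hard:
        resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]


if __name__ == "__main__":
    print(describe_nofile())
```

If `soft` printed here is far below 32768, the task-definition setting is not being applied to the container process.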
There are no pipelines created; this is a fresh copy of Mage with just the `example_pipeline` in it. Mage version is 0.9.8.
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 29 (12 by maintainers)
@wangxiaoyou1993 I can confirm that after letting it run for a day, it is working properly. Thanks so much for your effort in getting this tricky bug fixed!
@dumim The fix is included in `mageai/mageai:latest`. You can test it.
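The thread does not describe what the fix changed, so nothing below reproduces it. For a soak test like the day-long run described above, one rough check is to sample the Mage server's open-descriptor count over time: a count that only ever climbs suggests a leak, while a count that plateaus suggests the limit was simply too low. A minimal sketch, assuming a Linux host where `/proc/<pid>/fd` is readable (same user or root) and the server's PID is known; the helper names and the growth threshold are hypothetical:

```python
import os
import time
from typing import List


def count_open_fds(pid: int) -> int:
    # Linux-only: each entry in /proc/<pid>/fd is one open descriptor.
    return len(os.listdir(f"/proc/{pid}/fd"))


def sample_fds(pid: int, samples: int, interval_s: float) -> List[int]:
    """Take `samples` descriptor counts, sleeping `interval_s` between them."""
    counts = []
    for i in range(samples):
        counts.append(count_open_fds(pid))
        if i < samples - 1:
            time.sleep(interval_s)
    return counts


def looks_like_leak(counts: List[int], min_growth: int) -> bool:
    """Heuristic: the count never drops and grows by at least `min_growth`."""
    if len(counts) < 2:
        return False
    never_drops = all(b >= a for a, b in zip(counts, counts[1:]))
    return never_drops and counts[-1] - counts[0] >= min_growth


if __name__ == "__main__":
    # Example: sample this process itself; point `pid` at the server instead.
    counts = sample_fds(os.getpid(), samples=3, interval_s=1.0)
    print(counts, looks_like_leak(counts, min_growth=100))
```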