portainer: Too many open files
Bug description
I am running a Docker swarm with 7 nodes: 3 managers and 4 workers. I have deployed Portainer via the stack file, with the agents and Portainer itself. After some time, an agent instance will report high CPU usage (60-70%) and the Portainer instance will not load any data about services, stacks, etc.
When I look at the log files for the agent with the high CPU usage, I get:
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
2018/06/22 13:48:35 [ERR] memberlist: Error accepting TCP connection: accept tcp [::]:7946: accept4: too many open files
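The accept4 error above means the agent process has hit its open file descriptor limit. As a rough way to check how close a process is to that limit, here is a minimal sketch that assumes a Linux host with /proc mounted and permission to read the target process's entries. The helper names fd_soft_limit and fd_usage are made up for illustration and are not part of Portainer or Docker.

```shell
#!/bin/sh
# Sketch: compare a process's open descriptors with its soft "Max open files" limit.
# fd_soft_limit and fd_usage are hypothetical helper names (assumptions).

# The soft limit is the 4th whitespace-separated field of the
# "Max open files" row in /proc/<pid>/limits.
fd_soft_limit() {
    awk '/^Max open files/ { print $4 }' "/proc/$1/limits"
}

# Print "<open descriptors>/<soft limit>" for the given PID.
fd_usage() {
    open=$(ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' ')
    printf '%s/%s\n' "$open" "$(fd_soft_limit "$1")"
}
```

For example, fd_usage "$AGENT_PID" prints the current count next to the limit, where AGENT_PID is a placeholder for the agent's PID on the host. One way to obtain it is docker inspect --format '{{.State.Pid}}' with the agent container's ID. Reading another user's /proc entries usually requires sudo.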
Expected behavior
I would expect the Portainer instance to load all the data it can, so I can continue to use it as much as possible. An error message explaining that it was unable to contact an agent would also be helpful.
Steps to reproduce the issue:
I do not have any specific steps to reproduce it. It just seems to happen randomly.
Technical details:
- Portainer version: 1.18.0
- Docker version (managed by Portainer): 18.03.1-ce
- Platform (windows/linux): Amazon Linux AMI release 2018.03
- Command used to start Portainer (docker run -p 9000:9000 portainer/portainer):
sudo curl -L https://portainer.io/download/portainer-agent-stack.yml -o portainer-agent-stack.yml
sudo docker stack deploy --compose-file=portainer-agent-stack.yml portainer
- Browser: Firefox 60.0.2 (64-bit)
About this issue
- State: closed
- Created 6 years ago
- Comments: 47 (18 by maintainers)
@deviantony Portainer seems to be running much more stably with the --no-snapshot flag added. It’s now been up for 4 days without any issues!

Closing via https://github.com/portainer/agent/issues/43
Will be part of the agent version 1.5.0.

I can confirm the dashboard reloading and fd going up behavior. I created a Vagrant+Ansible+AWS Linux2 image setup to reproduce this: v.zip
Run vagrant up there and vagrant provision, then vagrant ssh portnode1 to get to the node 1 shell and run a command that lists the number of the Agent’s process fds every 2 seconds.
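The exact command from this comment did not survive, so the following is only a sketch of one way to print a process's fd count every 2 seconds, not the commenter's original. It assumes a Linux node with /proc available. The names count_fds and watch_fds are hypothetical, and the PID must be supplied by the caller.

```shell
#!/bin/sh
# Sketch: print the number of open file descriptors of a process at a fixed interval.
# count_fds and watch_fds are hypothetical helper names, not part of Portainer.

# Count entries in /proc/<pid>/fd (Linux only; needs permission to read that directory).
count_fds() {
    ls "/proc/$1/fd" 2>/dev/null | wc -l | tr -d ' '
}

# watch_fds PID [INTERVAL_SECONDS] [SAMPLES]
# SAMPLES=0 (the default) loops until the process exits or the script is interrupted.
watch_fds() {
    pid=$1
    interval=${2:-2}
    samples=${3:-0}
    i=0
    while [ -d "/proc/$pid" ]; do
        printf '%s fds=%s\n' "$(date '+%H:%M:%S')" "$(count_fds "$pid")"
        i=$((i + 1))
        if [ "$samples" -gt 0 ] && [ "$i" -ge "$samples" ]; then
            break
        fi
        sleep "$interval"
    done
}
```

Calling watch_fds "$AGENT_PID" samples every 2 seconds until interrupted, where AGENT_PID is a placeholder for the agent's PID on the node. A count that keeps growing while the dashboard is refreshed would match the leak described here.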
Note:
@tle211212 I can’t reproduce this one. I have an agent deployed locally and one on a swarm, and the file descriptor count stays the same on both (I tried refreshing the dashboard, changing pages…).

The --no-snapshot flag is not a solution here; it was just used to isolate the cause of the problem.

I believe that #2235 solves this issue, waiting for some feedback from other users before we decide to merge it.