amazon-ssm-agent: Failed to create channel: too many open files
We had been using SSH login over an SSM tunnel for about a week when our servers suddenly started returning the error below.
- Region: us-east-1
- OS: Ubuntu 14.04.1 LTS
- Instance type: c4.large
- amazon-ssm-agent version: 2.3.930.0
/var/log/amazon/ssm/amazon-ssm-agent.log
2020-03-27 03:05:10 ERROR [ssm-session-worker] [xxx-048848f4f625b03a0] filewatcher listener encountered error when start watcher: too many open files
2020-03-27 03:05:10 ERROR [ssm-session-worker] [xxx-048848f4f625b03a0] failed to create channel: too many open files
The server we are connecting to doesn’t have many connections running, and:
- ulimit -S and ulimit -H are both unlimited
- cat /etc/security/limits.conf
* soft nofile 500000
* hard nofile 500000
ubuntu soft nofile 500000
ubuntu hard nofile 500000
root soft nofile 500000
root hard nofile 500000
- cat /etc/sysctl.conf
fs.file-max = 500000
- service amazon-ssm-agent restart: the service restarted correctly, but we still cannot connect
- We have lots of lingering session/channel folders and files in:
/var/lib/amazon/ssm/i-xxxxx/session
/var/lib/amazon/ssm/i-xxxxx/channels
/var/lib/amazon/ssm/i-xxxxx/documents
- Regardless of the ulimit we set system-wide, the open-files limit of the agent process seems to be stuck at 1024 (soft):
[0] ✓ root@ip-10-0-0-x:/ [01:26:28]
---> start amazon-ssm-agent
amazon-ssm-agent start/running, process 22668
[0] ✓ root@ip-10-0-0-x:/ [01:26:35]
---> cat /proc/22668/limits
Limit Soft Limit Hard Limit Units
Max cpu time unlimited unlimited seconds
Max file size unlimited unlimited bytes
Max data size unlimited unlimited bytes
Max stack size 8388608 unlimited bytes
Max core file size 0 unlimited bytes
Max resident set unlimited unlimited bytes
Max processes 30038 30038 processes
Max open files 1024 4096 files
Max locked memory 65536 65536 bytes
Max address space unlimited unlimited bytes
Max file locks unlimited unlimited locks
Max pending signals 30038 30038 signals
Max msgqueue size 819200 819200 bytes
Max nice priority 0 0
Max realtime priority 0 0
Max realtime timeout unlimited unlimited us
- Restarting or stopping amazon-ssm-agent doesn’t help: the files are not deleted, and the same error appears after restart
- There is no disk-space or inode shortage on our system:
[0] ✓ root@ip-10-0-0-x:/var/lib/amazon/ssm/i-xxx [01:53:41]
---> df -h
Filesystem Size Used Avail Use% Mounted on
/dev/xvda1 50G 29G 19G 62% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 1.9G 8.0K 1.9G 1% /dev
tmpfs 377M 388K 377M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 1.9G 0 1.9G 0% /run/shm
none 100M 0 100M 0% /run/user
[0] ✓ root@ip-10-0-0-x:/var/lib/amazon/ssm/i-xxx [01:53:50]
---> df -ih
Filesystem Inodes IUsed IFree IUse% Mounted on
/dev/xvda1 3.2M 514K 2.7M 17% /
none 471K 2 471K 1% /sys/fs/cgroup
udev 470K 386 469K 1% /dev
tmpfs 471K 324 471K 1% /run
none 471K 1 471K 1% /run/lock
none 471K 1 471K 1% /run/shm
none 471K 2 471K 1% /run/user
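The checks above compare system-wide settings with the limits of the running agent. As a rough sketch of the same comparison for a single process, the helpers below read the soft "Max open files" value from a /proc/<pid>/limits-style file and count the entries in a directory such as /proc/<pid>/fd. The function names are made up for illustration, and the script assumes a Linux /proc layout. One possible reason the two views differ: /etc/security/limits.conf is applied by pam_limits to login sessions, so a daemon started by init does not necessarily inherit those values.

```shell
#!/bin/sh
# Hypothetical helpers; names are illustrative, not part of amazon-ssm-agent.

# Print the soft "Max open files" limit from a /proc/<pid>/limits-style file.
# Fields on that line: Max open files <soft> <hard> files
soft_nofile_limit() {
    awk '/^Max open files/ { print $4 }' "$1"
}

# Count the entries in a directory, e.g. /proc/<pid>/fd
# (one entry per open file descriptor).
count_entries() {
    ls -A "$1" 2>/dev/null | wc -l | tr -d ' '
}

# Print open descriptors versus the soft limit for one PID.
# Reading /proc/<pid>/fd of another user's process usually needs root.
report_fd_usage() {
    pid=$1
    echo "pid $pid: $(count_entries "/proc/$pid/fd") open, soft limit $(soft_nofile_limit "/proc/$pid/limits")"
}
```

For example, `report_fd_usage 22668` would report on the agent process shown above. If the open count is well below the soft limit while the error persists, the "too many open files" message may come from a different per-user resource limit rather than from the process descriptor table.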
About this issue
- State: closed
- Created 4 years ago
- Comments: 20 (6 by maintainers)
We keep the orchestration files for some time before removing them. The retention periods are configurable in the amazon-ssm-agent.json file:
"AssociationLogsRetentionDurationHours": 24, "RunCommandLogsRetentionDurationHours": 336, "SessionLogsRetentionDurationHours": 336
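To see which lingering entries have already outlived a given retention period, a read-only listing like the one below may help. It is a minimal sketch, not the agent's own cleanup logic: `stale_entries` is a hypothetical helper, it relies on the `-mindepth`, `-maxdepth` and `-mmin` options of GNU find, and it judges age by each entry's modification time, which is an assumption about how "old" should be measured.

```shell
#!/bin/sh
# Hypothetical helper; lists entries only and deletes nothing.

# List entries directly under DIR whose modification time is more than
# HOURS hours in the past.
stale_entries() {
    dir=$1
    hours=$2
    find "$dir" -mindepth 1 -maxdepth 1 -mmin "+$((hours * 60))"
}
```

For instance, `stale_entries /var/lib/amazon/ssm/i-xxxxx/session 336` lists session entries older than the 336-hour value quoted for SessionLogsRetentionDurationHours. Note that a directory's modification time changes when entries are added to or removed from it, so the result is only an approximation of when a session was last active.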