amazon-ecs-agent: Too many open files

Yesterday we upgraded our cluster from amzn-ami-2016.03.c-amazon-ecs-optimized to the latest AMI, amzn-ami-2016.03.g-amazon-ecs-optimized. At some point overnight, two of the instances in our cluster (out of ~6 in the Auto Scaling group) began flooding their logs with entries like the following (hundreds per second):

Aug 17 07:21:17 Seelog error: open /log/ecs-agent.log.2016-08-17-14: too many open files
Aug 17 07:21:17 2016-08-17T14:21:17Z [WARN] Error retrieving stats for container bcbd3d6d2a51f656ec2066e62296010f5432262e1a564678325a60f3e642a575: dial unix /var/run/docker.sock: socket: too many open files
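
A quick way to confirm whether the agent itself is leaking descriptors is to compare its open file count against its soft limit, both read from /proc on the host. The rough Python sketch below does this; it assumes a single amazon-ecs-agent process visible from the host and root access to its /proc entries, and the helper names are purely illustrative:

import os
import subprocess

def agent_pid():
    # Assumes exactly one amazon-ecs-agent process visible from the host;
    # pgrep -f matches against the full command line.
    out = subprocess.check_output(["pgrep", "-f", "amazon-ecs-agent"])
    return int(out.split()[0])

def fd_usage(pid):
    # Count open descriptors and read the soft "Max open files" limit
    # straight from /proc (Linux only; needs root for another user's process).
    open_fds = len(os.listdir("/proc/%d/fd" % pid))
    soft_limit = None
    with open("/proc/%d/limits" % pid) as limits:
        for line in limits:
            if line.startswith("Max open files"):
                soft_limit = int(line.split()[3])
                break
    return open_fds, soft_limit

pid = agent_pid()
used, limit = fd_usage(pid)
print("ecs-agent pid %d: %s of %s file descriptors in use" % (pid, used, limit))

If the count keeps climbing toward the limit, that matches the symptom in the logs above.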

The two instances terminated without any human intervention (we’re not sure whether that was just the Auto Scaling group replacing them). Near the end, these log entries also appeared:

Aug 17 07:21:17 2016-08-17T14:21:17Z [CRITICAL] Error saving state before final shutdown module="TerminationHandler" err="Multiple error:
Aug 17 07:21:17   0: Timed out waiting for TaskEngine to settle
Aug 17 07:21:17   1: Timed out trying to save to disk"

We haven’t experienced this on previous AMIs.

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 4
  • Comments: 20 (6 by maintainers)

Most upvoted comments

@ziggythehamster The new ECS AMI is amzn-ami-2016.03.h-amazon-ecs-optimized. We’ll be updating our documentation shortly.

Any chance of an ETA on 1.12.1?

We’ve just released 1.12.1, which should fix this issue. Please let us know if you continue to run into problems.
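
As a sanity check after rolling out the updated AMI, the running agent version can be read from the agent’s introspection API on each instance (port 51678 by default). A minimal sketch, assuming the endpoint is reachable from the host and the metadata response carries the usual Version field:

import json
import urllib.request

# Default introspection endpoint exposed by the ECS agent on the instance.
INTROSPECTION_URL = "http://localhost:51678/v1/metadata"

with urllib.request.urlopen(INTROSPECTION_URL, timeout=5) as resp:
    metadata = json.loads(resp.read().decode("utf-8"))

# The Version field names the running agent build,
# e.g. "Amazon ECS Agent - v1.12.1 (...)".
print(metadata.get("Version", "unknown"))

On an instance running the fixed release, this should report v1.12.1.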