amazon-ecs-agent: Agent is unable to kill container when task definition memory limit is reached

I am running a container, and when the hard task memory limit is reached the container is not killed. In addition to not dying, it begins to generate a large amount of docker.io.read_bytes (observed via the ECS Datadog integration).

$ uname -r
4.4.51-40.58.amzn1.x86_64
$ sudo docker -v
Docker version 1.12.6, build 7392c3b/1.12.6

Agent version 1.14.1

docker stats shows that the container frequently sits at 100% memory usage and that its BLOCK I/O is perpetually increasing (the application should only use BLOCK I/O to read a configuration file during startup):

$ sudo docker stats <container_id>
CONTAINER           CPU %               MEM USAGE / LIMIT     MEM %               NET I/O             BLOCK I/O           PIDS
43a240399e21        3.82%               99.93 MiB / 100 MiB   99.93%              0 B / 0 B           45.72 GB / 0 B      0

The container remains up:

CONTAINER ID        IMAGE                                                                           COMMAND              CREATED             STATUS              PORTS               NAMES
43a240399e21        420876366622.dkr.ecr.us-east-1.amazonaws.com/events-writer:latest   "events-writer.py"   13 minutes ago      Up 13 minutes                           ecs-production-events-writer-57-s3eventswriterworker-9ceef8a99883ccd77d00
$ sudo docker inspect 43a240399e21
[
    {
        "Id": "43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853",
        "Created": "2017-05-09T19:38:28.950062224Z",
        "Path": "events-writer.py",
        "Args": [],
        "State": {
            "Status": "running",
            "Running": true,
            "Paused": false,
            "Restarting": false,
            "OOMKilled": false,
            "Dead": false,
            "Pid": 25144,
            "ExitCode": 0,
            "Error": "",
            "StartedAt": "2017-05-09T19:38:29.592819604Z",
            "FinishedAt": "0001-01-01T00:00:00Z"
        },
        "Image": "sha256:8bbab436d3e9458aa48c9f65e1c99832033ef3d6dc9c41a728962fd7a40ab553",
        "ResolvConfPath": "/var/lib/docker/containers/43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853/resolv.conf",
        "HostnamePath": "/var/lib/docker/containers/43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853/hostname",
        "HostsPath": "/var/lib/docker/containers/43a240399e21012586a0f25207c20b0718590a8f267d4d3425095e84648ac853/hosts",
        "LogPath": "",
        "Name": "/ecs-production-events-writer-57-s3eventswriterworker-9ceef8a99883ccd77d00",
        "RestartCount": 0,
        "Driver": "devicemapper",
        "MountLabel": "",
        "ProcessLabel": "",
        "AppArmorProfile": "",
        "ExecIDs": null,
          "HostConfig": {
            "Binds": [
                "/etc/xxx:/etc/xxx",
                "/raid0/workers/log:/raid0/workers/log"
            ],
            "ContainerIDFile": "",
            "LogConfig": {
                "Type": "awslogs",
                "Config": {
                    "awslogs-group": "ecs-production-events-writer-xxx",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream": "s3_events_writer/s3_events_writer_worker/ff36c369-f4e7-46b3-b8af-89a3823dcc37"
                }
            },
            "NetworkMode": "host",
            "PortBindings": null,
            "RestartPolicy": {
                "Name": "",
                "MaximumRetryCount": 0
            },
            "AutoRemove": false,
            "VolumeDriver": "",
            "VolumesFrom": null,
            "CapAdd": null,
            "CapDrop": null,
            "Dns": null,
            "DnsOptions": null,
            "DnsSearch": null,
            "ExtraHosts": null,
            "GroupAdd": null,
            "IpcMode": "",
            "Cgroup": "",
            "Links": null,
            "OomScoreAdj": 0,
            "PidMode": "",
            "Privileged": true,
            "PublishAllPorts": false,
            "ReadonlyRootfs": false,
            "SecurityOpt": [
                "label=disable"
            ],
            "UTSMode": "",
            "UsernsMode": "",
            "ShmSize": 67108864,
            "Runtime": "runc",
            "ConsoleSize": [
                0,
                0
            ],
            "Isolation": "",
            "CpuShares": 256,
            "Memory": 104857600,
            "CgroupParent": "",
            "BlkioWeight": 0,
            "BlkioWeightDevice": null,
            "BlkioDeviceReadBps": null,
            "BlkioDeviceWriteBps": null,
            "BlkioDeviceReadIOps": null,
            "BlkioDeviceWriteIOps": null,
            "CpuPeriod": 0,
            "CpuQuota": 0,
            "CpusetCpus": "",
            "CpusetMems": "",
            "Devices": null,
            "DiskQuota": 0,
            "KernelMemory": 0,
            "MemoryReservation": 0,
            "MemorySwap": 209715200,
            "MemorySwappiness": -1,
            "OomKillDisable": false,
            "PidsLimit": 0,
            "Ulimits": null,
            "CpuCount": 0,
            "CpuPercent": 0,
            "IOMaximumIOps": 0,
            "IOMaximumBandwidth": 0
        },
        "GraphDriver": {
            "Name": "devicemapper",
            "Data": {
                "DeviceId": "125",
                "DeviceName": "docker-202:1-263237-7e419402b4a47f4da21314bf3ae14aff3a4f95b26b2932c3099089474329eed9",
                "DeviceSize": "10737418240"
            }
        },
        "Mounts": [
            {
                "Source": "/etc/xxx",
                "Destination": "/etc/xxx",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            },
            {
                "Source": "/raid0/workers/log",
                "Destination": "/raid0/workers/log",
                "Mode": "",
                "RW": true,
                "Propagation": "rprivate"
            }
        ],
        "Config": {
            "Hostname": "ip-10-0-116-202",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "XXX=XXX",
                "MAX_BUFFER_EVENTS_FOR_FLUSH=7000",
                "XXX=8000",
                "XXX=35",
                "EVENTS_CHANNEL=s3_events_writer_xxx",
                "MAX_EVENTS_PER_FILE=500",
                "S3_BUCKET=XXX",
                "XXX=1",
                "XXX=1",
                "PATH=/usr/local/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin",
                "LANG=C.UTF-8",
                "GPG_KEY=XXX",
                "PYTHON_VERSION=2.7.13",
                "XXX=9.0.1",
                "XXX=1"
            ],
            "Cmd": [
                "events-writer.py"
            ],
            "Image": "420876366622.dkr.ecr.us-east-1.amazonaws.com/events-writer:latest",
            "Volumes": null,
            "WorkingDir": "/usr/src/app",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": {
                "com.amazonaws.ecs.cluster": "production",
                "com.amazonaws.ecs.container-name": "s3_events_writer_worker",
                "com.amazonaws.ecs.task-arn": "arn:aws:ecs:us-east-1:420876366622:task/ff36c369-f4e7-46b3-b8af-89a3823dcc37",
                "com.amazonaws.ecs.task-definition-family": "production-events-writer",
                "com.amazonaws.ecs.task-definition-version": "57"
            }
        },
        "NetworkSettings": {
            "Bridge": "",
            "SandboxID": "f4dbb092647e221c6ebf8e5cb5f6dd2ad5a0152919738ecf5bad7f84af84de2e",
            "HairpinMode": false,
            "LinkLocalIPv6Address": "",
            "LinkLocalIPv6PrefixLen": 0,
            "Ports": {},
            "SandboxKey": "/var/run/docker/netns/default",
            "SecondaryIPAddresses": null,
            "SecondaryIPv6Addresses": null,
            "EndpointID": "",
            "Gateway": "",
            "GlobalIPv6Address": "",
            "GlobalIPv6PrefixLen": 0,
            "IPAddress": "",
            "IPPrefixLen": 0,
            "IPv6Gateway": "",
            "MacAddress": "",
            "Networks": {
                "host": {
                    "IPAMConfig": null,
                    "Links": null,
                    "Aliases": null,
                    "NetworkID": "1a8ec44a83a5c735e9c8c25b017874ed8041249bc8ed525ac6f68a9097abf47d",
                    "EndpointID": "0f3888269e99921cdfb848f5aaecbb77a4d6cbd9d4dac7b2c5685d40230b888e",
                    "Gateway": "",
                    "IPAddress": "",
                    "IPPrefixLen": 0,
                    "IPv6Gateway": "",
                    "GlobalIPv6Address": "",
                    "GlobalIPv6PrefixLen": 0,
                    "MacAddress": ""
                }
            }
        }
    }
]

Sometimes the agent IS able to kill the container after 10-20 minutes:

43a240399e21        420876366622.dkr.ecr.us-east-1.amazonaws.com/events-writer:latest    "events-writer.py"   19 minutes ago      Exited (137) About a minute ago                       ecs-production-events-writer-57-s3eventswriterworker-9ceef8a99883ccd77d00

Also, once the container is in the 100% memory state, if I try to docker exec -it <container_id> /bin/bash it hangs for a while and then the container finally registers the SIGKILL, almost as if it only recognizes the SIGKILL after I exec into it.

The daemonization and auto-restart behavior is critical to keeping resource-depletion failures from taking down other services, and I would really appreciate any insight.

Thank you

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 22 (5 by maintainers)

Most upvoted comments

amzn-ami-2017.09.i-amazon-ecs-optimized is still affected by this issue. Is there any plan to provide a kernel compatible with Docker for the “ecs-optimized” AMI? Neither the Ubuntu “solution” nor the reaper-cron “solution” feels particularly sound.

We are also having the same problem on Amazon Linux AMI 2017.09. A container uses up all of its available memory and starts thrashing reads. The container is pretty much unavailable until it’s eventually killed off.

Besides the reaper cron, has anyone found a reasonable solution?

Correct, @jhaynes; as I said, major page faults are absolutely the problem, and increasing the memory limit will only resolve it until you bump up against the limit again.

as we’re striving for container isolation and protecting the health of the host, we chose to write a simple reaper that runs on every ECS instance and stops containers that have crossed a major page fault threshold we chose based on our environment (happy containers might cause 300/day, and sad containers can rack up hundreds of thousands within a few minutes). running it every minute using cron has been effective: these containers are now killed off within 60 seconds of them starting to thrash the disk, and the host recovers without intervention. ECS reschedules the container if necessary, and we notify the responsible engineer so they can investigate later. 👌

Our script looks something like this:

#!/bin/sh

# don't kill containers using these images even if they're misbehaving
EXCLUDES_PATTERN=$(cat <<'EOF' | xargs | sed 's/ /|/g'
amazon/amazon-ecs-agent
EOF
)

# list all the candidate containers
targets=$(docker ps --no-trunc --format '{{.ID}} {{.Image}}' | grep -Ev "$EXCLUDES_PATTERN" | awk '{ print $1; }' | xargs)

for target in $targets; do
  cd "/cgroup/memory/docker/$target" || exit
  info="id=$target $(docker inspect --format 'image={{.Config.Image}} StartedAt="{{.State.StartedAt}}"' "$target") pgmajfault=$(grep total_pgmajfault memory.stat | awk '{print $2;}')"
  value=$(echo "$info" | awk '{ print $4;}' | sed 's/pgmajfault=//g')

  if [ "$value" -gt 10000 ]; then
    echo "Executing docker stop on container due to $value major page faults ($info)"
    docker stop "$target" &
  fi

  cd - || exit
done

wait
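
To run this every minute as described above, a cron entry along these lines should work; the script path and the /etc/cron.d file name are assumptions, so adjust them for wherever you install the script:

# hypothetical /etc/cron.d/pgmajfault-reaper entry: run the reaper every minute as root
* * * * * root /usr/local/bin/pgmajfault-reaper.sh 2>&1 | logger -t pgmajfault-reaper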

HTH!

We’re debugging the exact same issue here, but I believe the issue lies with the kernel and not the ECS agent or even docker. (the oomkiller lives in the kernel)

A very basic container (we see this with a few different Node.js-based apps and one collectd container) reaches its memory limit, is observed to sit between 99.9% and 100% of the limit, and starts chewing through IO reads on the Docker volume, which eventually exhausts our burst balance, at which point the host (and other workloads) become pretty unhappy. The container may or may not eventually be OOM-killed, but not as soon as one would expect.

In one case I directly observed, docker stats reported the container in question flapping between 99% and 100% usage, but it was only OOM-killed after almost an hour in that state. The syslogs confirm the kernel didn’t consider killing it until then.
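
A quick way to confirm whether and when the kernel’s OOM killer actually fired is to grep the kernel log on the host (the exact message wording varies by kernel version):

# look for OOM killer activity in the kernel ring buffer
dmesg | grep -iE 'oom-killer|out of memory'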

A few things that seem relevant to note:

  • we manually adjust the value of /cgroup/memory/docker/<container>/memory.memsw.limit_in_bytes to match /cgroup/memory/docker/<container>/memory.limit_in_bytes, to try to disable swap usage for all containers (by default it is 2x the memory limit); see the sketch after this list
  • we have swap enabled on the host (with a 5GB EBS volume), but the spike in IO reads we see are on the docker devicemapper volume, not the swap volume
  • host sets vm.swappiness=0
  • Amazon ECS-Optimized Amazon Linux AMI 2016.09.g
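
A minimal sketch of the memsw adjustment described in the first bullet, assuming the Amazon Linux cgroup v1 mount at /cgroup and that swap should be disabled for every running container (run as root on the host):

# cap swap+memory (memsw) at the memory limit for each running container,
# which effectively disables swap for it
for id in $(docker ps -q --no-trunc); do
  cat "/cgroup/memory/docker/$id/memory.limit_in_bytes" \
    > "/cgroup/memory/docker/$id/memory.memsw.limit_in_bytes"
done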

@dm03514 if we figure it out I’ll make sure you hear about it; would appreciate the same!

@swettk and I spent more time with this today and have a plausible theory: as the container approaches its memory limit, it causes major page faults. This would explain why we see high reads but almost no writes, and why they are reads from the Docker disk, not the swap disk. The container eventually crosses the actual cgroup memory limit and then gets OOM-killed, but while it hangs out at that boundary you may end up thrashing your disk.
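
For anyone who wants to watch this happen, polling the same per-container counter the reaper script above keys on makes the thrashing obvious (the container ID and the /cgroup mount path are placeholders, as in the earlier examples):

# print the cumulative major page fault count for one container every 5 seconds
watch -n 5 "grep total_pgmajfault /cgroup/memory/docker/<container_id>/memory.stat"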

we’re personally planning to investigate:

  • using cgroups to limit disk IO, as this cluster is not intended for disk-heavy loads (see the rough throttle sketch after this list)
  • writing an agent to police containers and kill ones with high major page faults and/or disk IO, just as the oomkiller does with memory
  • increasing our IOPS budget (which currently stands at a pretty low 150)
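
A rough sketch of the first idea, throttling a container’s read bandwidth through the blkio cgroup. The device numbers 202:1 and the 10 MB/s cap are illustrative assumptions (check lsblk on your host for the real major:minor), and the /cgroup/blkio/docker/<container_id> path mirrors the memory cgroup layout used above:

# limit reads from device 202:1 to ~10 MB/s for one container (run as root)
echo "202:1 10485760" > "/cgroup/blkio/docker/<container_id>/blkio.throttle.read_bps_device"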

Thank you, I’ll check first thing tomorrow morning (EST).