amazon-ecs-agent: Unable to use task networking on non Amazon Linux ECS optimized AMIs

Summary

I’m investigating the recently released task networking for our ECS clusters, but we don’t run Amazon Linux on them (we have standardised on Ubuntu across everything instead).

This appears to be unsupported by design, although the docs weren’t especially clear here, suggesting at first that ECS agent 1.15 was all you needed for this to work.

Description

On an Ubuntu 16.04 host running Docker 17.09.0-ce and ECS agent 1.15.1, scheduling a task with awsvpc networking originally produced the following error in the ECS console:

service test-cni was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 9d04bbf1-b1e9-499b-9f5f-ebb7a1af0c92 is missing an attribute required by your task.
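One way to narrow down which attribute is missing is to look at what the container instance actually registered. The sketch below assumes the attribute that awsvpc tasks need is named ecs.capability.task-eni (my reading of the ECS docs, so verify it against your own output); the helper name has_task_eni_attribute is made up for illustration.

```shell
# Hypothetical helper: reads the JSON printed by
# `aws ecs describe-container-instances` on stdin and reports whether the
# (assumed) task networking attribute is present.
has_task_eni_attribute() {
  if grep -q '"ecs\.capability\.task-eni"'; then
    echo "present"
  else
    echo "missing"
    return 1
  fi
}

# Example (needs AWS credentials; the cluster name is a placeholder):
#   aws ecs describe-container-instances --cluster my-cluster \
#     --container-instances 9d04bbf1-b1e9-499b-9f5f-ebb7a1af0c92 \
#     | has_task_eni_attribute
```

This is a plain text match rather than a JSON parse, which keeps it dependency-free but means it only answers "is the string there at all".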

After reading the docs a little bit more, I saw that the ecs-init package is also required, and support confirmed this. I took a look at what the ecs-init package does and found that it bind mounts a few extra volumes, adds some Linux capabilities and sets ECS_ENABLE_TASK_ENI=true in the ECS config.
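Setting the config flag by hand can be done idempotently, so re-running provisioning does not pile up duplicate lines. This is only a sketch: the function name set_ecs_option is made up, and it assumes the config is a plain KEY=VALUE file such as /etc/ecs/ecs.config.

```shell
# Hypothetical helper: set_ecs_option FILE KEY VALUE
# Replaces any existing KEY=... line in FILE, or appends one if none exists.
set_ecs_option() {
  file=$1
  key=$2
  value=$3
  touch "$file"
  tmp=$(mktemp)
  # Keep every line except an earlier assignment of the same key.
  # grep exits non-zero when it prints nothing, hence the `|| true`.
  grep -v "^${key}=" "$file" > "$tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "$tmp"
  # Write through cat so the original file keeps its owner and mode.
  cat "$tmp" > "$file"
  rm -f "$tmp"
}

# Example (as root):
#   set_ecs_option /etc/ecs/ecs.config ECS_ENABLE_TASK_ENI true
```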

I changed my systemd unit file to:

[Unit]
Description=ECS Agent
Requires=docker.service
After=docker.service cloud-final.service

[Service]
Restart=always
ExecStart=/usr/bin/docker run --name ecs-agent \
  --privileged \
  --restart=on-failure:10 \
  --volume=/var/run:/var/run \
  --volume=/var/log/ecs/:/log \
  --volume=/var/lib/ecs/data:/data \
  --volume=/etc/ecs:/etc/ecs \
  --volume=/proc:/host/proc:ro \
  --volume=/var/lib/ecs/dhclient:/var/lib/ecs/dhclient \
  --volume=/lib64:/lib64:ro \
  --volume=/sbin:/sbin:ro \
  --cap-add=NET_ADMIN \
  --cap-add=SYS_ADMIN \
  --net=host \
  --env-file=/etc/ecs/ecs.config \
  amazon/amazon-ecs-agent:latest
ExecStop=/usr/bin/docker rm -f ecs-agent

[Install]
WantedBy=default.target

I also added ECS_ENABLE_TASK_ENI=true to the ECS config, but the ECS agent Docker image then panics with [CRITICAL] Unable to initialize Task ENI dependencies: agent is not started with an init system.

Looking at the source shows an explanation for why it throws: https://github.com/aws/amazon-ecs-agent/blob/c5c0f37ddabf848beb8ad25f0f5f5ffd5bb39740/agent/app/agent_unix.go#L50-L57
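As far as I can tell from those lines, the guard comes down to "is the agent process PID 1 in its PID namespace?". With a plain docker run the agent is PID 1 inside the container, so nothing reaps the child processes (such as dhclient) that it spawns. The shell sketch below only mirrors that logic for illustration; it is not the agent's code, and check_init_system is a made-up name.

```shell
# Illustrative re-statement of the guard: fail when the given PID is 1,
# i.e. when the process would be running without an init process above it.
check_init_system() {
  agent_pid=$1
  if [ "$agent_pid" -eq 1 ]; then
    echo "agent is not started with an init system" >&2
    return 1
  fi
  return 0
}

# Example: a process that is itself PID 1 fails the check.
#   check_init_system "$$"
```

If that reading is right, starting the container with Docker's --init flag (which runs a small init as PID 1 and the agent as its child) should be enough to get past this check without ecs-init.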

Is there a good way to get this to work without needing the ecs-init package? Or do I need to wait for the ecs-init repo to add systemd unit files and enable support for Suse/Ubuntu on task networking (it’s currently not built for those distros)?

Task-based networking would be a really nice addition, but right now, with it restricted to just Amazon Linux (and us having to manage these instances), it’s not workable for us. Expanding it to cover Ubuntu (and not just the ancient LTS using upstart), or not having to manage the instances at all (a la GKE/AKE), would be great.

Environment Details

Ubuntu 16.04 host running Docker 17.09.0-ce and ECS agent 1.15.1

Supporting Log Snippets

service test-cni was unable to place a task because no container instance met all of its requirements. The closest matching container-instance 9d04bbf1-b1e9-499b-9f5f-ebb7a1af0c92 is missing an attribute required by your task.
[CRITICAL] Unable to initialize Task ENI dependencies: agent is not started with an init system

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 7
  • Comments: 21 (7 by maintainers)

Most upvoted comments

Hi @BrunoCarrier @tomelliff, attaching ENIs to instances is definitely supported on all instance types launched into a VPC. To configure an ENI for containers/tasks, the ECS agent depends on tools such as dhclient and on container settings provided via Docker: the --init flag and the SYS_ADMIN and NET_ADMIN capabilities. ECS init is a convenient way to bootstrap the ECS agent with all of these, so that tasks that require ENIs do not fail during initialization because of missing dependencies or configuration.

When we released this feature last year, we added support for this in ECS init for the Amazon Linux distribution. The work needed to enable this support for other distros is on our roadmap.

Having said that, I just started ECS agent using @nmeyerhans’s unit file and was able to successfully start a task in ‘awsvpc’ mode. I’m pasting the command for reference as well:

  1. Install Docker by following instructions here: https://docs.docker.com/install/linux/docker-ce/ubuntu/#install-docker-ce
  2. Start ECS agent using the command:
$ cat /etc/ecs/ecs.config
ECS_CLUSTER=ubuntu-task-eni

$ docker run --name ecs-agent \
  --init \
  --restart=on-failure:10 \
  --volume=/var/run:/var/run \
  --volume=/var/log/ecs/:/log \
  --volume=/var/lib/ecs/data:/data \
  --volume=/etc/ecs:/etc/ecs \
  --volume=/sbin:/sbin \
  --volume=/lib:/lib \
  --volume=/lib64:/lib64 \
  --volume=/usr/lib:/usr/lib \
  --volume=/proc:/host/proc \
  --volume=/sys/fs/cgroup:/sys/fs/cgroup \
  --volume=/var/lib/ecs/dhclient:/var/lib/dhclient \
  --net=host \
  --env ECS_LOGFILE=/log/ecs-agent.log \
  --env ECS_DATADIR=/data \
  --env ECS_UPDATES_ENABLED=false \
  --env ECS_AVAILABLE_LOGGING_DRIVERS='["json-file","syslog","awslogs"]' \
  --env ECS_ENABLE_TASK_IAM_ROLE=true \
  --env ECS_ENABLE_TASK_IAM_ROLE_NETWORK_HOST=true \
  --env ECS_UPDATES_ENABLED=true \
  --env ECS_ENABLE_TASK_ENI=true \
  --env-file=/etc/ecs/ecs.config \
  --cap-add=sys_admin \
  --cap-add=net_admin \
  -d \
  amazon/amazon-ecs-agent:latest
  3. Start an ‘awsvpc’ task using a sample task definition
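Because docker run --volume creates a missing host directory as an empty one instead of failing, a host that lacks one of these paths can start the agent without complaint and only break later. A small preflight check along these lines may help. It is a sketch: check_host_paths is a made-up name, the path list is copied from the command above, and /sbin/dhclient is an assumption about where a stock Ubuntu host keeps the binary.

```shell
# Hypothetical preflight: check_host_paths ROOT
# Prints every expected path that is missing under ROOT and returns non-zero
# if any are missing. Pass an empty ROOT ("") to check the real host.
check_host_paths() {
  root=$1
  missing=0
  for p in /var/run /var/log/ecs /var/lib/ecs/data /etc/ecs /sbin /lib \
           /lib64 /usr/lib /proc /sys/fs/cgroup /var/lib/ecs/dhclient \
           /sbin/dhclient; do
    if [ ! -e "${root}${p}" ]; then
      echo "missing: ${p}"
      missing=1
    fi
  done
  return "$missing"
}

# Example:
#   check_host_paths "" || echo "fix the paths above before starting the agent"
```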

I have created an issue in the ECS init repo for the same as well: https://github.com/aws/amazon-ecs-init/issues/150

I’ll just leave a note here in case someone finds it useful.

I was able to get awsvpc networking working on CoreOS Linux; however, it does involve some hacks. ecs-agent requires the dhclient binary, which is not available in the ecs-agent container image, so the host’s dhclient gets mounted using a Docker volume. This works fine on Linux distributions that ship with dhclient, but CoreOS is not one of them (at least I could not find it). The solution is to build a custom ecs-agent container on top of an Alpine image and install dhclient using apk. Docker’s multi-stage build can then be used to COPY binaries from the official ecs-agent image into the customized one (Dockerfile example below). Afterwards, the custom ecs-agent image can be used in the systemd unit.

Perhaps I’m missing something important here, if so please let me know.

# Dockerfile
FROM amazon/amazon-ecs-agent:latest as aws-ecs-agent

FROM alpine:latest

RUN apk add --update --no-cache dhclient

COPY --from=aws-ecs-agent /agent /agent
COPY --from=aws-ecs-agent /images/amazon-ecs-pause.tar /images/amazon-ecs-pause.tar
COPY --from=aws-ecs-agent /amazon-ecs-cni-plugins /amazon-ecs-cni-plugins
COPY --from=aws-ecs-agent /etc/ssl/certs/ca-certificates.crt /etc/ssl/certs/ca-certificates.crt

EXPOSE 51678 51679

ENTRYPOINT ["/agent"]