amazon-ecs-agent: ECS agent fails to launch with latest AMI/Docker

After rebasing our AMI onto the latest ECS-optimized AMI to get Docker 1.11.1, I’m seeing the ECS agent fail to start with this message:

docker: Error response from daemon: rpc error: code = 2 desc = "oci runtime error: rootfs (\"/var/lib/docker/devicemapper/mnt/14f9d36675aabe5b45170f0e2ee9206ed61421959c497a7c468091ad0df7d425/rootfs\") does not exist".

The beginning of my docker log file looks like this:

Mon Jun  6 01:10:48 UTC 2016\n
time="2016-06-06T01:10:48.591674132Z" level=info msg="New containerd process, pid: 2720\n" 
time="2016-06-06T01:10:49Z" level=warning msg="containerd: low RLIMIT_NOFILE changing to max" current=1024 max=4096 
time="2016-06-06T01:10:49.712144800Z" level=info msg="devmapper: Creating filesystem ext4 on device docker-202:1-263764-base" 
\nMon Jun  6 01:11:04 UTC 2016\n
time="2016-06-06T01:11:04.943144528Z" level=info msg="previous instance of containerd still alive (2720)" 
time="2016-06-06T01:11:08.987919664Z" level=fatal msg="Error starting daemon: error initializing graphdriver: Device is Busy" 
\nMon Jun  6 01:11:15 UTC 2016\n
time="2016-06-06T01:11:16.000378921Z" level=info msg="previous instance of containerd still alive (2720)" 
time="2016-06-06T01:11:16.032042379Z" level=info msg="devmapper: Creating filesystem ext4 on device docker-202:1-263764-base" 
time="2016-06-06T01:11:18.418423073Z" level=info msg="devmapper: Successfully created filesystem ext4 on device docker-202:1-263764-base" 
time="2016-06-06T01:11:18.486581118Z" level=info msg="Graph migration to content-addressability took 0.00 seconds" 
time="2016-06-06T01:11:19.323088453Z" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --bip can be used to set a preferred IP address" 
time="2016-06-06T01:11:19.361038386Z" level=warning msg="Your kernel does not support cgroup blkio weight" 
time="2016-06-06T01:11:19.361066723Z" level=warning msg="Your kernel does not support cgroup blkio weight_device" 
time="2016-06-06T01:11:19.361172724Z" level=warning msg="mountpoint for pids not found" 
time="2016-06-06T01:11:19.361957135Z" level=info msg="Loading containers: start." 

time="2016-06-06T01:11:19.362075330Z" level=info msg="Loading containers: done." 
time="2016-06-06T01:11:19.362092940Z" level=info msg="Daemon has completed initialization" 
time="2016-06-06T01:11:19.362129999Z" level=info msg="Docker daemon" commit="5604cbe/1.11.1" graphdriver=devicemapper version=1.11.1 
time="2016-06-06T01:11:19.375313966Z" level=info msg="API listen on /var/run/docker.sock" 
time="2016-06-06T01:11:50Z" level=error msg="containerd: start container" error="oci runtime error: rootfs (\"/var/lib/docker/devicemapper/mnt/bc3d9e8c25aff497e5c69c0951607a7527399a80e289ba477aa1ba9248520914/rootfs\") does not exist" id=ca65f0918a43843fc84a130381efc347da2602fa9a0273402e5de2edf78efd4a 

No doubt some of my scripting around Docker startup no longer plays nicely with the way the ECS-optimized AMI expects Docker storage to be configured. Any suggestions as to what it might be?

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 20 (9 by maintainers)

Most upvoted comments

We’ve also been encountering this issue, thankfully in a non-production environment. I got the Docker daemon running again by manually restarting the instance, which allowed our cluster to connect and run the needed container.

The instance is spun up by an AutoScalingGroup. The output you requested is attached (amazon-ecs-docker-log-errors.txt), and the user-data for the launch configuration is below.

user-data:

#!/bin/bash

echo ECS_CLUSTER=[cluster_name] > /etc/ecs/ecs.config

yum install -y docker

service docker start

usermod -a -G docker ec2-user

Hope this helps!

@alexmac My apologies for the delay in response; I got pretty busy last week and this week with DockerCon.

So, a bit of background on what is happening and how:

When we build the AMI, we include a BlockDeviceMapping for an empty EBS volume. At boot, upstart on the instance starts running various software, including cloud-init. Among other things like setting up SSH using the public key you specified when launching the instance, cloud-init is used to configure the instance on boot. The ECS-optimized AMI specifies some cloud-config configuration in a file located at /etc/cloud/cloud.cfg.d/90_ecs.cfg and tells cloud-init to invoke docker-storage-setup through the cloud-init-per helper as a bootcmd. The cloud-config configuration is read very early in the boot process, prior to Docker being started, and bootcmds in particular are executed early in the boot process (this is different from normal user-data scripts, which are executed toward the end). We picked a bootcmd as it was a good way for us to ensure that docker-storage-setup ran before Docker was started the very first time.
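
To see whether this boot-time setup actually ran on a given instance, something like the following may help. It is only a diagnostic sketch: the `show_if_exists` helper is hypothetical, the first path comes from the description above, and it assumes (as I understand docker-storage-setup) that the tool writes its result to /etc/sysconfig/docker-storage. `lvs` is the standard LVM listing command and usually needs root.

```shell
#!/bin/bash
# Hypothetical helper: print a file if it is readable, otherwise say so.
show_if_exists() {
    if [ -r "$1" ]; then
        echo "== $1"
        cat "$1"
    else
        echo "== $1 (not found)"
    fi
}

# cloud-config shipped with the ECS-optimized AMI (path from the text above).
show_if_exists /etc/cloud/cloud.cfg.d/90_ecs.cfg

# Assumption: docker-storage-setup records the storage options it chose here,
# so a missing file suggests it has not run yet.
show_if_exists /etc/sysconfig/docker-storage

# List LVM logical volumes so you can look for the thin pool, if lvm2 is installed.
if command -v lvs >/dev/null 2>&1; then
    lvs || echo "lvs failed (are you root?)"
fi
```

If the second file is missing or no thin pool shows up, docker-storage-setup most likely did not run before Docker first started.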

I haven’t used Packer before, but there are a few different general techniques you might be able to apply. For example:

  • Option A:
    1. Inject your own cloud-config configuration when the source instance is launched that overrides the bootcmd
    2. Run whatever scripts you need to prepare the instance normally
    3. Clean up Docker (stop Docker, remove /var/lib/docker, and remove /etc/sysconfig/docker-storage)
    4. Shut down the instance and snapshot the root volume
    5. Register an AMI with the root volume snapshot and a BlockDeviceMapping for the second volume (as /dev/xvdcz) without a snapshot
    6. This should give you roughly the same experience as launching an ECS-optimized AMI in that docker-storage-setup should run and set up the second volume as the LVM thin pool
  • Option B:
    1. Launch an instance of the normal Amazon Linux AMI
    2. Create a volume from the public snapshot of the root volume of the ECS-optimized AMI and attach it to your instance
    3. Perform whatever modifications you want on that volume, then detach and snapshot
    4. Register an AMI with that snapshot and a BlockDeviceMapping for the second volume (as /dev/xvdcz) without a snapshot
    5. Again, this should give you roughly the same experience
  • Option C: Use your existing process, but specify the size of /dev/xvdcz explicitly at launch through the BlockDeviceMapping parameter of RunInstances and use docker ps to wait for initialization to finish prior to stopping Docker.
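
As a rough sketch of two of the steps above, the wait in Option C and the cleanup in step 3 of Option A could look like this. The function names and the 120-attempt budget are made up for illustration; the commands themselves (`docker ps`, `service docker stop`, and the two paths) are the ones named in the list. This is untested against a real AMI build.

```shell
#!/bin/bash
# Poll a command once per second until it succeeds or attempts run out.
# usage: wait_until <max_attempts> <command...>
wait_until() {
    local attempts=$1
    shift
    local i=0
    while [ "$i" -lt "$attempts" ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Option C: wait for the daemon to answer "docker ps" before stopping it.
# The limit of 120 attempts is an arbitrary choice.
wait_for_docker() {
    wait_until 120 docker ps
}

# Option A, step 3: stop Docker and remove its state so that
# docker-storage-setup starts from scratch on instances launched from the AMI.
# Destructive; run as root and only on the build instance.
clean_docker_state() {
    service docker stop
    rm -rf /var/lib/docker
    rm -f /etc/sysconfig/docker-storage
}
```

At the end of an image build you might call `wait_for_docker && clean_docker_state` as root. Nothing runs when the file is merely sourced, so the destructive step stays opt-in.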

I haven’t tested each of these, but hopefully this helps give you some general ideas of how you can approach it.
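
For the registration step in Options A and B, a dry-run sketch with the AWS CLI might look like the following. The snapshot ID, image name, volume size, and the /dev/xvda root device name are all placeholders or assumptions rather than values from this thread; only /dev/xvdcz comes from the options above. The `block_device_mappings` helper is hypothetical and just assembles the JSON.

```shell
#!/bin/bash
# Build the BlockDeviceMapping JSON: a root volume restored from a snapshot,
# plus a second volume at /dev/xvdcz with no SnapshotId, so it starts empty.
# usage: block_device_mappings <root_snapshot_id> <docker_volume_gib>
block_device_mappings() {
    printf '[{"DeviceName":"/dev/xvda","Ebs":{"SnapshotId":"%s"}},' "$1"
    printf '{"DeviceName":"/dev/xvdcz","Ebs":{"VolumeSize":%s}}]\n' "$2"
}

# Placeholder values: substitute your own snapshot ID, image name, and size.
# The leading "echo" makes this a dry run that only prints the command;
# remove it to actually register the image.
echo aws ec2 register-image \
    --name my-ecs-ami \
    --architecture x86_64 \
    --virtualization-type hvm \
    --root-device-name /dev/xvda \
    --block-device-mappings "$(block_device_mappings snap-EXAMPLE 22)"
```

Leaving the snapshot off the second mapping is what yields a blank volume for docker-storage-setup to turn into the thin pool on first boot.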