amazon-ecs-agent: Agent fails to start when installed from user data script
Summary
The ecs agent fails to start when enabled in a user data script.
Description
Per the documentation here, I’m trying to install the ECS Container Agent on an Amazon Linux 2 EC2 instance. I’m launching Amazon Linux 2 on a t2.micro with all defaults except IAM Role set to ecsInstanceRole and user data set to:
#!/bin/bash
mkdir -p /etc/ecs
echo "ECS_CLUSTER=default" > /etc/ecs/ecs.config
amazon-linux-extras disable docker
amazon-linux-extras install -y ecs
systemctl enable --now ecs
Expected Behavior
The ecs agent starts and the instance appears in the default cluster
Observed Behavior
The instance does not appear in the default cluster. SSHing into the instance:
[ec2-user@ip-*** ~]$ systemctl status ecs
● ecs.service - ECS Agent
Loaded: loaded (/usr/lib/systemd/system/ecs.service; enabled; vendor preset: disabled)
Active: inactive (dead)
… and journalctl doesn’t have any log entries for ecs either.
Now if I try to start the ecs agent with sudo systemctl start ecs, the command will hang indefinitely, but if I stop it with sudo systemctl stop ecs first and then start again, it will succeed and show up as registered in the default cluster.
Environment Details
Amazon Linux 2 AMI t2.micro
[ec2-user@ip-*** ~]$ curl http://localhost:51678/v1/metadata
curl: (7) Failed to connect to localhost port 51678: Connection refused
Supporting Log Snippets
(relevant error at bottom)
[ec2-user@ip-*** ~]$ cat /var/log/cloud-init-output.log
Cloud-init v. 18.2-72.amzn2.0.6 running 'init-local' at Wed, 28 Nov 2018 20:50:16 +0000. Up 5.01 seconds.
Cloud-init v. 18.2-72.amzn2.0.6 running 'init' at Wed, 28 Nov 2018 20:50:18 +0000. Up 7.37 seconds.
.
.
.
No packages needed for security; 0 packages available
No packages marked for update
Cloud-init v. 18.2-72.amzn2.0.6 running 'modules:final' at Wed, 28 Nov 2018 20:50:25 +0000. Up 14.74 seconds.
Beware that disabling topics is not supported after they are installed.
u'docker' was not enabled. Ignoring.
0 ansible2 available [ =2.4.2 =2.4.6 ]
2 httpd_modules available [ =1.0 ]
3 memcached1.5 available [ =1.5.1 ]
4 nginx1.12 available [ =1.12.2 ]
5 postgresql9.6 available [ =9.6.6 =9.6.8 ]
6 postgresql10 available [ =10 ]
8 redis4.0 available [ =4.0.5 =4.0.10 ]
9 R3.4 available [ =3.4.3 ]
10 rust1 available \
[ =1.22.1 =1.26.0 =1.26.1 =1.27.2 ]
11 vim available [ =8.0 ]
12 golang1.9 available [ =1.9.2 ]
13 ruby2.4 available [ =2.4.2 =2.4.4 ]
15 php7.2 available \
[ =7.2.0 =7.2.4 =7.2.5 =7.2.8 =7.2.11 ]
16 php7.1 available [ =7.1.22 ]
17 lamp-mariadb10.2-php7.2 available \
[ =10.2.10_7.2.0 =10.2.10_7.2.4 =10.2.10_7.2.5
=10.2.10_7.2.8 =10.2.10_7.2.11 ]
18 libreoffice available [ =5.0.6.2_15 =5.3.6.1 ]
19 gimp available [ =2.8.22 ]
20 docker available \
[ =17.12.1 =18.03.1 =18.06.1 ]
21 mate-desktop1.x available [ =1.19.0 =1.20.0 ]
22 GraphicsMagick1.3 available [ =1.3.29 ]
23 tomcat8.5 available [ =8.5.31 =8.5.32 ]
24 epel available [ =7.11 ]
25 testing available [ =1.0 ]
26 ecs available [ =stable ]
27 corretto8 available [ =1.8.0_192 ]
28 firecracker available [ =0.11 ]
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Cleaning repos: amzn2-core amzn2extra-ecs
6 metadata files removed
2 sqlite files removed
0 metadata files removed
Loaded plugins: extras_suggestions, langpacks, priorities, update-motd
Resolving Dependencies
--> Running transaction check
---> Package ecs-init.x86_64 0:1.22.0-4.amzn2 will be installed
--> Processing Dependency: docker >= 17.06.2ce for package: ecs-init-1.22.0-4.amzn2.x86_64
--> Running transaction check
---> Package docker.x86_64 0:18.06.1ce-5.amzn2 will be installed
--> Processing Dependency: pigz for package: docker-18.06.1ce-5.amzn2.x86_64
--> Processing Dependency: libcgroup for package: docker-18.06.1ce-5.amzn2.x86_64
--> Processing Dependency: libltdl.so.7()(64bit) for package: docker-18.06.1ce-5.amzn2.x86_64
--> Running transaction check
---> Package libcgroup.x86_64 0:0.41-15.amzn2 will be installed
---> Package libtool-ltdl.x86_64 0:2.4.2-22.2.amzn2.0.2 will be installed
---> Package pigz.x86_64 0:2.3.4-1.amzn2.0.1 will be installed
--> Finished Dependency Resolution
Dependencies Resolved
================================================================================
Package Arch Version Repository Size
================================================================================
Installing:
ecs-init x86_64 1.22.0-4.amzn2 amzn2extra-ecs 12 M
Installing for dependencies:
docker x86_64 18.06.1ce-5.amzn2 amzn2extra-ecs 37 M
libcgroup x86_64 0.41-15.amzn2 amzn2-core 65 k
libtool-ltdl x86_64 2.4.2-22.2.amzn2.0.2 amzn2-core 49 k
pigz x86_64 2.3.4-1.amzn2.0.1 amzn2-core 81 k
Transaction Summary
================================================================================
Install 1 Package (+4 Dependent packages)
Total download size: 49 M
Installed size: 194 M
Downloading packages:
--------------------------------------------------------------------------------
Total 53 MB/s | 49 MB 00:00
Running transaction check
Running transaction test
Transaction test succeeded
Running transaction
Installing : libtool-ltdl-2.4.2-22.2.amzn2.0.2.x86_64 1/5
Installing : libcgroup-0.41-15.amzn2.x86_64 2/5
Installing : pigz-2.3.4-1.amzn2.0.1.x86_64 3/5
Installing : docker-18.06.1ce-5.amzn2.x86_64 4/5
Installing : ecs-init-1.22.0-4.amzn2.x86_64 5/5
Verifying : pigz-2.3.4-1.amzn2.0.1.x86_64 1/5
Verifying : docker-18.06.1ce-5.amzn2.x86_64 2/5
Verifying : libcgroup-0.41-15.amzn2.x86_64 3/5
Verifying : libtool-ltdl-2.4.2-22.2.amzn2.0.2.x86_64 4/5
Verifying : ecs-init-1.22.0-4.amzn2.x86_64 5/5
Installed:
ecs-init.x86_64 0:1.22.0-4.amzn2
Dependency Installed:
docker.x86_64 0:18.06.1ce-5.amzn2 libcgroup.x86_64 0:0.41-15.amzn2
libtool-ltdl.x86_64 0:2.4.2-22.2.amzn2.0.2 pigz.x86_64 0:2.3.4-1.amzn2.0.1
Complete!
Installing ecs-init
0 ansible2 available [ =2.4.2 =2.4.6 ]
2 httpd_modules available [ =1.0 ]
3 memcached1.5 available [ =1.5.1 ]
4 nginx1.12 available [ =1.12.2 ]
5 postgresql9.6 available [ =9.6.6 =9.6.8 ]
6 postgresql10 available [ =10 ]
8 redis4.0 available [ =4.0.5 =4.0.10 ]
9 R3.4 available [ =3.4.3 ]
10 rust1 available \
[ =1.22.1 =1.26.0 =1.26.1 =1.27.2 ]
11 vim available [ =8.0 ]
12 golang1.9 available [ =1.9.2 ]
13 ruby2.4 available [ =2.4.2 =2.4.4 ]
15 php7.2 available \
[ =7.2.0 =7.2.4 =7.2.5 =7.2.8 =7.2.11 ]
16 php7.1 available [ =7.1.22 ]
17 lamp-mariadb10.2-php7.2 available \
[ =10.2.10_7.2.0 =10.2.10_7.2.4 =10.2.10_7.2.5
=10.2.10_7.2.8 =10.2.10_7.2.11 ]
18 libreoffice available [ =5.0.6.2_15 =5.3.6.1 ]
19 gimp available [ =2.8.22 ]
20 docker available \
[ =17.12.1 =18.03.1 =18.06.1 ]
21 mate-desktop1.x available [ =1.19.0 =1.20.0 ]
22 GraphicsMagick1.3 available [ =1.3.29 ]
23 tomcat8.5 available [ =8.5.31 =8.5.32 ]
24 epel available [ =7.11 ]
25 testing available [ =1.0 ]
26 ecs=latest enabled [ =stable ]
27 corretto8 available [ =1.8.0_192 ]
28 firecracker available [ =0.11 ]
Created symlink from /etc/systemd/system/multi-user.target.wants/ecs.service to /usr/lib/systemd/system/ecs.service.
Job for ecs.service canceled.
Nov 28 20:56:28 cloud-init[3285]: util.py[WARNING]: Failed running /var/lib/cloud/instance/scripts/part-001 [1]
Nov 28 20:56:28 cloud-init[3285]: cc_scripts_user.py[WARNING]: Failed to run module scripts-user (scripts in /var/lib/cloud/instance/scripts)
Nov 28 20:56:28 cloud-init[3285]: util.py[WARNING]: Running module scripts-user (<module 'cloudinit.config.cc_scripts_user' from '/usr/lib/python2.7/site-packages/cloudinit/config/cc_scripts_user.pyc'>) failed
Cloud-init v. 18.2-72.amzn2.0.6 finished at Wed, 28 Nov 2018 20:56:28 +0000. Datasource DataSourceEc2. Up 377.68 seconds
About this issue
- State: closed
- Created 6 years ago
- Reactions: 4
- Comments: 31 (5 by maintainers)
Commits related to this issue
- Address issue with starting ecs agent in userdata scripts. - See Github Issue: https://github.com/aws/amazon-ecs-agent/issues/1707 — committed to onedownfiveup/indie-ninja-aws by cmavromoustakos 5 years ago
- Fix ecs.service and cloud-final.service deadlock dependency # https://github.com/aws/amazon-ecs-agent/issues/1707 — committed to daringcalf/packer-ecs by daringcalf 2 years ago
Hi,
Starting ecs this way via userdata will cause a deadlock in systemd’s startup scripts for docker and ecs.
The systemd units for both ecs and docker have a directive to wait for cloud-init to finish before starting. The cloud-init process isn’t considered finished until your userdata has finished running. So, requesting ecs (or docker) to start within userdata will cause this condition.
You should be able to fix this by adding a ‘--no-block’ flag:
systemctl enable --now --no-block ecs.service
Please let me know if you have any additional questions.
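For illustration, here is a minimal sketch of the user data from the original report with the suggested flag applied. The helper name render_user_data and its cluster argument are hypothetical; every command inside the generated script is taken from this thread. The helper only prints the script, so it can be reviewed or saved to a file before being supplied as user data.

```shell
#!/bin/bash
# Hypothetical helper (not part of any AWS tooling): prints a user data
# script that starts ecs without blocking cloud-init.
# usage: render_user_data [cluster_name]
render_user_data() {
  cluster="${1:-default}"
  cat <<EOF
#!/bin/bash
mkdir -p /etc/ecs
echo "ECS_CLUSTER=${cluster}" > /etc/ecs/ecs.config
amazon-linux-extras disable docker
amazon-linux-extras install -y ecs
# --no-block queues the start job and returns immediately, so this script
# (and with it cloud-init) can finish, after which the queued job can run.
systemctl enable --now --no-block ecs.service
EOF
}
```

Calling render_user_data default > user-data.sh would write the script to a file. If --now is not honored by the installed systemd, running systemctl enable ecs.service followed by systemctl start --no-block ecs.service should be equivalent.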
I found that the --now flag of systemctl enable doesn’t work in the current systemd version (219) on the Amazon Linux 2 ECS AMI (see https://unix.stackexchange.com/questions/374280/the-now-switch-of-systemctl).
The easiest way to fix this is as follows:
Or more directly (e.g. in a Packer script):
Hi guys, don’t you think that the AWS agent unit file for systemd should be corrected and the line After=cloud-final.service removed from it?
Hi all, following @philippefuentes’ suggestion I was able to adjust my user-data script (I’m also using the Amazon Linux 2 ECS Optimized AMI).
I’m sharing my final user-data script for others who face similar trouble and happen to find this issue (until the AWS docs and blog post get updated, I hope =) ).
Cheers!
Hi @petderek, in that case one of the mentioned workarounds, like systemctl enable --now --no-block ecs.service or an update of the unit file, should be reflected in the AWS ECS agent documentation, in the part about manual ECS agent installation. Systemd doesn’t show any errors or other notifications about the deadlock dependency.
The AWS docs really, really should mention --no-block for the Amazon Linux 2 AMI, because this is an absolutely ridiculous issue to troubleshoot when following the instructions on the Amazon ECS doc pages. The Amazon ECS doc pages for ‘how to install and run the ecs cluster agent’ make the process seem trivial… but then you hit a race condition that only magically resolves itself if you land here and add the ‘no-block’, or you kill the systemctl process and notice that all of a sudden the ecs-init process that follows it seems to make things work.
lol do you just make your customers go around in circles following instructions that don’t work?
This is pretty embarrassing…
@dpavlov-smartling thanks for your quick answer, and sorry for not being specific enough: I am indeed using the ECS Optimized AMI.
Unfortunately, it does not work for me. After SSHing into a fresh instance, the agent is not started when using systemctl enable --now --no-block ecs.
Hello everyone, I am not currently aware of any ECS docs page that recommends running systemctl start ecs in userdata on the AL2 platform. We have this note in the docs page about installing the ecs container agent that I think clarifies this behavior, so I’m not sure exactly what more we can do to help with this issue (from https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html):
Are there any ECS docs that currently direct users to start ecs in AL2 userdata? If so please provide URLs and we can fix them ASAP.
Hi @petderek and @all, I’m facing similar trouble when I follow the steps described in this AWS blog post to install the rex-ray plugin on an ECS Optimized AMI (i.e. I’m not manually installing the ecs agent/service). When I loop on curl against http://localhost:51678/v1/metadata in the user-data script to wait for the ecs service, it never stabilizes (the service never starts), but without this loop it starts OK.
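As a rough sketch of the kind of wait loop described here, the helper below retries a probe a bounded number of times and then gives up, where an unbounded loop would hang. The names wait_until and agent_ready are hypothetical; the endpoint is the one quoted in this thread. Going by the maintainer's explanation earlier in the thread, a loop like this run in the foreground of user data cannot succeed on this AMI, because the agent is only started once user data has finished. It is meant for use after boot, for example over SSH.

```shell
#!/bin/bash
# Hypothetical helper: retry a command until it succeeds or attempts run out.
# usage: wait_until <attempts> <delay_seconds> <command> [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt failed.
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Probe from this thread: the agent's local metadata endpoint.
agent_ready() {
  curl -sf http://localhost:51678/v1/metadata
}

# Example (commented out so that defining the helpers has no side effects):
# wait_until 60 5 agent_ready || echo "ecs agent did not come up" >&2
```

The bounded attempt count is a deliberate choice: if the agent is never going to start, the caller gets a non-zero exit status to act on.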
Could this be the same issue, and if so how should I circumvent it? Should I manually start ecs using systemctl enable --now --no-block ecs in my user-data script?
The only problem with this is that it potentially breaks another use case. The reason we have the cloud-final.service dependency is so that a user can modify ecs configuration as part of the userdata script. Our documentation has examples like this:
Since we are guaranteed that userdata is completely processed before agent starts, no further systemd configuration is required in order to ensure that agent receives the intended values.
I’m thinking that the best path for the optimized AMI is to leave the current configuration as is for general use. The workarounds include:
systemctl enable --now --no-block ecs.service if you need ecs to be available as part of your userdata script.
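For the unit-file route raised earlier in the thread, a possible sketch is below. It is untested and makes several assumptions: that the ordering appears on its own line exactly as quoted (After=cloud-final.service), and that the unit lives at /usr/lib/systemd/system/ecs.service as shown in the systemctl status output. The helper name strip_cloud_final_ordering is hypothetical. Since docker's unit is said to carry the same ordering, this alone may not be enough, and it gives up the guarantee that user data has finished writing /etc/ecs/ecs.config before the agent starts.

```shell
#!/bin/bash
# Hypothetical helper: copy a unit file from stdin to stdout, dropping the
# ordering line quoted in this thread and leaving every other line as is.
strip_cloud_final_ordering() {
  sed '/^After=cloud-final\.service[[:space:]]*$/d'
}

# Example (as root; commented out so that defining the helper has no side
# effects). A unit file in /etc/systemd/system takes precedence over the
# packaged copy in /usr/lib/systemd/system.
# strip_cloud_final_ordering < /usr/lib/systemd/system/ecs.service \
#   > /etc/systemd/system/ecs.service
# systemctl daemon-reload
```

Writing a full copy of the unit file rather than a drop-in is intentional here: systemd drop-ins can add ordering dependencies such as After= but cannot clear existing ones.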