amazon-ecs-agent: ECS failing to execute upstart-job from user data
Summary
EC2 User Data not being properly parsed, resulting in ECS Task not starting
Description
We are attempting to upgrade to the ECS-optimized Amazon Linux 2 AMI (from version 1), and our ECS task is not being started. Our EC2 user data is multipart MIME-encoded text, as outlined in the AWS documentation. The upstart-job part, which is what starts our task, does not appear to be executed: the task never starts, and the script in that part produces no output. In the logs at /var/log/ecs/ we see this output:
2018-12-12T22:23:59Z [INFO] Loading configuration
2018-12-12T22:23:59Z [INFO] Unable to parse user data: invalid character 'C' looking for beginning of value
2018-12-12T22:23:59Z [INFO] Amazon ECS agent Version: 1.22.0, Commit: 26518174
Judging from the agent source, this indicates the agent is trying to interpret the user data as JSON, which it is not. I have been unable to find documentation describing any user data format other than multipart MIME.
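The wording of that log line is the standard Go JSON decoder error for an unexpected first byte, and the multipart user data begins with the C of "Content-Type". The sketch below makes the same first-line check on a saved copy of the user data. It is a bash helper with a hypothetical name (classify_user_data) and only a rough heuristic, not how the agent itself decides anything.

```shell
#!/usr/bin/env bash
# Hypothetical helper: guess the format of an EC2 user data payload from its
# first line. A JSON parser rejects the multipart document because its very
# first character is the 'C' of "Content-Type".
classify_user_data() {
  local file="$1" first_line=""
  # Read only the first line (read fails without a trailing newline but still
  # sets the variable, hence "|| true").
  IFS= read -r first_line < "$file" || true
  # Strip a trailing carriage return in case the file uses CRLF line endings.
  first_line="${first_line%"$(printf '\r')"}"
  case "$first_line" in
    '{'*)                        echo "json" ;;
    'Content-Type: multipart/'*) echo "mime-multipart" ;;
    '#cloud-config'*)            echo "cloud-config" ;;
    '#!'*)                       echo "shell-script" ;;
    *)                           echo "unknown" ;;
  esac
}
```

On the instance, a copy of the user data can be fetched from the instance metadata service, for example with curl -s http://169.254.169.254/latest/user-data > /tmp/user-data (the IMDSv1 path; instances that require IMDSv2 need a session token first). Running classify_user_data /tmp/user-data on the user data below would then print mime-multipart.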
Here’s our user data (truncated):
Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0

--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"

#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Us"
start on started ecs

script
    mkdir -p /var/log/startup
    exec 1>>/var/log/startup/ecs-start-task.log 2>&1
    set -x
    until curl -s http://localhost:51678/v1/metadata
    do
        sleep 1
    done
    [snip]
    aws ecs start-task --cluster "${ECS_CLUSTER}" --task-definition "${ECS_TASK_DEFINITION}" --container-instances "${AWS_INSTANCE_ARN}" --started-by "${AWS_INSTANCE_ARN}" --region "${AWS_REGION}" --overrides "${CONTAINER_OVERRIDES}"
    echo "End EC2 task startup script: $(date)"
end script
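Multipart user data depends on exact layout. It needs a blank line after the top-level headers, a blank line after each part's headers, and a closing delimiter made of the boundary followed by two hyphens. The sketch below assembles such a document from files. It is a hypothetical bash helper (build_mime_user_data), and its argument convention (content-type/file pairs) and fixed boundary string are illustrative choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a multipart/mixed user data document.
# Arguments come in pairs: <content-type> <file> [<content-type> <file> ...]
build_mime_user_data() {
  local boundary="==BOUNDARY=="
  if [ $(( $# % 2 )) -ne 0 ] || [ $# -eq 0 ]; then
    echo "usage: build_mime_user_data <content-type> <file> [...]" >&2
    return 1
  fi
  # Top-level headers, then the blank line that ends the header block.
  printf 'Content-Type: multipart/mixed; boundary="%s"\n' "$boundary"
  printf 'MIME-Version: 1.0\n\n'
  while [ $# -gt 0 ]; do
    # Each part: delimiter line, part headers, blank line, body.
    printf '%s\n' "--${boundary}"
    printf 'Content-Type: %s; charset="us-ascii"\n\n' "$1"
    cat "$2"
    # Ensure the body ends with a newline before the next delimiter.
    [ -n "$(tail -c 1 "$2")" ] && printf '\n'
    shift 2
  done
  # Closing delimiter: boundary followed by two hyphens.
  printf '%s\n' "--${boundary}--"
}
```

For example, build_mime_user_data text/x-shellscript start-task.sh > user-data.txt wraps a (hypothetical) start-task.sh as a shell-script part, a part type cloud-init handles; further content-type/file pairs add more parts.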
Expected Behavior
We expect the user data to be properly parsed and executed.
Observed Behavior
The ECS Task fails to start since the upstart-job fails to execute.
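One likely reason the part is skipped is that Amazon Linux 2 runs systemd as its init system. The upstart-job format targets upstart, which the original Amazon Linux AMI used (hence "start on started ecs"). The quick check below is a hypothetical bash helper (detect_init_system). It uses the same test as systemd's sd_booted(), which looks for the /run/systemd/system directory, and treats an initctl that reports "upstart" as an upstart host.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report which init system appears to be running.
detect_init_system() {
  if [ -d /run/systemd/system ]; then
    # Same check systemd's sd_booted() performs.
    echo "systemd"
  elif command -v initctl >/dev/null 2>&1 &&
       initctl version 2>/dev/null | grep -q upstart; then
    # upstart's initctl reports its version as "... (upstart X.Y)".
    echo "upstart"
  else
    echo "unknown"
  fi
}
```

On an Amazon Linux 2 instance this should print systemd, which means no upstart is present to pick up a text/upstart-job part.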
Environment Details
[ec2-user@ip-172-31-0-92 ecs]$ curl http://localhost:51678/v1/metadata
{"Cluster":"default","ContainerInstanceArn":"arn:aws:ecs:us-east-1:760528713078:container-instance/46d67ed6-44f9-42be-8998-87ab9baf4d2c","Version":"Amazon ECS Agent - v1.22.0 (26518174)"}
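The introspection response above is a single-line JSON object, and the start-task call in the user data needs the container instance ARN (passed as --container-instances). The sketch below shows one dependency-free way to pull it out with sed. The helper name is hypothetical, and the sed approach assumes the compact form shown above with no escaped quotes in the value. jq is the more robust choice if it is installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract ContainerInstanceArn from the agent's
# /v1/metadata JSON on stdin. Assumes the compact single-line form the agent
# returned above and a value without escaped quotes; prints nothing if absent.
container_instance_arn() {
  sed -n 's/.*"ContainerInstanceArn":"\([^"]*\)".*/\1/p'
}
```

Typical use on the instance: curl -s http://localhost:51678/v1/metadata | container_instance_arn.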
Supporting Log Snippets
[ec2-user@ip-172-31-0-92 ecs]$ cat ecs-agent.log.2018-12-12-22
2018-12-12T22:23:59Z [INFO] Loading configuration
2018-12-12T22:23:59Z [INFO] Unable to parse user data: invalid character 'C' looking for beginning of value
2018-12-12T22:23:59Z [INFO] Amazon ECS agent Version: 1.22.0, Commit: 26518174
2018-12-12T22:23:59Z [INFO] Creating root ecs cgroup: /ecs
2018-12-12T22:23:59Z [INFO] Creating cgroup /ecs
2018-12-12T22:23:59Z [INFO] Loading state! module="statemanager"
2018-12-12T22:23:59Z [INFO] Event stream ContainerChange start listening...
2018-12-12T22:23:59Z [INFO] Registering Instance with ECS
2018-12-12T22:23:59Z [INFO] Remaining mem: 985
2018-12-12T22:23:59Z [INFO] Registered container instance with cluster!
2018-12-12T22:23:59Z [INFO] Registration completed successfully. I am running as 'arn:aws:ecs:us-east-1:760528713078:container-instance/46d67ed6-44f9-42be-8998-87ab9baf4d2c' in cluster 'default'
2018-12-12T22:23:59Z [INFO] Saving state! module="statemanager"
2018-12-12T22:23:59Z [INFO] Beginning Polling for updates
2018-12-12T22:23:59Z [INFO] Event stream DeregisterContainerInstance start listening...
2018-12-12T22:23:59Z [INFO] Initializing stats engine
2018-12-12T22:23:59Z [INFO] Establishing a Websocket connection to https://ecs-a-2.us-east-1.amazonaws.com/ws?agentHash=26518174&agentVersion=1.22.0&clusterArn=default&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A760528713078%3Acontainer-instance%2F46d67ed6-44f9-42be-8998-87ab9baf4d2c&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=true&seqNum=1
2018-12-12T22:23:59Z [INFO] NO_PROXY set:169.254.169.254,169.254.170.2,/var/run/docker.sock
2018-12-12T22:23:59Z [INFO] Establishing a Websocket connection to https://ecs-t-2.us-east-1.amazonaws.com/ws?cluster=default&containerInstance=arn%3Aaws%3Aecs%3Aus-east-1%3A760528713078%3Acontainer-instance%2F46d67ed6-44f9-42be-8998-87ab9baf4d2c
2018-12-12T22:23:59Z [INFO] Connected to ACS endpoint
2018-12-12T22:23:59Z [INFO] Connected to TCS endpoint
2018-12-12T22:24:09Z [INFO] Saving state! module="statemanager"
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (7 by maintainers)
For anyone suffering from a lack of documentation, I’ve managed to get a working solution. Without updates to the official docs it’s unclear if this is the “correct” way to launch ECS tasks from EC2 user data, but I hope this helps illustrate what I am trying to do, and perhaps helps someone in a similar situation.
My rewrite of the user data samples given at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/start_task_at_launch.html and https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bootstrap_container_instance.html replaces the upstart MIME part with a systemd service unit, written out at boot, that runs a bash script to start the ECS task, much as the upstart job in those examples does.
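A minimal sketch of that shape follows. It is not the exact user data from this thread, just a bash function that prints a oneshot systemd unit ordered after ecs.service. The unit name ecs-start-task.service and the script path /usr/local/bin/ecs-start-task.sh are hypothetical. The install steps in the trailing comments start the unit with systemctl's --no-block option, as the March 13 edit note describes, so the user data script does not sit waiting on the unit.

```shell
#!/usr/bin/env bash
# Hypothetical unit name and script path, for illustration only.
UNIT_NAME="ecs-start-task.service"
SCRIPT_PATH="/usr/local/bin/ecs-start-task.sh"

# Print a oneshot systemd unit that runs the start-task script after the
# ECS agent's service (ecs.service) has been started.
render_unit() {
  cat <<EOF
[Unit]
Description=Start an ECS task on instance boot
After=ecs.service
Requires=ecs.service

[Service]
Type=oneshot
ExecStart=${SCRIPT_PATH}

[Install]
WantedBy=multi-user.target
EOF
}

# Example use from user data (runs as root on a systemd host):
#   render_unit > "/etc/systemd/system/${UNIT_NAME}"
#   systemctl daemon-reload
#   systemctl enable "${UNIT_NAME}"
#   # --no-block: queue the start job and return immediately instead of
#   # waiting for it, so cloud-init can finish running the user data.
#   systemctl start --no-block "${UNIT_NAME}"
```

Type=oneshot fits a script that runs to completion. The script itself would still need to wait for the agent before calling aws ecs start-task, since Requires= and After= only order against ecs.service starting.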
NOTE: Edited March 12, 2019 to include workarounds for issues that prevented the originally posted user data from working as intended: specifically, a workaround for #1707, and a loop that waits for the ECS agent to become responsive, which it may not be yet even after the systemd ecs.service has started.
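Such a wait loop can be sketched as a bounded version of the until-curl loop in the original upstart job. It polls the agent's introspection endpoint until it answers, but gives up after a fixed number of attempts instead of spinning forever. The function name, default attempt count, delay, and per-request timeout below are hypothetical choices.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until the ECS agent's introspection API answers.
# Args (all optional): [url] [max_attempts] [delay_seconds]
wait_for_ecs_agent() {
  local url="${1:-http://localhost:51678/v1/metadata}"
  local max_attempts="${2:-60}"
  local delay="${3:-1}"
  local attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # -s: silent; -f: treat HTTP errors as failures; --max-time: don't hang.
    if curl -sf --max-time 5 "$url" >/dev/null 2>&1; then
      return 0
    fi
    sleep "$delay"
    attempt=$((attempt + 1))
  done
  echo "ECS agent not responding at $url after $max_attempts attempts" >&2
  return 1
}
```

In a start-task script this would run before the metadata lookup, e.g. wait_for_ecs_agent || exit 1, so a missing agent produces a logged failure rather than an endless loop.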
NOTE: Edited March 13, 2019 to further refine the workaround for #1707: rather than editing ecs.service, we can simply pass --no-block when starting our service, so the user data script exits and the system can finish booting.
@levigroker Glad that you got it working, and thanks for sharing the info with everyone.
Thanks for the feedback; I am working with the internal team on this.
Thanks for posting your solution @levigroker!
AWS Team: Can you please update this official documentation about starting a task at instance launch?
It’s currently misleading, as the method it describes does not work on Amazon Linux 2, even though customers are actively being encouraged to switch to the new OS version. We learned this the hard way when things started breaking after we upgraded the AMI.