amazon-ecs-agent: ECS failing to execute upstart-job from user data

Summary

EC2 user data is not being parsed properly, so the ECS task does not start.

Description

We are attempting to upgrade to the ECS-optimized Amazon Linux 2 AMI (from version 1), and our ECS task is no longer being started. Our EC2 user data is multipart MIME-encoded text, as outlined in the AWS documentation. The upstart-job part, which is what starts our task, does not appear to be executed: the task does not start, and there is no output from the script contained in the upstart job. Looking in the logs at /var/log/ecs/ we see this output:

2018-12-12T22:23:59Z [INFO] Loading configuration
2018-12-12T22:23:59Z [INFO] Unable to parse user data: invalid character 'C' looking for beginning of value
2018-12-12T22:23:59Z [INFO] Amazon ECS agent Version: 1.22.0, Commit: 26518174

Looking at the source, this indicates that the agent is trying to interpret the user data as JSON, which it is not. I’ve been unable to find documentation describing any user data format other than multipart MIME.
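
The 'C' in the log message matches the first character of the `Content-Type:` header that opens the user data. As a minimal sketch (the `userdata_kind` helper is hypothetical and is not part of the ECS agent or cloud-init), the format of a payload can be guessed from how it begins:

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the ECS agent or cloud-init): guess the
# format of a user data payload from its first characters.
userdata_kind() {
    local data="$1"
    # Drop leading whitespace so indentation does not affect the guess.
    data="${data#"${data%%[![:space:]]*}"}"
    case "$data" in
        '{'*)             echo "json" ;;
        '#!'*)            echo "shell-script" ;;
        'Content-Type:'*) echo "mime-multipart" ;;
        *)                echo "unknown" ;;
    esac
}

userdata_kind 'Content-Type: multipart/mixed; boundary="==BOUNDARY=="'
userdata_kind '{"key": "value"}'
```

A JSON parser handed the first payload stops at the leading `C`, which is consistent with the message in the log.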

Here’s our user data (truncated):

Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0

--==BOUNDARY==
Content-Type: text/upstart-job; charset="us-ascii"

#upstart-job
description "Amazon EC2 Container Service (start task on instance boot)"
author "Us"
start on started ecs

script
	mkdir -p /var/log/startup
	exec 1>>/var/log/startup/ecs-start-task.log 2>&1
	set -x
	until curl -s http://localhost:51678/v1/metadata
	do
		sleep 1
	done

        [snip]

	aws ecs start-task --cluster "${ECS_CLUSTER}" --task-definition "${ECS_TASK_DEFINITION}" --container-instances "${AWS_INSTANCE_ARN}" --started-by "${AWS_INSTANCE_ARN}" --region "${AWS_REGION}" --overrides "${CONTAINER_OVERRIDES}"
	
	echo "End EC2 task startup script: $(date)"
end script

Expected Behavior

We expect the user data to be properly parsed and executed.

Observed Behavior

The ECS Task fails to start since the upstart-job fails to execute.
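
One thing worth checking on the instance is which init system is actually running, since Amazon Linux 2 boots with systemd rather than upstart. The sketch below is only an illustration: `init_system` is a hypothetical helper, and the optional file argument exists so it can be pointed somewhere other than `/proc/1/comm`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report the name of the process running as PID 1.
# The optional argument lets the function read a file other than
# /proc/1/comm.
init_system() {
    local comm_file="${1:-/proc/1/comm}"
    if [ -r "$comm_file" ]; then
        cat "$comm_file"
    else
        echo "unknown"
    fi
}

case "$(init_system)" in
    systemd) echo "PID 1 is systemd: an upstart job will not be run" ;;
    init)    echo "PID 1 is init: may be upstart or SysV init, check for initctl" ;;
    *)       echo "PID 1 is: $(init_system)" ;;
esac
```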

Environment Details

[ec2-user@ip-172-31-0-92 ecs]$ curl http://localhost:51678/v1/metadata
{"Cluster":"default","ContainerInstanceArn":"arn:aws:ecs:us-east-1:760528713078:container-instance/46d67ed6-44f9-42be-8998-87ab9baf4d2c","Version":"Amazon ECS Agent - v1.22.0 (26518174)"}

Supporting Log Snippets

[ec2-user@ip-172-31-0-92 ecs]$ cat ecs-agent.log.2018-12-12-22 
2018-12-12T22:23:59Z [INFO] Loading configuration
2018-12-12T22:23:59Z [INFO] Unable to parse user data: invalid character 'C' looking for beginning of value
2018-12-12T22:23:59Z [INFO] Amazon ECS agent Version: 1.22.0, Commit: 26518174
2018-12-12T22:23:59Z [INFO] Creating root ecs cgroup: /ecs
2018-12-12T22:23:59Z [INFO] Creating cgroup /ecs
2018-12-12T22:23:59Z [INFO] Loading state! module="statemanager"
2018-12-12T22:23:59Z [INFO] Event stream ContainerChange start listening...
2018-12-12T22:23:59Z [INFO] Registering Instance with ECS
2018-12-12T22:23:59Z [INFO] Remaining mem: 985
2018-12-12T22:23:59Z [INFO] Registered container instance with cluster!
2018-12-12T22:23:59Z [INFO] Registration completed successfully. I am running as 'arn:aws:ecs:us-east-1:760528713078:container-instance/46d67ed6-44f9-42be-8998-87ab9baf4d2c' in cluster 'default'
2018-12-12T22:23:59Z [INFO] Saving state! module="statemanager"
2018-12-12T22:23:59Z [INFO] Beginning Polling for updates
2018-12-12T22:23:59Z [INFO] Event stream DeregisterContainerInstance start listening...
2018-12-12T22:23:59Z [INFO] Initializing stats engine
2018-12-12T22:23:59Z [INFO] Establishing a Websocket connection to https://ecs-a-2.us-east-1.amazonaws.com/ws?agentHash=26518174&agentVersion=1.22.0&clusterArn=default&containerInstanceArn=arn%3Aaws%3Aecs%3Aus-east-1%3A760528713078%3Acontainer-instance%2F46d67ed6-44f9-42be-8998-87ab9baf4d2c&dockerVersion=DockerVersion%3A+18.06.1-ce&sendCredentials=true&seqNum=1
2018-12-12T22:23:59Z [INFO] NO_PROXY set:169.254.169.254,169.254.170.2,/var/run/docker.sock
2018-12-12T22:23:59Z [INFO] Establishing a Websocket connection to https://ecs-t-2.us-east-1.amazonaws.com/ws?cluster=default&containerInstance=arn%3Aaws%3Aecs%3Aus-east-1%3A760528713078%3Acontainer-instance%2F46d67ed6-44f9-42be-8998-87ab9baf4d2c
2018-12-12T22:23:59Z [INFO] Connected to ACS endpoint
2018-12-12T22:23:59Z [INFO] Connected to TCS endpoint
2018-12-12T22:24:09Z [INFO] Saving state! module="statemanager"

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

For anyone suffering from a lack of documentation, I’ve managed to get a working solution. Without updates to the official docs it’s unclear if this is the “correct” way to launch ECS tasks from EC2 user data, but I hope this helps illustrate what I am trying to do, and perhaps helps someone in a similar situation.

This is my rewrite of the user data samples given at https://docs.aws.amazon.com/AmazonECS/latest/developerguide/start_task_at_launch.html and https://docs.aws.amazon.com/AmazonECS/latest/developerguide/bootstrap_container_instance.html. Instead of a MIME part for upstart, we write out a systemd service unit which executes a bash script to start the ECS task, similar to what the examples in that documentation do.

NOTE: Edited March 12, 2019 to include workarounds for issues which prevented the originally posted user data from operating as intended. Specifically, a workaround for #1707 and a wait loop that polls until the ECS agent is responsive (the agent may not answer yet even though the systemd ecs.service has started).

NOTE: Edited March 13, 2019 to further refine workaround for #1707. Specifically we can avoid the need to edit the ecs.service and simply use --no-block when starting our service so the user data script exits and allows the system to boot.
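
The wait loop mentioned in the notes above polls without an upper bound. A bounded variant could look like the following sketch; `wait_until` and `wait_for_ecs_agent` are hypothetical names, and the 60-attempt limit is an arbitrary assumption, not a value from the AWS documentation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a command until it succeeds, giving up after a
# fixed number of attempts instead of looping forever.
#   $1 = maximum attempts
#   $2 = seconds to sleep between attempts
#   remaining arguments = the command to run
wait_until() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    until "$@" >/dev/null 2>&1; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Assumed usage: poll the agent introspection endpoint up to 60 times,
# one second apart. Defined here but not called.
wait_for_ecs_agent() {
    wait_until 60 1 curl -s http://localhost:51678/v1/metadata
}
```

Returning non-zero on timeout lets the calling script log the failure and exit, rather than hang silently.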

Content-Type: multipart/mixed; boundary="==BOUNDARY=="
MIME-Version: 1.0

--==BOUNDARY==
Content-Type: text/cloud-boothook; charset="us-ascii"

# Set Docker daemon options
cloud-init-per once docker_options echo 'OPTIONS="${OPTIONS} --foo bar"' >> /etc/sysconfig/docker

--==BOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"

#!/bin/bash
# Specify the cluster that the container instance should register into
cluster=your_cluster_name

# Write the cluster configuration variable to the ecs.config file
# (add any other configuration variables here also)
echo ECS_CLUSTER=$cluster >> /etc/ecs/ecs.config

# Install the AWS CLI and the jq JSON parser
yum install -y aws-cli jq

START_TASK_SCRIPT_FILE="/etc/ecs/ecs-start-task.sh"
cat <<- 'EOF' > ${START_TASK_SCRIPT_FILE}
	exec 2>>/var/log/ecs/ecs-start-task.log
	set -x
	# Wait for the ecs service to be responsive
	until curl -s http://localhost:51678/v1/metadata
	do
		sleep 1
	done

	# Grab the container instance ID (last segment of the ARN), cluster name, and AWS region from the ECS agent introspection endpoint
	instance_arn=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F/ '{print $NF}' )
	cluster=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .Cluster' | awk -F/ '{print $NF}' )
	region=$(curl -s http://localhost:51678/v1/metadata | jq -r '. | .ContainerInstanceArn' | awk -F: '{print $4}')

	# Specify the task definition to run at launch
	task_definition=my_task_def

	# Run the AWS CLI start-task command to start your task on this container instance
	aws ecs start-task --cluster "$cluster" --task-definition "$task_definition" --container-instances "$instance_arn" --started-by "$instance_arn" --region "$region"
EOF

# Write systemd unit file
UNIT="ecs-start-task.service"
cat <<- EOF > /etc/systemd/system/${UNIT}
	[Unit]
	Description=ECS Start Task
	Requires=ecs.service
	After=ecs.service

	[Service]
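	# Restart only on failure, so the script is not re-run (starting another task) after it exits successfully
	Restart=on-failure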
	ExecStart=/usr/bin/bash ${START_TASK_SCRIPT_FILE}

	[Install]
	WantedBy=default.target
EOF

# Enable our ecs.service dependent service with `--no-block` to prevent systemd deadlock
# See https://github.com/aws/amazon-ecs-agent/issues/1707
systemctl enable --now --no-block "${UNIT}"
--==BOUNDARY==--
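
To illustrate what the `awk` field splitting in the start-task script extracts, here is a small worked example using the container instance ARN from the agent log in this issue. Note that the variable named `instance_arn` in the script ends up holding only the final ID segment, not the full ARN.

```shell
#!/usr/bin/env bash
# ARN copied from the agent log in this issue.
arn='arn:aws:ecs:us-east-1:760528713078:container-instance/46d67ed6-44f9-42be-8998-87ab9baf4d2c'

# Field 4 of the colon-separated ARN is the region.
region=$(printf '%s\n' "$arn" | awk -F: '{print $4}')

# The last slash-separated field is the container instance ID.
instance_id=$(printf '%s\n' "$arn" | awk -F/ '{print $NF}')

echo "region=$region"            # region=us-east-1
echo "instance_id=$instance_id"  # instance_id=46d67ed6-44f9-42be-8998-87ab9baf4d2c
```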

@levigroker

Glad that you got it working, and thanks for sharing the info with everyone.

Thanks for the feedback, I am working with the internal team on this.

Thanks for posting your solution @levigroker!

AWS Team: Can you please update the official documentation about starting a task at instance launch?

It’s currently misleading as the method specified there does not work for Amazon Linux 2, while customers are actively being encouraged to switch to the new OS version. We learned about this the hard way when things started breaking after upgrading the AMI.