aws-cdk: Batch and UserData synths ok but fails deploy with: "Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format"
The context for this issue is well described in the Gitter aws-cdk channel over here, along with the tests performed by @skinny85, @reisingerf and myself:
https://gitter.im/awslabs/aws-cdk?at=5e54579d9aeef6523217b25f
Reproduction Steps
Given the following snippet, as described in the Gitter thread:
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', {
  isDefault: true,
});

const batch_instance_role = new iam.Role(this, 'BatchInstanceRole', {
  roleName: 'UmccriseBatchInstanceRole',
  assumedBy: new iam.CompositePrincipal(
    new iam.ServicePrincipal('ec2.amazonaws.com'),
    new iam.ServicePrincipal('ecs.amazonaws.com'),
  ),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2RoleforSSM'),
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role')
  ],
});

const spotfleet_role = new iam.Role(this, 'AmazonEC2SpotFleetRole', {
  assumedBy: new iam.ServicePrincipal('spotfleet.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2SpotFleetTaggingRole'),
  ],
});

const batch_service_role = new iam.Role(this, 'BatchServiceRole', {
  assumedBy: new iam.ServicePrincipal('batch.amazonaws.com'),
  managedPolicies: [
    iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSBatchServiceRole'),
  ],
});

const batch_instance_profile = new iam.CfnInstanceProfile(this, 'BatchInstanceProfile', {
  instanceProfileName: 'UmccriseBatchInstanceProfile',
  roles: [batch_instance_role.roleName],
});

const launch_template = new ec2.CfnLaunchTemplate(this, 'LaunchTemplate', {
  launchTemplateData: {
    userData: core.Fn.base64(`
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="
--==MYBOUNDARY==
Content-Type: text/x-shellscript; charset="us-ascii"
#!/bin/bash
echo Hello
--==MYBOUNDARY==--
`),
  },
  launchTemplateName: 'UmccriseBatchComputeLaunchTemplate',
});

new batch.CfnComputeEnvironment(this, 'UmccriseBatchComputeEnv', {
  type: 'MANAGED',
  serviceRole: batch_service_role.roleArn,
  computeResources: {
    type: 'SPOT',
    maxvCpus: 128,
    minvCpus: 0,
    desiredvCpus: 0,
    imageId: 'ami-05c621ca32de56e7a',
    launchTemplate: {
      launchTemplateId: launch_template.ref,
      version: launch_template.attrLatestVersionNumber,
    },
    spotIamFleetRole: spotfleet_role.roleArn,
    instanceRole: batch_instance_profile.instanceProfileName!,
    instanceTypes: ['optimal'],
    subnets: [vpc.publicSubnets[0].subnetId],
    securityGroupIds: ['sg-0a5cf974'],
    tags: { 'Creator': 'Batch' },
  }
});
For more context, there’s this other working example too:
https://github.com/awslabs/aws-batch-helpers/issues/5#issue-425133706
Error Log
This:
Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format
Coupled with the deploy time error (in Python, ask @skinny85 for the TypeScript counterpart):
6/10 | 10:01:20 AM | UPDATE_FAILED | AWS::Batch::ComputeEnvironment | UmccriseBatchComputeEnv Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format
/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7838:49
\_ Kernel._wrapSandboxCode (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:8298:20)
\_ Kernel._create (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7838:26)
\_ Kernel.create (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7585:21)
\_ KernelHost.processRequest (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7372:28)
\_ KernelHost.run (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7312:14)
\_ Immediate._onImmediate (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7315:37)
\_ processImmediate (internal/timers.js:456:21)
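The error says the UserData was not accepted as MIME multipart. As a rough local sanity check before deploying, the sketch below parses a user data string with Python's standard `email` package and reports whether it is seen as multipart. This is only an approximation: the helper name `looks_like_mime_multipart` is hypothetical, and Python's parser is a stand-in for whatever validation AWS Batch actually performs, which this thread does not document.

```python
from email import message_from_string


def looks_like_mime_multipart(user_data: str) -> bool:
    """Return True if Python's email parser sees a multipart message.

    Assumption: this only approximates the check AWS Batch performs.
    """
    return message_from_string(user_data).is_multipart()


# Headers first, then a blank line, then the boundary-delimited parts.
well_formed = "\n".join([
    "MIME-Version: 1.0",
    'Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="',
    "",
    "--==MYBOUNDARY==",
    'Content-Type: text/x-shellscript; charset="us-ascii"',
    "",
    "#!/bin/bash",
    "echo Hello",
    "--==MYBOUNDARY==--",
    "",
])

# Same text, but starting with a newline, which is what a multi-line string
# that opens with a line break produces.
leading_newline = "\n" + well_formed

print(looks_like_mime_multipart(well_formed))
print(looks_like_mime_multipart(leading_newline))
```

With Python's parser, a leading blank line ends the header section before any header is read, so the second string is treated as plain text. Whether that is what triggered the Batch error here is not established.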
Environment
- CLI Version : 1.25.0 (build 5ced526)
- Framework Version: 1.25.0 (build 5ced526)
- OS : macOS Catalina 10.15.3
- Language : Python and TypeScript
This is 🐛 Bug Report
About this issue
- State: closed
- Created 4 years ago
- Comments: 15 (6 by maintainers)
After much playing around I’ve managed to find a workaround for a Python deployment. Reading the user data in from a file works better than using a multi-line string.
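As an independent illustration of generating the user data rather than hand-writing it (this is not the code from this comment), here is a minimal sketch that wraps a shell script in a multipart message using only the Python standard library. The helper name `build_mime_user_data` and the file name in the comment are hypothetical, and the CDK wiring is deliberately left out.

```python
import base64
from email import message_from_string
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mime_user_data(script: str, boundary: str = "==MYBOUNDARY==") -> str:
    """Wrap a shell script in a multipart/mixed message and return it as text."""
    message = MIMEMultipart("mixed", boundary=boundary)
    message.attach(MIMEText(script, "x-shellscript", "us-ascii"))
    return message.as_string()


# In a real app the script would come from a file, for example:
#   script = pathlib.Path("user_data.sh").read_text()  # hypothetical file name
script = "#!/bin/bash\necho Hello\n"

mime_text = build_mime_user_data(script)

# Round-trip through the parser as a sanity check on the generated text.
assert message_from_string(mime_text).is_multipart()

# Only needed where nothing else base64-encodes the value for you.
encoded = base64.b64encode(mime_text.encode("ascii")).decode("ascii")
```

In a CDK app the plain `mime_text` would presumably be handed to `core.Fn.base64(...)`, as in the reproduction snippet, instead of being encoded by hand. Letting the library emit the headers and blank lines avoids stray leading whitespace in the message.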
Part 1: Read in the user data
Part 2: Assign as a Userdata object with the custom method
Part 3: Add to launch_template_data dict
The render magically gets rid of the `lines` attribute in the stack and the base64 re-encodes it as appropriate.
Part 4: Initialise the launch template
Part 5: Override the launch template property
Adding in userdata in the previous step under the kwarg `launch_template_data` doesn’t seem to work, so we override the property using `add_property_override` instead.
Part 6: Validate
Our launch template should look like this after running `cdk synth`.
@reisingerf You are right.
I mistakenly assumed that `$Latest` is used as a pointer, and batch does whatever changes needed to its own launch template.
I now understand this is not the case and actually `$Latest` indeed does not differ from `launch_template.attrLatestVersionNumber`.
I even tried updating the managed template myself, in the hopes that `$Latest` perhaps refers to its own managed latest, but that didn’t work either.
You mentioned that:
If you update the launch template from the CDK app, it should have also caused a re-creation of the compute environment because `launch_template.attrLatestVersionNumber` now evaluates to a different value, and according to this, CloudFormation would replace the compute environment and the changes should apply.
Can you double check the compute environment was indeed replaced? Note that if the environment is used by some queue (which it usually is), the replacement will fail and actually result in two environments, one pointing to the old template version, and one to the new, with the queue still connected to the old one.
It looks like the experience of updating a launch template isn’t tight enough and has a few problems, both from the CloudFormation and the CDK side. I’ll try to think about how we can improve on that (a feature request from you would be appreciated as well 😃).
In the meantime, the safest and most streamlined approach (albeit somewhat slow) would be to create a `batch.CfnJobQueue` in the CDK app and run `cdk destroy && cdk deploy` each time you change the launch template.