aws-cdk: Batch and UserData synths ok but fails deploy with: "Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format"

The context for this issue is well described in the aws-cdk Gitter channel, along with the tests performed by @skinny85, @reisingerf and me:

https://gitter.im/awslabs/aws-cdk?at=5e54579d9aeef6523217b25f

Reproduction Steps

Given the following snippet (as described in the Gitter thread), cdk synth succeeds but cdk deploy fails:

import * as core from '@aws-cdk/core';
import * as ec2 from '@aws-cdk/aws-ec2';
import * as iam from '@aws-cdk/aws-iam';
import * as batch from '@aws-cdk/aws-batch';

// The code below runs inside a Stack constructor (`this` is the stack).
const vpc = ec2.Vpc.fromLookup(this, 'Vpc', {
    isDefault: true,
});

const batch_instance_role = new iam.Role(this, 'BatchInstanceRole', {
    roleName: 'UmccriseBatchInstanceRole',
    assumedBy: new iam.CompositePrincipal(
        new iam.ServicePrincipal('ec2.amazonaws.com'),
        new iam.ServicePrincipal('ecs.amazonaws.com'),
    ),
    managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2RoleforSSM'),
        iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2ContainerServiceforEC2Role')
    ],
});
const spotfleet_role = new iam.Role(this, 'AmazonEC2SpotFleetRole', {
    assumedBy: new iam.ServicePrincipal('spotfleet.amazonaws.com'),
    managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AmazonEC2SpotFleetTaggingRole'),
    ],
});
const batch_service_role = new iam.Role(this, 'BatchServiceRole', {
    assumedBy: new iam.ServicePrincipal('batch.amazonaws.com'),
    managedPolicies: [
        iam.ManagedPolicy.fromAwsManagedPolicyName('service-role/AWSBatchServiceRole'),
    ],
});
const batch_instance_profile = new iam.CfnInstanceProfile(this, 'BatchInstanceProfile', {
    instanceProfileName: 'UmccriseBatchInstanceProfile',
    roles: [batch_instance_role.roleName],
});

const launch_template = new ec2.CfnLaunchTemplate(this, 'LaunchTemplate', {
    launchTemplateData: {
        // The leading newline and indentation inside this template literal
        // become part of the user data string.
        userData: core.Fn.base64(`
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

            --==MYBOUNDARY==
            Content-Type: text/x-shellscript; charset="us-ascii"

            #!/bin/bash
            echo Hello

            --==MYBOUNDARY==--
        `),
    },
    launchTemplateName: 'UmccriseBatchComputeLaunchTemplate',
});
new batch.CfnComputeEnvironment(this, 'UmccriseBatchComputeEnv', {
    type: 'MANAGED',
    serviceRole: batch_service_role.roleArn,
    computeResources: {
        type: 'SPOT',
        maxvCpus: 128,
        minvCpus: 0,
        desiredvCpus: 0,
        imageId: 'ami-05c621ca32de56e7a',
        launchTemplate: {
            launchTemplateId: launch_template.ref,
            version: launch_template.attrLatestVersionNumber,
        },
        spotIamFleetRole: spotfleet_role.roleArn,
        instanceRole: batch_instance_profile.instanceProfileName!,
        instanceTypes: ['optimal'],
        subnets: [vpc.publicSubnets[0].subnetId],
        securityGroupIds: ['sg-0a5cf974'],
        tags: { 'Creator': 'Batch' },
    }
});
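One plausible explanation is an assumption on my part and is not confirmed in the Gitter thread. The template literal in the snippet above keeps the source indentation and starts with a newline, so the string passed to Fn.base64 begins with a blank line followed by indented headers. A MIME parser treats that first blank line as the end of an empty header block, so it never reads MIME-Version or Content-Type as headers. The sketch below uses Python's standard `email` package as a stand-in for whatever parser AWS Batch uses. It only illustrates the difference and does not reproduce Batch's own validation. The name `is_mime_multipart` is hypothetical.

```python
import email
import textwrap

# User data as an indented template literal produces it: a leading
# newline plus the indentation of the surrounding source code.
INDENTED = """
            MIME-Version: 1.0
            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

            --==MYBOUNDARY==
            Content-Type: text/x-shellscript; charset="us-ascii"

            #!/bin/bash
            echo Hello

            --==MYBOUNDARY==--
        """


def is_mime_multipart(user_data: str) -> bool:
    """Return True if `user_data` parses as a multipart MIME document."""
    return email.message_from_string(user_data).is_multipart()


# Remove the common indentation, then the leading blank line.
DEDENTED = textwrap.dedent(INDENTED).lstrip()

print(is_mime_multipart(INDENTED))  # False: the headers are never recognised
print(is_mime_multipart(DEDENTED))  # True
```

If this is indeed the cause, removing the indentation from the TypeScript string should help. Reading the user data from a file, as in the workaround below, should help for the same reason.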

For more context, here is another example that does work:

https://github.com/awslabs/aws-batch-helpers/issues/5#issue-425133706

Error Log

The deployment fails with this error:

Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format

Along with it comes the following deploy-time output (from the Python app; ask @skinny85 for the TypeScript counterpart):

 6/10 | 10:01:20 AM | UPDATE_FAILED        | AWS::Batch::ComputeEnvironment        | UmccriseBatchComputeEnv Operation failed, ComputeEnvironment went INVALID with error: CLIENT_ERROR - Launch Template UserData is not MIME multipart format
        /Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7838:49
        \_ Kernel._wrapSandboxCode (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:8298:20)
        \_ Kernel._create (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7838:26)
        \_ Kernel.create (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7585:21)
        \_ KernelHost.processRequest (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7372:28)
        \_ KernelHost.run (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7312:14)
        \_ Immediate._onImmediate (/Users/romanvg/.miniconda3/envs/cdk/lib/python3.7/site-packages/jsii/_embedded/jsii/jsii-runtime.js:7315:37)
        \_ processImmediate (internal/timers.js:456:21)

Environment

  • CLI Version: 1.25.0 (build 5ced526)
  • Framework Version: 1.25.0 (build 5ced526)
  • OS: macOS Catalina 10.15.3
  • Language: Python and TypeScript

This is a 🐛 Bug Report.

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

After much experimentation, I found a workaround for a Python deployment: reading the user data from a file works better than using a multi-line string.

Part 1: Read in the user data

with open("user_data/user_data.txt", 'r') as user_data_h:
    user_data = user_data_h.read()
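A file read this way is passed through verbatim, so it may be worth checking it before synthesis. The helper below is hypothetical and was not part of the workaround as posted. It reads the file the way Part 1 does and fails early if the content would not parse as MIME multipart. Python's `email` parser serves here only as an approximation of the check AWS Batch performs at deploy time.

```python
import email
from pathlib import Path


def load_multipart_user_data(path: str) -> str:
    """Read user data from `path` and fail fast if it is not MIME multipart.

    Hypothetical helper: Python's `email` parser only approximates the
    validation AWS Batch applies when the compute environment is created.
    """
    user_data = Path(path).read_text(encoding="utf-8")
    message = email.message_from_string(user_data)
    if not message.is_multipart():
        raise ValueError(
            f"{path}: expected multipart MIME user data, got "
            f"{message.get_content_type()!r}"
        )
    return user_data
```

Used in place of Part 1, this would be `user_data = load_multipart_user_data("user_data/user_data.txt")`.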

Part 2: Wrap it in a UserData object using the custom method

user_init = ec2.UserData.custom(user_data)
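You could also generate the string passed to `ec2.UserData.custom` instead of hand-writing the MIME file, which avoids indentation and boundary mistakes. This swaps in Python's standard `email.mime` classes and is not what the comment above does. A minimal sketch follows, assuming a single shell-script part is all you need. The function name is hypothetical. One difference from the hand-written example: the generated document puts Content-Type before MIME-Version, and the script part carries its own MIME-Version header. A MIME parser should not care, but I have not verified this against AWS Batch.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_multipart_user_data(script: str, boundary: str = "==MYBOUNDARY==") -> str:
    """Wrap a shell script in a multipart/mixed MIME document (hypothetical helper).

    The script becomes a single text/x-shellscript part; MIMEMultipart
    writes the top-level Content-Type (with the boundary) and MIME-Version.
    """
    message = MIMEMultipart("mixed", boundary=boundary)
    message.attach(MIMEText(script, "x-shellscript", "us-ascii"))
    return message.as_string()


user_data = build_multipart_user_data("#!/bin/bash\necho Hello\n")
print(user_data)
```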

Part 3: Add it to the launch_template_data dict

Calling render() turns the UserData object into a plain string, so the synthesized stack no longer contains its lines attribute. Fn.base64 then encodes that string as required.

launch_template_data = {
     "UserData": core.Fn.base64(user_init.render())
}

Part 4: Initialise the launch template

launch_template = ec2.CfnLaunchTemplate(self, "LaunchTemplate", launch_template_name="UmccriseBatchComputeLaunchTemplateDev")

Part 5: Override the launch template property

Passing the user data in the previous step through the launch_template_data keyword argument didn't seem to work, so instead we override the property with add_property_override:

launch_template.add_property_override("LaunchTemplateData", launch_template_data)
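As a rough model, `add_property_override` writes a value at a dot-separated path under the resource's `Properties` in the synthesized template and creates intermediate keys as needed. The function below is a simplified, hypothetical model for illustration only. It is not CDK's implementation, which also deep-merges overrides into the properties rendered from the constructor arguments.

```python
from typing import Any, Dict


def add_property_override(resource: Dict[str, Any], path: str, value: Any) -> None:
    """Set `value` at a dot-separated `path` under resource["Properties"].

    Simplified model for illustration only: missing intermediate dicts are
    created and the final key is overwritten.
    """
    node = resource.setdefault("Properties", {})
    *parents, last = path.split(".")
    for key in parents:
        node = node.setdefault(key, {})
    node[last] = value


launch_template = {
    "Type": "AWS::EC2::LaunchTemplate",
    "Properties": {"LaunchTemplateName": "UmccriseBatchComputeLaunchTemplateDev"},
}
add_property_override(
    launch_template,
    "LaunchTemplateData",
    {"UserData": {"Fn::Base64": "<rendered user data>"}},
)
```

With a dotted path such as `LaunchTemplateData.UserData`, only that nested key would be replaced and any other keys under `LaunchTemplateData` would be left alone.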

Part 6: Validate

After running cdk synth, the launch template should look like this:

LaunchTemplate:
    Type: AWS::EC2::LaunchTemplate
    Properties:
      LaunchTemplateData:
        UserData:
          Fn::Base64: >-
            MIME-Version: 1.0

            Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="


            --==MYBOUNDARY==

            Content-Type: text/x-shellscript; charset="us-ascii"


            #!/bin/bash

            echo Hello


            --==MYBOUNDARY==--
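The YAML above is the folded (`>-`) rendering of a plain string. In a folded scalar, each blank line encodes a single newline, which is why the output looks double-spaced. You can also check the synthesized template programmatically instead of reading the YAML. This sketch assumes the template is read as JSON from `cdk.out`; the path `cdk.out/MyStack.template.json` is hypothetical, so substitute your stack's name. It also assumes the user data is a literal string under `Fn::Base64`, as in the output above. If the string contains tokens such as `Fn::Join` or `Ref`, this simple check skips it.

```python
import email
import json
from typing import Any, Dict, Iterator, Tuple


def launch_template_user_data(template: Dict[str, Any]) -> Iterator[Tuple[str, str]]:
    """Yield (logical ID, user data) for each AWS::EC2::LaunchTemplate whose
    UserData is a literal string wrapped in Fn::Base64."""
    for logical_id, resource in template.get("Resources", {}).items():
        if resource.get("Type") != "AWS::EC2::LaunchTemplate":
            continue
        data = resource.get("Properties", {}).get("LaunchTemplateData", {})
        user_data = data.get("UserData", {})
        if isinstance(user_data, dict) and isinstance(user_data.get("Fn::Base64"), str):
            yield logical_id, user_data["Fn::Base64"]


def check_template(path: str) -> None:
    """Print whether each launch template's user data parses as MIME multipart."""
    with open(path, encoding="utf-8") as fh:
        template = json.load(fh)
    for logical_id, user_data in launch_template_user_data(template):
        ok = email.message_from_string(user_data).is_multipart()
        print(f"{logical_id}: {'multipart' if ok else 'NOT multipart'}")


# check_template("cdk.out/MyStack.template.json")  # hypothetical stack name
```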

@reisingerf You are right.

I mistakenly assumed that $Latest acts as a pointer, and that Batch makes whatever changes it needs to its own launch template.

I now understand that this is not the case, and that $Latest indeed behaves no differently from launch_template.attrLatestVersionNumber.

I even tried updating the managed template myself, in the hope that $Latest might refer to the latest version of that managed template, but that didn't work either.

You mentioned that:

I also noticed (after quite some frustration) that my updates to my user data, seemed to be deployed, but actually did not change the user data run by the starting instances.

If you update the launch template from the CDK app, the compute environment should also be re-created, because launch_template.attrLatestVersionNumber now evaluates to a different value. According to this, CloudFormation would replace the compute environment and the changes should apply.

Can you double check the compute environment was indeed replaced? Note that if the environment is used by some queue (which it usually is), the replacement will fail and actually result in two environments, one pointing to the old template version, and one to the new, with the queue still connected to the old one.
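To check whether the compute environment was replaced, you can inspect the AWS Batch `DescribeComputeEnvironments` and `DescribeJobQueues` responses. These show which launch template version each environment uses and which environment each queue is attached to. The sketch below only processes response dictionaries, so it runs without AWS credentials. The field names follow the Batch API as I understand it and should be checked against the boto3 documentation. The function names are hypothetical.

```python
from typing import Any, Dict, List, Set


def _describe_env(env: Dict[str, Any]) -> str:
    """One-line summary of a compute environment's launch template reference."""
    template = env.get("computeResources", {}).get("launchTemplate", {})
    template_ref = template.get("launchTemplateId") or template.get("launchTemplateName", "?")
    return (
        f"{env.get('computeEnvironmentName', '?')} "
        f"(launch template {template_ref}, "
        f"version {template.get('version', '?')})"
    )


def summarize_batch_setup(
    compute_envs: Dict[str, Any], job_queues: Dict[str, Any]
) -> List[str]:
    """List queue -> compute environment links, plus environments with no queue."""
    envs = {
        env["computeEnvironmentArn"]: env
        for env in compute_envs.get("computeEnvironments", [])
    }
    lines: List[str] = []
    attached: Set[str] = set()
    for queue in job_queues.get("jobQueues", []):
        for entry in queue.get("computeEnvironmentOrder", []):
            arn = entry["computeEnvironment"]
            attached.add(arn)
            env = envs.get(arn, {"computeEnvironmentName": arn})
            lines.append(f"{queue['jobQueueName']} -> {_describe_env(env)}")
    # Environments no queue points at, e.g. a replacement that never got attached.
    for arn, env in envs.items():
        if arn not in attached:
            lines.append(f"(no queue) -> {_describe_env(env)}")
    return lines
```

With boto3, the two inputs would come from `batch.describe_compute_environments()` and `batch.describe_job_queues()` on `boto3.client("batch")`. Both calls are paginated, and this sketch ignores `nextToken`. If the scenario described above occurred, the output would show the queue still pointing at the old version and a "(no queue)" line for the new environment.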

It looks like the experience of updating a launch template isn't smooth enough and has a few problems, on both the CloudFormation side and the CDK side. I'll think about how we can improve on that (a feature request from you would be appreciated as well 😃).

In the meantime, the safest and most streamlined approach (albeit a somewhat slow one) would be to create a batch.CfnJobQueue in the CDK app and run cdk destroy && cdk deploy each time you change the launch template.