aws-cdk: aws-ecs: wrong cfnspec for EFS volume configuration (casing issue)

CDK currently generates two property names with incorrect casing when an EFS volume configuration is specified for a task definition.

  • EfsVolumeConfiguration should be EFSVolumeConfiguration
  • FileSystemId should be FilesystemId

The corresponding test is also wrong: https://github.com/aws/aws-cdk/blob/23986d70c5cd69ce212b5ffdc1bcf059f438f15b/packages/%40aws-cdk/aws-ecs/test/ec2/ec2-task-definition.test.ts#L1192

Reproduction Steps

The following (incomplete) snippet recreates the issue:

      const efs = new FileSystem(this, 'etcdata', {
        vpc: service.cluster.vpc,
      });

      taskDefinition.addVolume({
        name: 'etcdata',
        efsVolumeConfiguration: {
          fileSystemId: efs.fileSystemId
        },
      });

This is based on the ApplicationLoadBalancedFargateService pattern.
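For completeness, here is roughly how the snippet above fits into a full stack (a sketch only; the Repro15025Stack name, the sample container image, and the construct IDs are assumptions, not taken from the original report):

import * as cdk from "@aws-cdk/core";
import * as ec2 from "@aws-cdk/aws-ec2";
import * as ecs from "@aws-cdk/aws-ecs";
import * as ecs_patterns from "@aws-cdk/aws-ecs-patterns";
import { FileSystem } from "@aws-cdk/aws-efs";

export class Repro15025Stack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, "Vpc");
    const cluster = new ecs.Cluster(this, "Cluster", { vpc });

    // The higher-level pattern creates the task definition and Fargate service for us.
    const service = new ecs_patterns.ApplicationLoadBalancedFargateService(this, "Service", {
      cluster,
      taskImageOptions: {
        image: ecs.ContainerImage.fromRegistry("amazon/amazon-ecs-sample"),
      },
    });

    const efs = new FileSystem(this, "etcdata", {
      vpc: service.cluster.vpc,
    });

    // This addVolume call is what produces the mis-cased properties shown below.
    service.taskDefinition.addVolume({
      name: "etcdata",
      efsVolumeConfiguration: {
        fileSystemId: efs.fileSystemId,
      },
    });
  }
}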

The above generates this CFN JSON:

        "Volumes": [
          {
            "Name": "etcdata",
            "EfsVolumeConfiguration": {
              "FileSystemId": {
                "Ref": "etcdata80702D7D"
              }
            }
          }
        ]

Which then yields this CFN error:

message: #/Volumes/0: extraneous key [EfsVolumeConfiguration] is not permitted

And if I fix the above to use the correct casing manually, it then uncovers the second problem:

 #/Volumes/0/EFSVolumeConfiguration: required key [FilesystemId] not found

Fixing both manually allows the template to deploy successfully.
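For a single volume, that manual fix can also be expressed in CDK itself through the L1 escape hatch. This is a sketch under the assumption that taskDefinition and efs are the constructs from the snippet above; the property paths use the correctly cased names from the error messages:

const cfnTaskDef = taskDefinition.node.defaultChild as ecs.CfnTaskDefinition;

// Re-emit the volume's EFS configuration under the property names CloudFormation expects...
cfnTaskDef.addPropertyOverride("Volumes.0.EFSVolumeConfiguration", {
  FilesystemId: efs.fileSystemId,
});

// ...and drop the mis-cased key that CDK generated.
cfnTaskDef.addPropertyDeletionOverride("Volumes.0.EfsVolumeConfiguration");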

Environment

  • CDK CLI Version: 1.107.0 (build 52c4434)
  • Framework Version: ??
  • Node.js Version: v16.2.0
  • OS: OSX
  • Language (Version): TypeScript 4.3.2

This is a 🐛 Bug Report

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 22
  • Comments: 31 (5 by maintainers)

Most upvoted comments

Still not solved in v1.116.0 😢

Any update on this? Bug still there with CDK v2.8

TLDR: This validation message is a warning; if your stack is failing to deploy, it's because of another problem. If the cause is unclear in the CloudFormation console, check out the ECS console and look at your stopped tasks during the stack deploy to find the error message. We will fix this in 2.0!

So we know about the incorrect casing on these properties, but it doesn't cause the stack deploy to fail. When reproducing this I got the same message in the CFN console, but it didn't fail my deploy. What is most likely happening (it happened to me and took me a bit to figure out) is that you're missing fs.connections.allowDefaultPortFrom(service), which causes your tasks to fail the health check with ResourceInitializationError: failed to invoke EFS utils. ECS will keep redeploying these containers for about 45 minutes before timing out, and then the stack deploy fails. You can look in the ECS console during the deploy, find the service, click the Tasks tab, and then apply the Stopped filter. If you have stopped tasks there, you can click on them to see the details, and that is where you'll see the error.


Here is a minimal stack that I was able to deploy.

import * as path from "path";
import * as cdk from "@aws-cdk/core";
import * as ec2 from "@aws-cdk/aws-ec2";
import * as ecs from "@aws-cdk/aws-ecs";
import * as efs from "@aws-cdk/aws-efs";

export class Issue15025Stack extends cdk.Stack {
  constructor(scope: cdk.Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const vpc = new ec2.Vpc(this, "Vpc");
    const cluster = new ecs.Cluster(this, "Cluster", {
      vpc,
    });
    const fs = new efs.FileSystem(this, "FileSystem", {
      vpc,
    });

    const taskDefinition = new ecs.TaskDefinition(this, "TaskDefinition", {
      cpu: "256",
      memoryMiB: "512",
      compatibility: ecs.Compatibility.FARGATE,
    });

    taskDefinition.addContainer("Container", {
      image: ecs.ContainerImage.fromAsset(
        path.resolve(__dirname, "..", "container")
      ),
    });

    taskDefinition.addVolume({
      name: "TaskFs",
      efsVolumeConfiguration: {
        fileSystemId: fs.fileSystemId,
      },
    });

    const service = new ecs.FargateService(this, "Service", {
      cluster,
      taskDefinition,
    });

    fs.connections.allowDefaultPortFrom(service);
    // The code that defines your stack goes here
  }
}

Long story, but what happened is that a contributor edited the CloudFormation spec manually to add these properties so they could use EFS with ECS. We ingest the new CloudFormation spec once a week, but sometimes there are bugs that cause this to take a little longer; without knowing that this process was automated, they made the change to unblock themselves, and we mistakenly accepted the PR. This is where the properties with the bad casing were introduced. https://github.com/aws/aws-cdk/pull/8467

Tests showed that stacks deployed correctly with this initial PR.

Then, a couple of weeks later, when we ingested the new spec with these properties included, we saw that they were in fact incorrectly cased. So if we changed the casing now, it would break users who already had stacks deployed using this casing. "Break" as in break backwards compatibility, because they would have to update their stack code to use the correct property names. Instead we patched the CFN spec so we wouldn't break users.

Regardless we should fix this, and we have a v2.0 in progress that gives us the opportunity to do so. I’m keeping this open for @SoManyHs and @madeline-k to track removing this patch for 2.0.

A hacky approach to remove those warnings:

        /** max hackiness to handle https://github.com/aws/aws-cdk/issues/15025 */
        const volumes = [optSonarDataVolume, optSonarTempVolume, optSonarExtVolume, optSonarLogVolume]
        const cfnTD = (this.fargate.taskDefinition.node.defaultChild as CfnTaskDefinition);

        for (let i = 0; i < volumes.length; i++) {
            this.fargate.taskDefinition.addVolume(volumes[i])
            const { fileSystemId, transitEncryption, rootDirectory, ...rest } = volumes[i].efsVolumeConfiguration!
            cfnTD.addPropertyOverride(`Volumes.${i}.EFSVolumeConfiguration`, {
                ...rest,
                FilesystemId: fileSystemId,
                TransitEncryption: transitEncryption,
                RootDirectory: rootDirectory
            });
            cfnTD.addPropertyDeletionOverride(`Volumes.${i}.EfsVolumeConfiguration`);
        }
        /** end max hackiness */

I have also experienced this bug with the same symptoms and reproducibility. This is a significant issue for those needing to mount EFS volumes to a Fargate container.

One month later this is still not fixed. Can we expect a solution?

Still present in 2.24.1

I can confirm that the bug is still there with CDK v2.12

Me too…

Chiming in that for me the error message did not cause any problems and EFS still mounted correctly to my Fargate container.

Here’s a gist https://gist.github.com/jeznag/623132d2a98eff5392acd702702f1ef7

The fix involves a change like this one, https://github.com/aws/aws-cdk/commit/dd308b6610a670f0dd91df2476253cffb4cb04a3, but it needs to go behind a feature flag to avoid resource replacements on stacks where this already works.

@MrArnoldPalmer so OK, maybe there is another error, but it's got to be EFS-related, because if I replace efsVolumeConfiguration with host my stack deploys just fine (see the sketch below).

But I haven’t been able to find that specific error yet.