aws-cdk: Dependency stacks cause update failure

Note: for support questions, please first reference our documentation, then use Stackoverflow. This repository’s issues are intended for feature requests and bug reports.

  • I’m submitting a …

    • 🪲 bug report
    • 🚀 feature request
    • 📚 construct library gap
    • ☎️ security issue or vulnerability => Please see policy
    • ❓ support request => Please see note at the top of this template.
  • What is the current behavior? If the current behavior is a 🪲bug🪲: Please provide the steps to reproduce

stack_1 = FirstStack(
    app=app,
    id='FirstStack'
)

# Passing a construct from stack_1 creates a cross-stack reference
# (an export on FirstStack and an import on SecondStack).
stack_2 = SecondStack(
    app=app,
    id='SecondStack',
    construct_from_stack_1=stack_1.some_construct
)

This creates a dependency via a stack output (export). When I decide not to use construct_from_stack_1 anymore (by deleting its usage from stack_2), the update fails - for instance:

eks-dev
eks-dev: deploying...
eks-dev: creating CloudFormation changeset...
 0/1 | 12:13:45 | UPDATE_ROLLBACK_IN_P | AWS::CloudFormation::Stack              | eks-dev Export eks-dev:ExportsOutputFnGetAttEksElasticLoadBalancer4FCBC5E7SourceSecurityGroupOwnerAlias211654CC cannot be deleted as it is in use by ports-assignment-dev

 ❌  eks-dev failed: Error: The stack named eks-dev is in a failed state: UPDATE_ROLLBACK_COMPLETE
The stack named eks-dev is in a failed state: UPDATE_ROLLBACK_COMPLETE

Looks like CDK tries to delete resources in the wrong order - removing the output from the source stack first, rather than first removing its usage in the dependent stacks and only then removing the output from the source stack itself.
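
For context, the automatic reference above is synthesized as an exported Output on FirstStack and an Fn::ImportValue in SecondStack; the import is what prevents the export from being deleted. A rough hand-written equivalent (illustrative names and values only, not the code CDK generates verbatim):

from aws_cdk import core

app = core.App()

stack_1 = core.Stack(app, 'FirstStack')
# Producer side: an Output with an Export name.
core.CfnOutput(
    stack_1, 'SomeConstructRef',
    value='attribute-of-some-construct',   # placeholder value
    export_name='FirstStack:SomeConstructRef'
)

stack_2 = core.Stack(app, 'SecondStack')
# Consumer side: Fn::ImportValue of that export. As long as this import
# exists, CloudFormation refuses to delete the export from FirstStack.
imported_value = core.Fn.import_value('FirstStack:SomeConstructRef')

app.synth()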

  • What is the expected behavior (or behavior of feature suggested)? Update removes resources that are no longer used

  • What is the motivation / use case for changing the behavior or adding this feature?

Lifetime dependencies are created, which prevent dependent stacks from being updated.

  • Please tell us about your environment:

    • CDK CLI Version: 1.0.0
    • Module Version: 1.0.0
    • OS: [all]
    • Language: [all]
  • Other information (e.g. detailed explanation, stacktraces, related issues, suggestions how to fix, links for us to have context, eg. associated pull-request, stackoverflow, gitter, etc)

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 53
  • Comments: 16 (2 by maintainers)

Most upvoted comments

This bug is caused by the automatic dependency resolution mechanism in the CDK CLI, which means that when you update stack_2, it will automatically update stack_1 first (which obviously fails as stack_2 is still using the exported resource). The solution is really simple - just say cdk deploy -e stack_2, which will update only stack_2, and afterwards you can say cdk deploy stack_1 to clean up the unused export.

This will fail if you at the same time add something to stack_1 that is needed by stack_2 - in that case, stack_2 cannot be updated first, but neither can stack_1 because of the export. This is a limitation of CloudFormation itself that has nothing to do with CDK, and the simplest way to avoid it is just to make smaller changes.

The proper way to solve all problems like this is to use NestedStack instead of Stack. Automated support for that landed in 1.12.0, and it allows CloudFormation to handle this case correctly: first creating all new resources in all stacks in dependency order, then updating all references, and only then doing a final pass to remove all the replaced resources.
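
A minimal sketch of that approach (illustrative names; assumes a later CDK v1 release where NestedStack lives in aws_cdk.core, since the aws-cloudformation variant was later deprecated, as noted below):

from aws_cdk import core, aws_s3 as s3

class ProducerNested(core.NestedStack):
    def __init__(self, scope, id, **kwargs):
        super().__init__(scope, id, **kwargs)
        # The shared resource lives in the producer nested stack.
        self.shared_bucket = s3.Bucket(self, 'SharedBucket')

class ConsumerNested(core.NestedStack):
    def __init__(self, scope, id, *, bucket, **kwargs):
        super().__init__(scope, id, **kwargs)
        # The reference is wired through the parent as nested-stack
        # parameters/outputs, not as a hard Export/ImportValue pair, so
        # CloudFormation can add, re-point and clean up in one deployment.
        core.CfnOutput(self, 'SharedBucketName', value=bucket.bucket_name)

class ParentStack(core.Stack):
    def __init__(self, scope, id, **kwargs):
        super().__init__(scope, id, **kwargs)
        producer = ProducerNested(self, 'Producer')
        ConsumerNested(self, 'Consumer', bucket=producer.shared_bucket)

app = core.App()
ParentStack(app, 'ParentStack')
app.synth()

Both nested stacks deploy as part of ParentStack, so later removing the consumer-side usage is an ordinary single-stack update.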

Not sure what should actually be done about this in CDK - one solution would be to add a note, when a stack update fails because an export is still in use, suggesting "perhaps try updating the stack with --exclusively".

As @nakedible said, one of the workarounds is splitting the deploy into two steps. The -e flag must be used so CDK doesn’t deploy all stacks. Here is an example of that.

# first step will remove the usage of the export
cdk deploy --exclusively SecondStack
# second step can now remove the export
cdk deploy --all

@nakedible given that NestedStack is now deprecated (as is all of the aws-cloudformation package)… Do you know what the correct way to solve this problem is now?

This seems to be the most basic feature of dependencies. 😕

@rix0rrr this is the issue I meant. My current workaround for this is to create a “dummy” resource and attach the dependencies to that dummy resource. Something like this:

import cloudformation = require("@aws-cdk/aws-cloudformation");
...
// Get all subnet ids
const subnetIds = props.vpc.isolatedSubnets
  .concat(props.vpc.privateSubnets)
  .concat(props.vpc.publicSubnets)
  .map(subnet => subnet.subnetId);

// Create dummy cloudformation resource with all dependencies attached
const dummyWaitHandle = new cloudformation.CfnWaitConditionHandle(this, "DummyResource");
dummyWaitHandle.cfnOptions.metadata = {
  dependencies: subnetIds
};

I’m encountering this as well, but when trying to update dependent stacks. In one example of my use case, I’m trying to separate the creation of ECS tasks from services. Ideally, I’d like to be able to destroy a service without destroying the corresponding task (and its history).

By placing tasks and services in separate stacks, and just passing the relevant ref/ARN information between stacks, I can destroy a service without destroying the task, but I can’t update the task stack, since I’m blocked by the “in use by services” error.

That’s just one example. Overall, it helps from a code organization and re-usability standpoint for complex builds to separate and consolidate the creation of stacks according to the resources built. But the “in-use” dependency creates the need to consolidate complex builds into a single long, complex stack, with components that can’t be reused, to ensure each component can be updated.

While I recognize that there is a solution for this, it still feels like a workaround rather than the library working as expected. In CICD pipelines where the deployments are more rigid, it’s awkward for developers to keep track of this and prepare PRs specifically to address this problem before merging the real PRs with their desired updates.

The current approach feels too aggressive in removing the OutputValue from StackA when StackB deletes the dependency. Just because the OutputValue is no longer needed by StackB, why must it be deleted immediately from StackA? I don’t see any harm in keeping the OutputValue on StackA until another deployment cycle. Maybe CDK doesn’t have an easy way to track that state?

To me, it makes more sense for CDK to be lazier and simply leave the OutputValue in StackA and perform the update to StackB. Then, whenever StackA is deployed next, if CDK determines that it’s safe to do so, remove the OutputValue from StackA. If developers do care about the OutputValue of StackA being deleted immediately, then there could be a flag that would trigger the current error and force the current workaround to be used (eg. --fail-on-cross-stack-dependency-change). Or, alternatively, a --lazy-delete-export-values flag if my desired behavior is opt-in.

Suggested workflow where developers intuitively work with StackB’s dependency on StackA:

Day 1

  1. StackA - creates and exports an S3 bucket ARN - Deploy StackA
  2. StackB - uses the S3 bucket ARN as an env variable to a lambda function - Deploy StackB

Day 2

  1. StackB - remove the lambda function in the code. Shows a cdk diff (removal of lambda function).
  2. StackA - Shows no diff and a deploy would say “No changes”.
  3. StackB - Deploy StackB. Nothing happens to StackA.
  4. StackA - Diff StackA - Shows a diff (removal of ExportValue S3 ARN).

Day 2 or 3 or 74

  1. Somebody pushes code and CICD detects that StackA has a diff (removal of ExportValue S3 ARN). CICD deploys StackA to remove the OutputValue. This might be surprising to developers because they see a cdk diff for StackA but don’t see any relevant code changes in StackA or StackB. The developers know that this is a quirk of cross-stack dependency management in CDK.

Not sure if others are more comfortable with the current solution, but I still spend a lot of time thinking about these cross-stack dependencies due to this issue and I don’t fully understand why the above flow isn’t the standard behavior.
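
For what it’s worth, later CDK releases added a manual escape hatch along these lines: the producing stack can keep exporting the value explicitly for one deployment cycle while the consumer drops its usage. A hedged sketch (assumes a CDK version that provides Stack.export_value; names are illustrative):

from aws_cdk import core, aws_s3 as s3

class StackA(core.Stack):
    def __init__(self, scope, id, **kwargs):
        super().__init__(scope, id, **kwargs)
        self.bucket = s3.Bucket(self, 'SharedBucket')
        # Deployment cycle 1: pin the export manually so it survives even
        # after StackB stops referencing the bucket.
        self.export_value(self.bucket.bucket_arn)

class StackB(core.Stack):
    def __init__(self, scope, id, *, bucket, **kwargs):
        super().__init__(scope, id, **kwargs)
        # Deployment cycle 1: remove the code that used bucket.bucket_arn here.
        # Deployment cycle 2: remove the export_value() call from StackA.
        pass

app = core.App()
a = StackA(app, 'StackA')
StackB(app, 'StackB', bucket=a.bucket)
app.synth()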

My two cents: a tool to detect these conditions at build time (instead of deploy time) is possible and would be a big help. Example workflow (I made up some new commands):

# New command that locks down the interface at the user's request.
# The resulting file only contains Imports and Exports:
cdk shrinkwrap --all > my_production_interface.json

# The user should check in their interface:
git add my_production_interface.json && git commit -m "Added current deployment interface"

# Now, a regular build will fail if the current cdk output is not compatible with the interface:
cdk build --interface my_production_interface.json
# FAIL