aws-cdk: Support CloudFormation rollback triggers

CloudFormation supports specifying 0-5 rollback triggers as CloudWatch alarms. If any of them goes into the alarm state during a stack update, the update is automatically cancelled and rolled back. A monitoring time of 0-180 minutes can also be specified: after the update has been applied, CloudFormation waits that long for any of the alarms to trigger, or for a rollback to be triggered manually, before cleaning up any resources.

There should be a way to use these with AWS CDK.

Use Case

Rollback triggers make stack updates more reliable: a firing alarm reverts a bad deployment automatically instead of waiting for someone to notice.

Proposed Solution

Similar to the existing --notification-arns option of the deploy command, add a --rollback-trigger-alarm-arns option that lists 1-5 CloudWatch alarms that automatically trigger a rollback. Also add a --monitoring-time-minutes option that adds 0-180 minutes of pause time after a stack update before the cleanup phase starts. The two options can be specified independently, as each is useful on its own.
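As a rough illustration of what these two options would have to turn into, the sketch below maps them onto the RollbackConfiguration structure of the CloudFormation API. The function name buildRollbackConfiguration and its parameters are hypothetical and not part of the CDK CLI. Only the field names (RollbackTriggers, Arn, Type, MonitoringTimeInMinutes) and the limits are taken from the CloudFormation documentation.

```typescript
// Shape of CloudFormation's RollbackConfiguration (field names from the API reference).
interface RollbackTrigger {
  Arn: string;
  Type: string;
}

interface RollbackConfiguration {
  RollbackTriggers?: RollbackTrigger[];
  MonitoringTimeInMinutes?: number;
}

// Hypothetical helper, not an existing CDK function: turns the values of the
// two proposed options into a RollbackConfiguration, enforcing the API limits.
function buildRollbackConfiguration(
  alarmArns: string[] = [],
  monitoringTimeMinutes?: number,
): RollbackConfiguration {
  if (alarmArns.length > 5) {
    throw new Error(`At most 5 rollback triggers are allowed, got ${alarmArns.length}`);
  }
  if (
    monitoringTimeMinutes !== undefined &&
    (!Number.isInteger(monitoringTimeMinutes) ||
      monitoringTimeMinutes < 0 ||
      monitoringTimeMinutes > 180)
  ) {
    throw new Error('Monitoring time must be a whole number of minutes between 0 and 180');
  }

  const config: RollbackConfiguration = {};
  // Each option is applied only when given, so both can be used independently.
  if (alarmArns.length > 0) {
    config.RollbackTriggers = alarmArns.map((arn) => ({
      Arn: arn,
      Type: 'AWS::CloudWatch::Alarm',
    }));
  }
  if (monitoringTimeMinutes !== undefined) {
    config.MonitoringTimeInMinutes = monitoringTimeMinutes;
  }
  return config;
}
```

Leaving a field out when its option is absent keeps the two options independent, as the proposal asks.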

Other

https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/using-cfn-rollback-triggers.html https://docs.aws.amazon.com/AWSCloudFormation/latest/APIReference/API_RollbackConfiguration.html

  • 👋 I may be able to implement this feature request
  • ⚠️ This feature might incur a breaking change

This is a 🚀 Feature Request

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 88
  • Comments: 28 (18 by maintainers)

Most upvoted comments

+1

Amazon is clear about the importance of metrics monitoring and auto-rollback. It is a problem that the primary and recommended tool (CDK) doesn’t let customers integrate with this functionality, which CloudFormation supports.

I don’t think it makes sense to block all the customers using CDK and CloudFormation because of a lack of support in CodePipeline.

I’d like to echo this sentiment. Two ways in which I think CDK could support rollbacks without the support of CodePipeline are:

  1. By having rollbackAlarmArn and monitoringPeriod fields on a cdk.Stack that get forwarded to the CloudFormation create-change-set and execute-change-set operations during cdk deploy.
  2. Supporting --rollback-alarm-arn and --monitoring-period command-line arguments when running cdk deploy.

At the time when CodePipeline finally supports this integration, these features could be seamlessly integrated with it too.

This unblocks users today who are not using CodePipeline because of this limitation.

I’m thinking something like this:

interface RollbackConfigurationOptions {
    monitoringPeriod?: Duration;

    rollbackTriggers?: RollbackTrigger[];
}

interface StackProps extends RollbackConfigurationOptions {
    // ...
}

class Stack {
    public configureRollback(options: RollbackConfigurationOptions) {
    }
}

// Usage

stack.configureRollback({ ... });
// -or-

new Stack(..., {
    monitoringPeriod: Duration.minutes(60),

    rollbackTriggers: [
        RollbackTrigger.fromAlarmArn('...', {
            monitorDuring: [StackLifecycle.UPDATE]
        })
    ]
});

RollbackTrigger.fromAlarmArn() needs to check that the ARN it gets is fully resolved (we need that to have the CLI pass it into ExecuteChangeSet properly). We will be adding future extensions to this that may support alarms created inside the Stack itself and have the CLI do some limited resolution and/or token substitution.

Needs some judicious use of defaults to make it agreeable to use, and we need to decide whether to honor the monitoringPeriod if none of the triggers apply during our current phase (notably: CREATE will be a typical one). I vote no, and if we deem it necessary we add a boolean to allow it.

IMPLEMENTATION NOTES

  • Needs propagation of that information through the cx-api package to the CLI (on the properties of the StackArtifact).
  • CLI evaluates the rollback triggers that apply based on the current stack lifecycle and passes them into the CFN API.
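The CLI-side evaluation described in the last note could look roughly like the sketch below. Every name in it (StackLifecycle, RollbackTriggerSpec, RollbackOptions, resolveRollback) is hypothetical and only mirrors the proposal above, so none of them exists in the CDK. It also assumes the "I vote no" default, where the monitoring period is dropped when no trigger applies to the current lifecycle phase.

```typescript
// Hypothetical types mirroring the proposal; not part of any CDK package.
type StackLifecycle = 'CREATE' | 'UPDATE';

interface RollbackTriggerSpec {
  alarmArn: string;
  monitorDuring: StackLifecycle[];
}

interface RollbackOptions {
  monitoringPeriodMinutes?: number;
  rollbackTriggers?: RollbackTriggerSpec[];
}

// Shape the CloudFormation API expects for a rollback configuration.
interface CfnRollbackConfiguration {
  RollbackTriggers: { Arn: string; Type: string }[];
  MonitoringTimeInMinutes?: number;
}

// Picks the triggers that apply to the current lifecycle phase. Returns
// undefined when none apply, which also drops the monitoring period.
function resolveRollback(
  options: RollbackOptions,
  lifecycle: StackLifecycle,
): CfnRollbackConfiguration | undefined {
  const applicable = (options.rollbackTriggers || []).filter(
    (trigger) => trigger.monitorDuring.indexOf(lifecycle) !== -1,
  );
  if (applicable.length === 0) {
    return undefined;
  }

  const config: CfnRollbackConfiguration = {
    RollbackTriggers: applicable.map((trigger) => ({
      Arn: trigger.alarmArn,
      Type: 'AWS::CloudWatch::Alarm',
    })),
  };
  if (options.monitoringPeriodMinutes !== undefined) {
    config.MonitoringTimeInMinutes = options.monitoringPeriodMinutes;
  }
  return config;
}
```

With a trigger limited to UPDATE, calling resolveRollback for a CREATE deployment yields no rollback configuration at all, which matches the suggested default.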

I like @rix0rrr’s idea. It would also be interesting if you could configure it from a CloudFormationCreateUpdateStackAction.

For what it’s worth, my team has gotten around this issue and it hasn’t been a problem for over two years now. Like @nakedible-p, we have a CodePipeline that deploys to multiple stacks across multiple accounts and regions, and the rollback triggers are defined in the stacks themselves (and therefore must be added after the fact).

For us, the key is that the rollback configuration doesn’t actually change because we define a single composite CloudWatch alarm for rollback. This lets us “set it and forget it”. We perform a one-time action of updating the stack with its rollback configuration. The equivalent CLI command would be:

aws cloudformation update-stack --stack-name $stack --use-previous-template --rollback-configuration $config

Because of the --use-previous-template option, this command can be executed at any point in time after the initial stack launch without any acrobatics to get the template itself. We already have a script for bootstrapping all those accounts and regions (as well as setting up the accounts themselves), so it’s not much to add this trivial step as well. If we need to change the alarms associated with the rollback, we just modify the composite alarm via CDK instead. No need to re-run the command, ever. If you wanted to, you could probably set up an AWS Config (https://aws.amazon.com/config/) rule to warn you if you haven’t performed this one-time step.
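The comment does not show what $config contains. One plausible shape, assuming a single composite alarm and an arbitrary monitoring time, is sketched below. The helper names buildRollbackOnlyUpdate and toCliConfig are made up for illustration, while the field names follow the CloudFormation UpdateStack API. Stacks with parameters or IAM capabilities would likely need those passed as well, which is not shown here.

```typescript
// Subset of the UpdateStack request relevant to a rollback-only update.
interface RollbackOnlyUpdate {
  StackName: string;
  UsePreviousTemplate: boolean;
  RollbackConfiguration: {
    RollbackTriggers: { Arn: string; Type: string }[];
    MonitoringTimeInMinutes: number;
  };
}

// Hypothetical helper: builds the request for the one-time update that
// attaches a single composite alarm as the stack's rollback trigger.
function buildRollbackOnlyUpdate(
  stackName: string,
  compositeAlarmArn: string,
  monitoringTimeMinutes: number,
): RollbackOnlyUpdate {
  return {
    StackName: stackName,
    UsePreviousTemplate: true, // same effect as --use-previous-template
    RollbackConfiguration: {
      RollbackTriggers: [
        { Arn: compositeAlarmArn, Type: 'AWS::CloudWatch::CompositeAlarm' },
      ],
      MonitoringTimeInMinutes: monitoringTimeMinutes,
    },
  };
}

// Serializes the rollback part as JSON, a form the AWS CLI accepts for
// structure parameters such as --rollback-configuration.
function toCliConfig(update: RollbackOnlyUpdate): string {
  return JSON.stringify(update.RollbackConfiguration);
}
```

Because the alarm ARN stays the same while the composite alarm's rule changes, the serialized value never needs to be regenerated, which is the "set it and forget it" property described above.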

It may not be what we set out to do, but it accomplishes a task that, like others, is outside the scope of a stack update, even though the CloudFormation API confusingly hints that it isn’t. If this ever gets supported in some way by CodePipeline, not much effort will have been wasted; we simply stop running the one-liner. There’s no infrastructure to set up (such as a separate Lambda function), and no mucking around with the pipeline structure (replacing the CREATE_CHANGE_SET action). After all, it’s just a one-time change.

YMMV.

Hi all! Back with this stuff again – there is still no light at the end of the tunnel.

To recap my current status:

  • I’m only interested in doing this in conjunction with CDK Pipelines, so the command-line solution no longer cuts it for me
  • CodePipeline actions for CREATE_CHANGE_SET and EXECUTE_CHANGE_SET (which are used by CDK) do not support rollback triggers at all (so CDK can’t fix this directly, CodePipeline team can)
  • If rollback triggers are set on a stack, they are sticky so will stay in effect for future stack updates
  • In my case, all the alarms are part of the same stack, so do not exist before the stack does
  • The pipeline has many stacks for many regions and many accounts, with obviously differing alarms, so the rollback triggers need to be set per stack, in the region/account the stack is in

So, the best solution for me would be that I can somehow specify rollback triggers from inside the stack, and they would be in effect for the next stack update.

I have figured out three possible ways of doing this:

  • Custom resource inside the stack, which triggers a step function that waits for the stack update to finish, and then does UpdateStack to put the rollback triggers in
  • Export rollback triggers as outputs from the stack and add a new action in the CDK Pipelines part after EXECUTE_CHANGE_SET that updates the stack with the rollback triggers
  • Replace CREATE_CHANGE_SET action in the CodePipeline with a Lambda function that looks at the stack outputs (before update) and sets the rollback triggers when calling CreateChangeSet directly.
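The second and third options both depend on reading the alarm ARNs back from the stack's outputs. A minimal sketch of that step follows, assuming a naming convention in which every relevant output key starts with RollbackAlarmArn. The prefix, the helper name triggersFromOutputs and the choice of plain metric alarms are all assumptions. The output shape follows what DescribeStacks returns.

```typescript
// Subset of a stack output as returned by DescribeStacks.
interface StackOutput {
  OutputKey?: string;
  OutputValue?: string;
}

// Hypothetical naming convention for outputs that carry rollback alarm ARNs.
const ROLLBACK_OUTPUT_PREFIX = 'RollbackAlarmArn';

// Hypothetical helper: collects rollback triggers from the outputs of the
// currently deployed stack, so they can be applied to the next update.
function triggersFromOutputs(outputs: StackOutput[]): { Arn: string; Type: string }[] {
  const triggers: { Arn: string; Type: string }[] = [];
  for (const output of outputs) {
    const key = output.OutputKey || '';
    if (key.indexOf(ROLLBACK_OUTPUT_PREFIX) === 0 && output.OutputValue) {
      // Assumes plain metric alarms; composite alarms use a different Type.
      triggers.push({ Arn: output.OutputValue, Type: 'AWS::CloudWatch::Alarm' });
    }
  }
  if (triggers.length > 5) {
    throw new Error(`CloudFormation allows at most 5 rollback triggers, found ${triggers.length}`);
  }
  return triggers;
}
```

Since the outputs are read before the update, alarms added in a given deployment would only take effect from the following one, as noted above.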

None of these seems like a great option, but all will probably work for my use case.

All in all, I must say that “consumers” shouldn’t have to come up with ideas to build workarounds for an obvious missing feature that’s been there from 2019 and still is there in 2023.

This is what I have done to get around limitations with setting termination protection, stack policy and rollback alarms in CDK. One caveat is this requires creating a second stack that depends on the first stack using the addDependency function.

In your second stack, you’ll need to copy the code for and instantiate a StackConstruct that I’ve made which triggers a lambda function using a custom CloudFormation resource.