terraform-provider-aws: aws_ecs_cluster with capacity_providers cannot be destroyed
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave “+1” or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Relates #5278, #11351, #11531, #22672, #22754
Maintainer Note
- Fixing this problem involves several pieces:
  i. Creating a new resource that avoids the spurious capacity providers dependency chain (and allows capacity providers to be associated with existing clusters), which is #22672
  ii. Deprecating the `capacity_providers` and `default_capacity_provider_strategy` arguments of `aws_ecs_cluster` (#22754)
  iii. Removing the `capacity_providers` and `default_capacity_provider_strategy` arguments from `aws_ecs_cluster`, which is a breaking change
- While the complete solution includes a breaking change, that doesn’t prevent us from moving forward with i. and ii. above (v4.0) and then keeping iii. in mind for v5.0.
Terraform Version
Terraform v0.12.18
- provider.aws v2.43.0
Affected Resource(s)
- aws_ecs_cluster
- aws_ecs_capacity_provider
- aws_autoscaling_group
Terraform Configuration Files
resource "aws_ecs_cluster" "indestructable" {
name = "show_tf_cp_flaw"
capacity_providers = [aws_ecs_capacity_provider.cp.name]
default_capacity_provider_strategy {
capacity_provider = aws_ecs_capacity_provider.cp.name
}
}
resource "aws_ecs_capacity_provider" "cp" {
name = "show_tf_cp_flaw"
auto_scaling_group_provider {
auto_scaling_group_arn = aws_autoscaling_group.asg.arn
managed_scaling {
status = "ENABLED"
target_capacity = 80
}
}
}
resource "aws_autoscaling_group" "asg" {
min_size = 2
....
}
Debug Output
Panic Output
Expected Behavior
`terraform destroy` should be able to destroy an `aws_ecs_cluster` which has `capacity_providers` set.
Actual Behavior
Error: Error deleting ECS cluster: ClusterContainsContainerInstancesException: The Cluster cannot be deleted while Container Instances are active or draining.
The problem is that the new `capacity_providers` argument on `aws_ecs_cluster` introduces a new dependency chain:
- `aws_ecs_cluster`
  - depends on `aws_ecs_capacity_provider`
    - depends on `aws_autoscaling_group`

This causes Terraform to destroy the ECS cluster before the autoscaling group, which is the wrong way around: the autoscaling group must be destroyed first, because the cluster must contain zero instances before it can be destroyed.
A possible solution may be to introduce a new resource type representing the attachment of a capacity provider to a cluster (inspired by aws_iam_role_policy_attachment which is the attachment of an IAM policy to a role).
This would allow the following dependency graph, which would work beautifully:
- `aws_ecs_capacity_provider_cluster_attachment`
  - depends on `aws_ecs_cluster` and `aws_ecs_capacity_provider`
- `aws_ecs_capacity_provider`
  - depends on `aws_autoscaling_group`
    - depends on `aws_launch_template`
      - depends on `aws_ecs_cluster` (e.g. via the `user_data` property, which needs to set the `ECS_CLUSTER` environment variable to the name of the cluster)
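To make the proposal concrete, here is a minimal sketch of what such an attachment resource might look like. The resource type and its arguments are hypothetical: this is the shape being proposed, not something that exists in the provider today.

```hcl
# Hypothetical resource: attaches an existing capacity provider to an
# existing cluster, so the cluster itself no longer depends on the ASG.
resource "aws_ecs_capacity_provider_cluster_attachment" "attach" {
  cluster_name      = aws_ecs_cluster.this.name         # assumed argument
  capacity_provider = aws_ecs_capacity_provider.cp.name # assumed argument
}
```

On destroy, Terraform would delete the attachment first, then the capacity provider and the autoscaling group, and only then the (by now empty) cluster.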
Steps to Reproduce
1. `terraform apply`
2. `terraform destroy`
Important Factoids
References
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 106
- Comments: 18 (8 by maintainers)
Commits related to this issue
- r/aws_ecs_cluster_capacity_providers: add test for #11409 — committed to roberth-k/terraform-provider-aws by roberth-k 2 years ago
Meanwhile, here is a nasty workaround using a destroy provisioner that worked for me to allow the `aws_ecs_cluster` to be destroyed:
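(The snippet itself did not survive in this export. Below is a minimal sketch of what such a destroy-time provisioner could look like, assuming the AWS CLI is available on the machine running Terraform; the hardcoded ASG name and the wait loop are illustrative, not the commenter's original code.)

```hcl
resource "aws_ecs_cluster" "indestructable" {
  name = "show_tf_cp_flaw"

  # Before Terraform deletes the cluster, scale the ASG to zero and wait
  # until no container instances remain registered. The ASG name is
  # hardcoded (an assumption) because destroy-time provisioners can only
  # reference self.
  provisioner "local-exec" {
    when    = destroy
    command = <<-EOT
      aws autoscaling update-auto-scaling-group \
        --auto-scaling-group-name show_tf_cp_flaw \
        --min-size 0 --max-size 0 --desired-capacity 0
      until [ "$(aws ecs list-container-instances --cluster ${self.name} \
        --query 'length(containerInstanceArns)' --output text)" = "0" ]; do
        sleep 15
      done
    EOT
  }
}
```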
Hi all 👋 Just letting you know that this issue is featured on this quarter’s roadmap. If a PR exists to close the issue, a maintainer will review and either make changes directly, or work with the original author to get the contribution merged. If you have written a PR to resolve the issue, please ensure the “Allow edits from maintainers” box is checked. Thanks for your patience and we are looking forward to getting this merged soon!
Thank you for the input on this issue! We are carefully considering work on this in the near future. (No guarantees on an exact date.) In order to facilitate the implementation, I’ve outlined some thoughts below.
After looking through this, I agree with the suggested way forward:
- A new attachment-style resource (e.g. `aws_ecs_capacity_provider_cluster_attachment`)
- Removing the `capacity_providers` and `default_capacity_provider_strategy` arguments would be a breaking change that would need to wait until a major release. Please comment below on whether ideally these would stay or go.

Please provide any feedback, yay or nay.
Ah, turns out this is precisely the issue described in https://github.com/hashicorp/terraform-provider-aws/issues/11531. In short, the design of capacity providers is broken in Terraform right now, as it creates an invalid dependency chain: `aws_ecs_cluster` -> `aws_ecs_capacity_provider` -> `aws_autoscaling_group`. This chain isn’t valid because, on destroy, Terraform will try to delete the `aws_ecs_cluster` first, but it can’t, because the `aws_autoscaling_group` hasn’t been deleted. So we need an `aws_ecs_capacity_provider_attachment` to use capacity providers without such a dependency chain.

@edmundcraske-bjss Yes, you are absolutely correct! Thank you.
At this point, the best way forward looks like #22672. That will address the OP’s recommended solution of an attachment resource (though named `aws_ecs_cluster_capacity_providers` instead). That will solve the main problem here of using capacity providers but not being able to destroy the cluster. It will also solve the problem of not being able to associate capacity providers with existing clusters.
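For illustration, a minimal sketch of how that resource is used (the resource labels here are placeholders; see #22672 and the provider documentation for the authoritative schema):

```hcl
# The cluster no longer references its capacity providers inline.
resource "aws_ecs_cluster" "this" {
  name = "example"
}

# The association lives in its own resource, so it is created after the
# cluster, destroyed before it, and can also be added to clusters that
# already exist.
resource "aws_ecs_cluster_capacity_providers" "this" {
  cluster_name       = aws_ecs_cluster.this.name
  capacity_providers = [aws_ecs_capacity_provider.cp.name]

  default_capacity_provider_strategy {
    weight            = 1
    capacity_provider = aws_ecs_capacity_provider.cp.name
  }
}
```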
Is it possible that your test did not hit the issue because the EC2 instances were not actually registering with the ECS cluster?
Same as #4852. Someone should consolidate all these; this is really noisy.
Having this issue too. On `destroy`, I get the same error. This started around Terraform 0.12, and we added retries to work around it. We’re now upgrading to 0.15, and the retries no longer seem to help, so this is a blocker.
Any news? I’m still waiting for this issue to be fixed.
Any updates here? This is terribly annoying to deal with. (The workaround does not work in my particular case)