terraform-provider-aws: aws_security_group: timeout while waiting for state to become 'success'; subsequent Terraform runs fail on that resource
Short story: we know that AWS is throttling our API requests, and creating a security group sometimes times out. The problem is that subsequent terraform runs then fail: the security group was created, but it is not completely recorded in tfstate, and its security group rules are missing from tfstate.
Terraform Version
Terraform 0.11.2, AWS provider version 1.7.1
Affected Resource(s)
aws_security_group
There might be a problem with how Terraform handles resources that fail. Perhaps, on failure, this resource should be tainted so that subsequent runs succeed.
Terraform Configuration Files
```hcl
resource "aws_security_group" "base_sg" {
  name        = "base_project_sg_${var.sqsc_project_name}_${var.environment}"
  description = "Basic Security Group for ${var.sqsc_project_name} ${var.environment}"
  vpc_id      = "${data.aws_vpc.main.id}"

  tags {
    Name        = "base_project_sg_${var.sqsc_project_name}_${var.environment}"
    Environment = "${var.environment}"
    Project     = "${var.sqsc_project_name}"
    ProjectUuid = "${var.sqsc_project_uuid}"
  }
}

resource "aws_security_group_rule" "base_sg_ingress_ssh" {
  security_group_id = "${aws_security_group.base_sg.id}"
  type              = "ingress"
  from_port         = 22
  to_port           = 22
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
}

resource "aws_security_group_rule" "base_sg_ingress_http" {
  security_group_id = "${aws_security_group.base_sg.id}"
  type              = "ingress"
  from_port         = 80
  to_port           = 80
  protocol          = "tcp"
  cidr_blocks       = ["0.0.0.0/0"]
}

// ...

resource "aws_security_group_rule" "base_sg_egress" {
  security_group_id = "${aws_security_group.base_sg.id}"
  type              = "egress"
  from_port         = 0
  to_port           = 0
  protocol          = "-1"
  cidr_blocks       = ["0.0.0.0/0"]
}
```
Debug Output
This is a transient error that occurs with Terraform running in an automated environment. We do not have debug output for this run at the moment.
However, we run Terraform multiple times, and the first run fails with the following error:
```
1 error(s) occurred:
* aws_security_group.base_sg: 1 error(s) occurred:
* aws_security_group.base_sg: timeout while waiting for state to become 'success' (timeout: 5m0s)
```
Then all subsequent terraform apply executions fail with:
```
1 error(s) occurred:
* aws_security_group_rule.base_sg_egress: 1 error(s) occurred:
* aws_security_group_rule.base_sg_egress: [WARN] A duplicate Security Group rule was found on (sg-048b7c7e). This may be
a side effect of a now-fixed Terraform issue causing two security groups with
identical attributes but different source_security_group_ids to overwrite each
other in the state. See https://github.com/hashicorp/terraform/pull/2376 for more
information and instructions for recovery. Error message: the specified rule "peer: 0.0.0.0/0, ALL, ALLOW" already exists
```
Full logs here: https://gist.github.com/mildred/9245356ec1ef599f91eb15f2bd9a6666
Expected Behavior
If creating the security group fails because of a timeout, Terraform should taint it so that the next run creates it anew. Alternatively, it could taint only the aws_security_group_rule resources attached to it, or it should record the security group rules properly in tfstate.
Actual Behavior
Terraform times out. Subsequent runs then fail to create the security group rule, because the rule, which Terraform believes is absent, already exists in AWS.
Steps to Reproduce
Run Terraform often enough to be throttled by AWS.
Important Factoids
- Our API requests are being throttled by AWS (we asked AWS support about this and they confirmed it).
- We increased the max_retries setting of the AWS provider to 40.
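For reference, a minimal sketch of where that setting lives, in Terraform 0.11 syntax. Region and credentials are deliberately omitted and assumed to come from the environment (for example AWS_DEFAULT_REGION); nothing else about the reporters' provider block is known from this issue.

```hcl
# Sketch: raise the AWS provider's retry count for throttled API calls.
# Region and credentials are assumed to be supplied via the environment.
provider "aws" {
  # Maximum number of times an AWS API call is retried when requests are
  # throttled or fail transiently; the delay between retries grows
  # exponentially.
  max_retries = 40
}
```

Note that max_retries governs retries of individual API requests; it is a different knob from the 5m0s wait reported in the error above, which is why configurable resource timeouts are also discussed in this issue.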
About this issue
- State: closed
- Created 6 years ago
- Reactions: 11
- Comments: 22 (17 by maintainers)
Commits related to this issue
- Allow configurable timeout when reading security group rule When being throttled on AWS requests, read requests are the first ones to be throttled, and reading security group rules can take longer th... — committed to squarescale/terraform-provider-aws by mildred 6 years ago
- Allow configurable timeout when reading security group rule When being throttled on AWS requests, read requests are the first ones to be throttled, and reading security group rules can take longer t... — committed to squarescale/terraform-provider-aws by obourdon 5 years ago
Perhaps an easy way to mitigate this problem would be to allow configurable timeouts on aws_security_group resources. As it stands, the timeout (5m0s in the error above) is not configurable.
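What such a configuration could look like: later AWS provider releases accept a timeouts block on aws_security_group with create and delete keys. The sketch below assumes such a release (version 1.7.1, used in this report, does not support it), and the 15m value is an arbitrary illustration. Whether this timeout covers the specific wait that failed here depends on the provider's implementation.

```hcl
# Sketch only: assumes an AWS provider release in which aws_security_group
# supports a timeouts block (not available in 1.7.1).
resource "aws_security_group" "base_sg" {
  name        = "base_project_sg_${var.sqsc_project_name}_${var.environment}"
  description = "Basic Security Group for ${var.sqsc_project_name} ${var.environment}"
  vpc_id      = "${data.aws_vpc.main.id}"

  # tags omitted for brevity

  # Allow more time for creation while the account's API calls are being
  # throttled. 15m is an arbitrary example value.
  timeouts {
    create = "15m"
  }
}
```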
@mildred Terraform does implement an exponential backoff algorithm, for example in waitForState here: https://github.com/hashicorp/terraform/blob/master/helper/resource/state.go#L51. However, I think the algorithm is not used consistently across all API calls.