terraform-provider-aws: RDS InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave "+1" or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
Terraform Version
Terraform v0.12.20
provider.aws v2.46.0
Affected Resource(s)
- aws_db_instance
- aws_db_parameter_group
Terraform Configuration Files
terraform {
  required_providers {
    aws = "= 2.46.0"
  }
}
provider "aws" {
  region = "us-west-1"
}
data "aws_vpc" "wailupes-main" {
  filter {
    name = "tag:Name"
    values = ["wailupes-main"]
  }
}
data "aws_iam_role" "enhanced_monitoring" {
  name = "staging-enhanced-monitoring"
}
resource "aws_db_instance" "rds" {
  identifier = "test-rds"
  allocated_storage = 100
  engine = "postgres"
  engine_version = "11.1"
  instance_class = "db.m4.large"
  name = "testdb"
  username = "testuser"
  password = "testpassword"
  db_subnet_group_name = "wailupes-rds"
  parameter_group_name = "postgres-11-tuned-staging"
  multi_az = false
  storage_type = "gp2"
  storage_encrypted = false
  auto_minor_version_upgrade = false
  apply_immediately = true
  deletion_protection = false
  kms_key_id = ""
  performance_insights_enabled = false
  backup_retention_period = 1
  ca_cert_identifier = "rds-ca-2019"
  monitoring_interval = 30
  monitoring_role_arn = data.aws_iam_role.enhanced_monitoring.arn
  skip_final_snapshot = true
  timeouts {
    update = "120m"
  }
}
resource "aws_db_instance" "rds-read" {
  identifier = "test-rds-read-0"
  allocated_storage = 100
  engine = "postgres"
  engine_version = "11.1"
  instance_class = "db.m4.large"
  username = "testuser"
  parameter_group_name = "postgres-11-tuned-staging"
  storage_type = "gp2"
  storage_encrypted = false
  replicate_source_db = aws_db_instance.rds.id
  auto_minor_version_upgrade = false
  apply_immediately = true
  monitoring_interval = 30
  monitoring_role_arn = data.aws_iam_role.enhanced_monitoring.arn
  kms_key_id = ""
  performance_insights_enabled = false
  skip_final_snapshot = true
  ca_cert_identifier = "rds-ca-2019"
}
Debug Output
Shortened debug output here: https://gist.github.com/kbaldyga/825f0239776463a69969b847f35d53bd
Expected Behavior
When adding a read-replica to an existing RDS instance with a custom db parameter group, enhanced monitoring and ca_cert_identifier, Terraform will randomly fail with "Instance cannot currently reboot due to an in-progress management operation". The read replica is eventually created correctly, but the resource is marked as tainted and Terraform returns an error response code.
Actual Behavior
When adding a read-replica to an existing RDS instance, the Terraform AWS provider performs multiple steps:
- creates a read replica (I can see rds/CreateDBInstanceReadReplica in the log file), then waits (rds/DescribeDBInstances) for the instance to be available,
- next it calls ModifyDBInstance (see attached log file), which again calls rds/DescribeDBInstances multiple times and waits for the instance to be available,
- once the instance is available, Terraform calls rds/RebootDBInstance. But in the meantime AWS decides to apply changes to the instance, and the call to rds/RebootDBInstance fails.
Because this all depends on timing, it's difficult to consistently reproduce the issue. But after spending some time with various configurations, I am pretty confident it's the combination of all 3: enhanced monitoring, ca_cert_identifier, and custom parameter group in the resource "aws_db_instance" "rds-read" that's causing the issue.
As a workaround we decided to remove the ca_cert_identifier from our Terraform configuration for now, since "rds-ca-2019" is the new default anyway.
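The workaround can be sketched as the replica resource from the configuration above with the ca_cert_identifier argument dropped. This is only an illustration of the change described here, resting on the assumption already stated that "rds-ca-2019" is the default certificate for new instances; it sidesteps one of the three triggers and does not fix the underlying race around rds/RebootDBInstance.

```hcl
# Read replica without an explicit ca_cert_identifier (workaround sketch).
# All other arguments are unchanged from the original configuration.
resource "aws_db_instance" "rds-read" {
  identifier                   = "test-rds-read-0"
  allocated_storage            = 100
  engine                       = "postgres"
  engine_version               = "11.1"
  instance_class               = "db.m4.large"
  username                     = "testuser"
  parameter_group_name         = "postgres-11-tuned-staging"
  storage_type                 = "gp2"
  storage_encrypted            = false
  replicate_source_db          = aws_db_instance.rds.id
  auto_minor_version_upgrade   = false
  apply_immediately            = true
  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  kms_key_id                   = ""
  performance_insights_enabled = false
  skip_final_snapshot          = true

  # ca_cert_identifier intentionally omitted. Assumption: the instance
  # already gets "rds-ca-2019" by default, so nothing needs to be set here.
}
```

Where a non-default certificate is required, this sketch does not apply, because the argument cannot simply be left out.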
About this issue
- State: closed
- Created 4 years ago
- Reactions: 90
- Comments: 15 (4 by maintainers)
Commits related to this issue
- Retry RDS reboot-in-progress errors Fixes https://github.com/terraform-providers/terraform-provider-aws/issues/11905 — committed to nijave/terraform-provider-aws by nijave 4 years ago
- Retry RDS reboot-in-progress errors Fixes https://github.com/terraform-providers/terraform-provider-aws/issues/11905 — committed to Root-App/terraform-provider-aws by nijave 4 years ago
We saw this error on version 3.64.1 just the other day.
Seeing this as well in us-east-1 (the OP is from us-west-1). It looks like Terraform should probably just retry in the face of these errors.
I think the ca_cert_identifier is the cause of this. In govcloud, this value needs to be set to "rds-ca-2017". Each time the provider attempts to create an aws_rds_cluster_instance in govcloud, the apply fails with this error.
It is possible to get past this by untainting the resource and running the apply again.
Reproduced on provider version 2.51.0