terraform-provider-aws: RDS InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation

Terraform Version

Terraform v0.12.20
provider.aws v2.46.0

Affected Resource(s)

aws_db_instance
aws_db_parameter_group

Terraform Configuration Files

terraform {
  required_providers {
    aws = "= 2.46.0"
  }
}
provider "aws" { region = "us-west-1" }

data "aws_vpc" "wailupes-main" {
  filter {
    name   = "tag:Name"
    values = ["wailupes-main"]
  }
}
data "aws_iam_role" "enhanced_monitoring" {
  name = "staging-enhanced-monitoring"
}

resource "aws_db_instance" "rds" {
  identifier                   = "test-rds"
  allocated_storage            = 100
  engine                       = "postgres"
  engine_version               = "11.1"
  instance_class               = "db.m4.large"
  name                         = "testdb"
  username                     = "testuser"
  password                     = "testpassword"
  db_subnet_group_name         = "wailupes-rds"
  parameter_group_name         = "postgres-11-tuned-staging"
  multi_az                     = false
  storage_type                 = "gp2"
  storage_encrypted            = false
  auto_minor_version_upgrade   = false
  apply_immediately            = true
  deletion_protection          = false
  kms_key_id                   = ""
  performance_insights_enabled = false
  backup_retention_period      = 1
  ca_cert_identifier           = "rds-ca-2019"
  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  skip_final_snapshot          = true

  timeouts {
    update = "120m"
  }
}

resource "aws_db_instance" "rds-read" {
  identifier                 = "test-rds-read-0"
  allocated_storage          = 100
  engine                     = "postgres"
  engine_version             = "11.1"
  instance_class             = "db.m4.large"
  username                   = "testuser"
  parameter_group_name       = "postgres-11-tuned-staging"
  storage_type               = "gp2"
  storage_encrypted          = false
  replicate_source_db        = aws_db_instance.rds.id
  auto_minor_version_upgrade = false
  apply_immediately          = true

  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  kms_key_id                   = ""
  performance_insights_enabled = false
  skip_final_snapshot          = true
  ca_cert_identifier           = "rds-ca-2019"
}

Debug Output

Shortened debug output here: https://gist.github.com/kbaldyga/825f0239776463a69969b847f35d53bd

Expected Behavior

Adding a read replica to an existing RDS instance should succeed without error. Instead, when the replica uses a custom DB parameter group, enhanced monitoring, and ca_cert_identifier, Terraform intermittently fails with "Instance cannot currently reboot due to an in-progress management operation". The read replica is eventually created correctly, but the resource is marked as tainted and Terraform exits with an error code.

Actual Behavior

When a read replica is added to an existing RDS instance, the Terraform AWS provider performs several steps:

1. It creates the read replica (rds/CreateDBInstanceReadReplica in the log file), then polls rds/DescribeDBInstances until the instance is available.
2. It calls rds/ModifyDBInstance (see the attached log file), then again polls rds/DescribeDBInstances until the instance is available.
3. Once the instance is available, it calls rds/RebootDBInstance. In the meantime, however, AWS may start applying changes to the instance, and the rds/RebootDBInstance call fails.

Because the failure depends on timing, it is difficult to reproduce consistently. After testing various configurations, however, I am fairly confident that the cause is the combination of all three settings in resource "aws_db_instance" "rds-read": enhanced monitoring, ca_cert_identifier, and a custom parameter group. As a workaround, we removed ca_cert_identifier from our Terraform configuration for now, since "rds-ca-2019" is the new default anyway.

About this issue

State: closed
Created 4 years ago
Reactions: 90
Comments: 15 (4 by maintainers)

Commits related to this issue

Retry RDS reboot-in-progress errors Fixes https://github.com/terraform-providers/terraform-provider-aws/issues/11905 — committed to nijave/terraform-provider-aws by nijave 4 years ago
Retry RDS reboot-in-progress errors Fixes https://github.com/terraform-providers/terraform-provider-aws/issues/11905 — committed to Root-App/terraform-provider-aws by nijave 4 years ago

Most upvoted comments

We saw this error on version 3.64.1 just the other day.

danhooper on Nov 19, 2021

Seeing this as well in us-east-1 (the OP is in us-west-1). It looks like Terraform should probably just retry when it hits these errors.

nijave on Apr 27, 2020

I think ca_cert_identifier is the cause of this. In GovCloud, this value needs to be set to "rds-ca-2017". Each time the provider attempts to create an aws_rds_cluster_instance in GovCloud, the apply fails with:

Error: error rebooting DB Instance (xxxx-gov-dev-3): InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation.
	status code: 400, request id: 597cafa8-c9dd-4ee9-9678-9e3d6e11efd5

It is possible to get past this by untainting the resource (terraform untaint) and running the apply again.

Reproduced on provider version 2.51.0

roscoecairney on Mar 2, 2020