terraform-provider-aws: RDS InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

Terraform v0.12.20
provider.aws v2.46.0

Affected Resource(s)

  • aws_db_instance
  • aws_db_parameter_group

Terraform Configuration Files

terraform {
  required_providers {
    aws = "= 2.46.0"
  }
}
provider "aws" { region = "us-west-1" }

data "aws_vpc" "wailupes-main" {
  filter {
    name   = "tag:Name"
    values = ["wailupes-main"]
  }
}
data "aws_iam_role" "enhanced_monitoring" {
  name = "staging-enhanced-monitoring"
}

resource "aws_db_instance" "rds" {
  identifier                   = "test-rds"
  allocated_storage            = 100
  engine                       = "postgres"
  engine_version               = "11.1"
  instance_class               = "db.m4.large"
  name                         = "testdb"
  username                     = "testuser"
  password                     = "testpassword"
  db_subnet_group_name         = "wailupes-rds"
  parameter_group_name         = "postgres-11-tuned-staging"
  multi_az                     = false
  storage_type                 = "gp2"
  storage_encrypted            = false
  auto_minor_version_upgrade   = false
  apply_immediately            = true
  deletion_protection          = false
  kms_key_id                   = ""
  performance_insights_enabled = false
  backup_retention_period      = 1
  ca_cert_identifier           = "rds-ca-2019"
  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  skip_final_snapshot          = true

  timeouts {
    update = "120m"
  }
}

resource "aws_db_instance" "rds-read" {
  identifier                 = "test-rds-read-0"
  allocated_storage          = 100
  engine                     = "postgres"
  engine_version             = "11.1"
  instance_class             = "db.m4.large"
  username                   = "testuser"
  parameter_group_name       = "postgres-11-tuned-staging"
  storage_type               = "gp2"
  storage_encrypted          = false
  replicate_source_db        = aws_db_instance.rds.id
  auto_minor_version_upgrade = false
  apply_immediately          = true

  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  kms_key_id                   = ""
  performance_insights_enabled = false
  skip_final_snapshot          = true
  ca_cert_identifier           = "rds-ca-2019"
}

Debug Output

Shortened debug output here: https://gist.github.com/kbaldyga/825f0239776463a69969b847f35d53bd

Expected Behavior

Adding a read replica to an existing RDS instance that uses a custom DB parameter group, enhanced monitoring, and ca_cert_identifier should complete without errors. Instead, terraform randomly fails with "Instance cannot currently reboot due to an in-progress management operation". The read replica is eventually created correctly, but the resource is marked as tainted and terraform exits with an error code.

Actual Behavior

When adding a read replica to an existing RDS instance, the terraform aws provider performs multiple steps:

  1. it creates a read replica (rds/CreateDBInstanceReadReplica appears in the log file), then waits (rds/DescribeDBInstances) for the instance to become available,
  2. next it calls ModifyDBInstance (see attached log file), which again calls rds/DescribeDBInstances multiple times and waits for the instance to become available,
  3. once the instance is available, terraform calls rds/RebootDBInstance. In the meantime, however, AWS has started applying changes to the instance, so the call to rds/RebootDBInstance fails.

Because this all depends on timing, it's difficult to reproduce the issue consistently. After spending some time with various configurations, I am fairly confident that it is the combination of all three settings in the resource "aws_db_instance" "rds-read" that causes the issue: enhanced monitoring, ca_cert_identifier, and a custom parameter group. As a workaround, we decided to remove ca_cert_identifier from our terraform configuration for now, since "rds-ca-2019" is the new default anyway.
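
The workaround might look like the following sketch. It assumes the only change is dropping ca_cert_identifier from the replica resource in the configuration above; the report does not say whether it was also removed from the primary instance. Everything else is copied unchanged from the original resource. Whether this avoids the failing reboot in every case is not confirmed here, since the failure is timing-dependent.

```hcl
# Sketch of the workaround: the read replica from the configuration above,
# with ca_cert_identifier omitted so that the AWS-side default applies
# (reported in this issue to be "rds-ca-2019").
resource "aws_db_instance" "rds-read" {
  identifier                 = "test-rds-read-0"
  allocated_storage          = 100
  engine                     = "postgres"
  engine_version             = "11.1"
  instance_class             = "db.m4.large"
  username                   = "testuser"
  parameter_group_name       = "postgres-11-tuned-staging"
  storage_type               = "gp2"
  storage_encrypted          = false
  replicate_source_db        = aws_db_instance.rds.id
  auto_minor_version_upgrade = false
  apply_immediately          = true

  monitoring_interval          = 30
  monitoring_role_arn          = data.aws_iam_role.enhanced_monitoring.arn
  kms_key_id                   = ""
  performance_insights_enabled = false
  skip_final_snapshot          = true
  # ca_cert_identifier intentionally left unset (workaround)
}
```

This only helps where the default certificate is the one you need. A comment further down reports that GovCloud required a non-default value, in which case the attribute cannot simply be removed.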

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 90
  • Comments: 15 (4 by maintainers)

Most upvoted comments

We saw this error on version 3.64.1 just the other day.

Seeing this as well in us-east-1 (the OP is from us-west-1). It looks like Terraform should probably just retry in the face of these errors.

I think the ca_cert_identifier is the cause of this. In GovCloud, this value needs to be set to "rds-ca-2017". Each time the provider attempts to create an aws_rds_cluster_instance in GovCloud, the apply fails with:

Error: error rebooting DB Instance (xxxx-gov-dev-3): InvalidDBInstanceState: Instance cannot currently reboot due to an in-progress management operation.
	status code: 400, request id: 597cafa8-c9dd-4ee9-9678-9e3d6e11efd5

It is possible to get past this by untainting the resource and running the apply again.

Reproduced on provider version 2.51.0