terraform-provider-google: Cannot delete instance group because it's being used by a backend service

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request.
  • Please do not leave +1 or me too comments, they generate extra noise for issue followers and do not help prioritize the request.
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment.
  • If an issue is assigned to the modular-magician user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to hashibot, a community member has claimed the issue already.

Terraform Version

Terraform v0.12.24

  • provider.google v3.21.0
  • provider.google-beta v3.21.0

Affected Resource(s)

  • google_compute_region_backend_service
  • google_compute_instance_group

Terraform Configuration Files

locals {
  project         = "<project-id>"
  network         = "<vpc-name>"
  network_project = "<vpc-project>"
  zones           = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
  s1_count        = 3
}

provider "google" {
  project = local.project
  version = "~> 3.0"
}

data "google_compute_network" "network" {
  name    = local.network
  project = local.network_project
}

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = google_compute_instance_group.s1
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}

resource "google_compute_health_check" "default" {
  name = "s1"
  tcp_health_check {
    port = "80"
  }
}

resource "google_compute_instance_group" "s1" {
  count   = local.s1_count
  name    = format("s1-%02d", count.index + 1)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link
}

I’m not sure if this is a general TF problem or a Google provider problem, but here it goes. Currently it’s not possible to lower the number of google_compute_instance_group resources that are used in a google_compute_region_backend_service. With the code above, if we lower the number of google_compute_instance_group resources and try to apply the configuration, TF will first try to delete the no-longer-needed instance groups and only then update the backend configuration. That order doesn’t work, because you cannot delete an instance group that is still used by the backend service; the order should be the other way around.

So to sum it up, when I lower the number of instance group resources, TF does this:

  1. delete surplus google_compute_instance_group -> this fails
  2. update google_compute_region_backend_service

It should do this the other way around:

  1. update google_compute_region_backend_service
  2. delete surplus google_compute_instance_group -> this now succeeds

Here is the output it generates:

google_compute_instance_group.s1[2]: Destroying... [id=projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03]

Error: Error deleting InstanceGroup: googleapi: Error 400: The instance_group resource 'projects/<project-id>/zones/europe-west1-d/instanceGroups/s1-03' is already being used by 'projects/<project-id>/regions/europe-west1/backendServices/s1', resourceInUseByAnotherResource

Expected Behavior

TF should first update the google_compute_region_backend_service, then delete the instance group.

Actual Behavior

TF tried to delete the instance group first, which resulted in an error.

Steps to Reproduce

  1. terraform apply
  2. Set s1_count = 2
  3. terraform apply

Important Factoids

It’s not a simple task to fix this. One “workaround” is to wrap the dynamic for_each in a slice() call, like this:

  dynamic "backend" {
    for_each = slice(google_compute_instance_group.s1, 0, 2)
    content {
      group = backend.value.self_link
    }
  }

So you first set the second argument of slice() to the new number of instance groups and run apply, then lower s1_count to that same number and run apply again, but that’s just too complicated for a simple task like this.
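
For reference, a slightly safer variant of the same workaround keeps the slice bound to a dedicated local, so the two numbers cannot drift apart. This is just a sketch; s1_backend_count is an illustrative name and not part of the original configuration:

locals {
  # Step 1: lower s1_backend_count and run apply, so the backend service drops
  # the group first. Step 2: lower s1_count to the same value and apply again.
  s1_backend_count = 3
}

resource "google_compute_region_backend_service" "s1" {
  name = "s1"

  dynamic "backend" {
    for_each = slice(google_compute_instance_group.s1, 0, local.s1_backend_count)
    content {
      group = backend.value.self_link
    }
  }
  health_checks = [
    google_compute_health_check.default.self_link,
  ]
}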

b/308569276

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 51
  • Comments: 23 (4 by maintainers)

Most upvoted comments

This has been driving me nuts for months. Using Cloud Run behind external GCLB. Backend services for the Serverless NEGs are in use by the URL map.

Once all this config/infra is in place, the service / backend service cannot be deleted even if the URL map is removed in the same change. It becomes a two-step process of removing the URL map, then removing the service and backend service.

In an enterprise setting with ~10 environments, each receiving different releases on different schedules, having to run repeat CI pipelines is not okay and is basically unmanageable.

Can confirm that this is the case with a manual global load balancing setup on the Google provider as well. Definitely super annoying that we manually need to:

  1. Update our terraform config to remove a desired deployment region (e.g. `us-central1`).
  2. Run the following command manually:
$ gcloud beta compute backend-services remove-backend --global revere-backend \
    --network-endpoint-group-region=<region> \
    --network-endpoint-group=revere-neg-<region>
  3. Run terraform apply to achieve the desired state.

This means anytime we turn down a region, some administrator is going to have to do this instead of simply relying on CI/CD. What’s worse is that it makes proving certain security/compliance certifications harder: our CI/CD + pull request process is audited and logged, but random CLI commands from an administrator’s shell environment are harder to track (i.e. we need to involve GCP Audit Logging in the business justifications).

Looking forward to an elegant solution by the provider here.

@pdecat that should work, and requires implementing a new fine-grained resource google_compute_region_backend_service_backend.
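
Purely as an illustration of what that could look like, the inline dynamic "backend" block would move out of google_compute_region_backend_service into the new resource. Note that google_compute_region_backend_service_backend does not exist in the provider today, and the attribute names below are guesses:

# Hypothetical sketch only; not currently supported by the provider.
resource "google_compute_region_backend_service_backend" "s1" {
  count           = local.s1_count
  backend_service = google_compute_region_backend_service.s1.name
  group           = google_compute_instance_group.s1[count.index].self_link
}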

Reopening the issue since a solution is possible, and this will be tracked similarly to other feature-requests.

The lack of pretty essential features, and bugs like this, makes me very disappointed with Terraform and GCP.

@StephenWithPH ForceNew would have the same effect, but make every change (addition as well as removal) to the backend set destructive. Providing a new fine-grained resource is the cleaner option here.

I actually just ran into this issue a couple of days ago, and I was able to resolve it by appending a random string to the end of the group manager’s name and using the create_before_destroy lifecycle policy for the instance group manager resource. For whatever reason, doing so leads Terraform to modify the backend service before destroying the original instance group. Still not the prettiest hack in the world, but better than having to issue multiple applies.
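
For anyone looking for a starting point, a rough sketch of that idea applied to the instance groups from the configuration above could look like the following. The random_id resource and the suffixing scheme are illustrative assumptions (the original comment used an instance group manager), and the random provider is required:

resource "random_id" "s1" {
  count       = local.s1_count
  byte_length = 2
}

resource "google_compute_instance_group" "s1" {
  count = local.s1_count
  # The random suffix gives every replacement a fresh name, so the new group can
  # exist alongside the old one while create_before_destroy swaps them.
  name    = format("s1-%02d-%s", count.index + 1, random_id.s1[count.index].hex)
  zone    = element(local.zones, count.index)
  network = data.google_compute_network.network.self_link

  lifecycle {
    # Create the replacement (and let dependents re-point to it) before the old
    # group is destroyed.
    create_before_destroy = true
  }
}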

This issue is actually quite problematic

I get these errors trying to destroy the whole module. It requires multiple targeted terraform destroy runs to complete:


Error: Error when reading or editing HealthCheck: googleapi: Error 400: The health_check resource 'projects/test-proj/global/healthChecks/atlantis-healthcheck' is already being used by 'projects/test-proj/global/backendServices/atlantis-backend-service', resourceInUseByAnotherResource

Error: Error waiting for Deleting SecurityPolicy: The security_policy resource 'projects/test-proj/global/securityPolicies/atlantis-security-policy' is already being used by 'projects/test-proj/global/backendServices/atlantis-backend-service'

Error: Error deleting InstanceGroup: googleapi: Error 400: The instance_group resource 'projects/test-proj/zones/us-central1-a/instanceGroups/instance-group-all' is already being used by 'projects/test-proj/global/backendServices/atlantis-backend-service', resourceInUseByAnotherResource

Disappointing that this has existed for 2+ years and there is still no fix.

How come Terraform doesn’t understand that it can’t delete a managed instance group without first removing the load balancer backend that depends on it? It seems like a pretty simple idea, which for some reason isn’t implemented.

I actually just ran into this issue a couple of days ago, and I was able to resolve it by appending a random string to the end of the group manager’s name and using the create_before_destroy lifecycle policy for the instance group manager resource. For whatever reason, doing so leads Terraform to modify the backend service before destroying the original instance group. Still not the prettiest hack in the world, but better than having to issue multiple applies.

Hi, could you paste an example of what you did with create_before_destroy?

I can relate to this; the provider doesn’t update the URL map before destroying backend services. Very frustrating.