terraform-provider-google: google_compute_backend_service failing to apply multiple backends
Community Note
- Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
- Please do not leave “+1” or “me too” comments, they generate extra noise for issue followers and do not help prioritize the request
- If you are interested in working on this issue or have submitted a pull request, please leave a comment
- If an issue is assigned to the “modular-magician” user, it is either in the process of being autogenerated, or is planned to be autogenerated soon. If an issue is assigned to a user, that user is claiming responsibility for the issue. If an issue is assigned to “hashibot”, a community member has claimed the issue already.
Description of Problem
I’m experiencing issues when trying to build a google_compute_backend_service with multiple backends (instance groups) in order to target all the nodes of my GKE cluster.
I have a cluster module & a cluster-lb module which I execute from an environment Terraform configuration. I output the instance groups at the end of the cluster module, based on a data resource, to ensure I get the URLs to all cluster nodes, e.g.:
output "K8S_INSTANCE_GROUP_URLS" {
value = data.google_container_cluster.information.instance_group_urls
description = "URLs to the instance groups for all nodes"
}
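For reference, the data resource that output reads from looks roughly like this (the variable names below are placeholders rather than my exact code):

data "google_container_cluster" "information" {
  name     = var.cluster_name       # placeholder inputs
  location = var.cluster_location
  project  = var.project
}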
For simplicity's sake I am taking a variable in the cluster-lb module which is that list.
variable "backend_group_list" {
description = "Map backend indices to list of backend maps."
type = list
default = []
}
In my module code I am trying to configure the backend sub-block as described here, which (I think) has a specific format of:
backend = [
  { group = <url> },
  { group = <url> }
]
(which seems to be what it implies), or is the backend block simply specified multiple times?
backend { group = <url> }
backend { group = <url> }
The topic of documentation is covered in #3498 and I initially added some error logs in this comment.
Terraform Version
Terraform v0.12.2
+ provider.google v2.9.1
+ provider.null v2.1.2
+ provider.random v2.1.2
+ provider.template v2.1.2
Affected Resource(s)
- google_compute_backend_service
Terraform Configuration Files
cluster-lb module backends
variable "backend_group_list" {
description = "Map backend indices to list of backend maps."
type = list
default = []
}
variable "backend_public" {
description = "Parameters to the public backend"
type = object({
enabled = bool
health_path = string
port_name = string
port_number = number
timeout_seconds = number
iap_enabled = bool
})
default = {
enabled = true
health_path = "/"
port_name = "http"
port_number = 30100
timeout_seconds = 30
iap_enabled = false
}
}
variable "backend_private" {
description = "Parameters to the private backend"
type = object({
enabled = bool
health_path = string
port_name = string
port_number = number
timeout_seconds = number
iap_enabled = bool
})
default = {
enabled = true
health_path = "/"
port_name = "http"
port_number = 30100
timeout_seconds = 30
iap_enabled = true
}
}
variable "backend_monitor" {
description = "Parameters to the monitoring backend"
type = object({
enabled = bool
health_path = string
port_name = string
port_number = number
timeout_seconds = number
iap_enabled = bool
})
default = {
enabled = true
health_path = "/"
port_name = "monitor"
port_number = 30101
timeout_seconds = 30
iap_enabled = true
}
}
resource "google_compute_backend_service" "public" {
project = var.project
name = "${var.name}-backend-public"
port_name = var.backend_public["port_name"]
protocol = "HTTP"
timeout_sec = var.backend_public["timeout_seconds"]
dynamic "backend" {
for_each = [ for b in var.backend_group_list : b ]
content {
group = backend.value
}
}
health_checks = list(google_compute_health_check.public.self_link)
}
resource "google_compute_backend_service" "private" {
project = var.project
name = "${var.name}-backend-private"
port_name = var.backend_private["port_name"]
protocol = "HTTP"
timeout_sec = var.backend_private["timeout_seconds"]
dynamic "backend" {
for_each = var.backend_group_list
content {
group = backend.value
// adding null values otherwise reapplication fails
balancing_mode = null
capacity_scaler = null
description = null
max_connections = null
max_connections_per_instance = null
max_rate = null
max_rate_per_instance = null
max_utilization = null
}
}
health_checks = list(google_compute_health_check.private.self_link)
iap {
oauth2_client_id = var.iap_oauth_id
oauth2_client_secret = var.iap_oauth_secret
}
}
resource "google_compute_backend_service" "monitor" {
project = var.project
name = "${var.name}-backend-monitor"
port_name = var.backend_monitor["port_name"]
protocol = "HTTP"
timeout_sec = var.backend_monitor["timeout_seconds"]
dynamic "backend" {
for_each = var.backend_group_list
content {
group = backend.value
}
}
health_checks = list(google_compute_health_check.monitor.self_link)
iap {
oauth2_client_id = var.iap_oauth_id
oauth2_client_secret = var.iap_oauth_secret
}
}
Debug Output
I've posted the encrypted version (using the hashicorp key from keybase) in this gist: https://gist.github.com/hawksight/bde83268020c8701fc9ac35c1b6d3fb8
Used the following to encrypt:
keybase pgp encrypt -i ~/Logs/1561630982-terraform.log -o ~/Logs/1561630982-terraform.log.crypt hashicorp
I wasn't confident there weren't any sensitive details in the debug log, hence the encryption. Let me know if I need to share it another way.
Panic Output
None
Expected Behavior
I have three backends which I am manually specifying with different names. They are all backends to the same set of GKE nodes. Our clusters use multi-zone node pools and usually have two node pools. In GKE, this means you have an instance group per zone per node pool. In the example shown here, I have a setup with two node pools in a single zone, so two instance groups, equating to two backends to specify.
In the plan I expect to see two backend blocks, as I am using the dynamic block from 0.12 to generate a block for each group URL / self-link passed in.
On apply I expect the backend service to be created with both instance groups as its targets, not to fail with the error provided.
Actual Behavior
The plan worked, although it only specifies one backend in the output. It only knows the groups after apply, which I find unhelpful. Even when the cluster is prebuilt, the plan still doesn't see that I have more than one instance group to add. This is probably something to do with the way Terraform plans things, but I'm unsure of the specifics.
Here’s an example plan output:
# module.cluster-lb.google_compute_backend_service.monitor will be created
+ resource "google_compute_backend_service" "monitor" {
+ connection_draining_timeout_sec = 300
+ creation_timestamp = (known after apply)
+ fingerprint = (known after apply)
+ health_checks = (known after apply)
+ id = (known after apply)
+ load_balancing_scheme = "EXTERNAL"
+ name = "vpc-du-lb-backend-monitor"
+ port_name = "http"
+ project = "MASKED"
+ protocol = "HTTP"
+ self_link = (known after apply)
+ session_affinity = (known after apply)
+ timeout_sec = 30
+ backend {
+ balancing_mode = "UTILIZATION"
+ capacity_scaler = 1
+ group = (known after apply)
+ max_utilization = 0.8
}
+ cdn_policy {
+ signed_url_cache_max_age_sec = (known after apply)
+ cache_key_policy {
+ include_host = (known after apply)
+ include_protocol = (known after apply)
+ include_query_string = (known after apply)
+ query_string_blacklist = (known after apply)
+ query_string_whitelist = (known after apply)
}
}
+ iap {
+ oauth2_client_id = "MASKED"
+ oauth2_client_secret = (sensitive value)
+ oauth2_client_secret_sha256 = (sensitive value)
}
}
# module.cluster-lb.google_compute_backend_service.private will be created
+ resource "google_compute_backend_service" "private" {
+ connection_draining_timeout_sec = 300
+ creation_timestamp = (known after apply)
+ fingerprint = (known after apply)
+ health_checks = (known after apply)
+ id = (known after apply)
+ load_balancing_scheme = "EXTERNAL"
+ name = "vpc-du-lb-backend-private"
+ port_name = "http"
+ project = "MASKED"
+ protocol = "HTTP"
+ self_link = (known after apply)
+ session_affinity = (known after apply)
+ timeout_sec = 30
+ backend {
+ balancing_mode = (known after apply)
+ capacity_scaler = (known after apply)
+ description = (known after apply)
+ group = (known after apply)
+ max_connections = (known after apply)
+ max_connections_per_instance = (known after apply)
+ max_rate = (known after apply)
+ max_rate_per_instance = (known after apply)
+ max_utilization = (known after apply)
}
+ cdn_policy {
+ signed_url_cache_max_age_sec = (known after apply)
+ cache_key_policy {
+ include_host = (known after apply)
+ include_protocol = (known after apply)
+ include_query_string = (known after apply)
+ query_string_blacklist = (known after apply)
+ query_string_whitelist = (known after apply)
}
}
+ iap {
+ oauth2_client_id = "MASKED"
+ oauth2_client_secret = (sensitive value)
+ oauth2_client_secret_sha256 = (sensitive value)
}
}
# module.cluster-lb.google_compute_backend_service.public will be created
+ resource "google_compute_backend_service" "public" {
+ connection_draining_timeout_sec = 300
+ creation_timestamp = (known after apply)
+ fingerprint = (known after apply)
+ health_checks = (known after apply)
+ id = (known after apply)
+ load_balancing_scheme = "EXTERNAL"
+ name = "vpc-du-lb-backend-public"
+ port_name = "http"
+ project = "MASKED"
+ protocol = "HTTP"
+ self_link = (known after apply)
+ session_affinity = (known after apply)
+ timeout_sec = 30
+ backend {
+ balancing_mode = "UTILIZATION"
+ capacity_scaler = 1
+ group = (known after apply)
+ max_utilization = 0.8
}
+ cdn_policy {
+ signed_url_cache_max_age_sec = (known after apply)
+ cache_key_policy {
+ include_host = (known after apply)
+ include_protocol = (known after apply)
+ include_query_string = (known after apply)
+ query_string_blacklist = (known after apply)
+ query_string_whitelist = (known after apply)
}
}
}
I get the following errors when trying to apply:
Error: Provider produced inconsistent final plan
When expanding the plan for
module.cluster-lb.google_compute_backend_service.public to include new values
learned so far during apply, provider "google" produced an invalid new value
for .backend: block set length changed from 1 to 2.
This is a bug in the provider, which should be reported in the provider's own
issue tracker.
Error: Provider produced inconsistent final plan
When expanding the plan for
module.cluster-lb.google_compute_backend_service.monitor to include new values
learned so far during apply, provider "google" produced an invalid new value
for .backend: block set length changed from 1 to 2.
This is a bug in the provider, which should be reported in the provider's own
issue tracker.
Error: Provider produced inconsistent final plan
When expanding the plan for
module.cluster-lb.google_compute_backend_service.private to include new values
learned so far during apply, provider "google" produced an invalid new value
for .backend: block set length changed from 1 to 2.
This is a bug in the provider, which should be reported in the provider's own
issue tracker.
Steps to Reproduce
- Create a backend_service and try to pass multiple groups to it, generating them dynamically using a dynamic block or other loop method (use my code above as an example).
- Run a plan and see whether multiple backends are specified.
- Apply and see whether you get errors.
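A minimal configuration of this shape should reproduce it (the resource names and the health check below are illustrative, not my exact code); the key ingredient is that the list elements are only known after apply, e.g. a fresh cluster's instance_group_urls:

variable "backend_group_list" {
  description = "Instance group self-links, only known after the cluster is created."
  type        = list(string)
  default     = []
}

resource "google_compute_health_check" "default" {
  name = "repro-health-check"

  http_health_check {
    port = 80
  }
}

resource "google_compute_backend_service" "repro" {
  name        = "repro-backend"
  protocol    = "HTTP"
  timeout_sec = 30

  # One backend block per instance group URL passed in.
  dynamic "backend" {
    for_each = var.backend_group_list
    content {
      group = backend.value
    }
  }

  health_checks = [google_compute_health_check.default.self_link]
}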
Important Factoids
I've recently been upgrading to 0.12, so I really don't know if my dynamic block is the right solution, or if I can use a for_each instead, or some combination. I've found it quite hard to distinguish from the limited examples when each variation/combination of for, for_each and dynamic should be used.
My code works perfectly when there is only one instance group in the list, but I only tried that to prove the code was valid Terraform. My real-world use case always has many instance groups to add.
Note that on my private backend service, I have explicitly set all the other block options to null. This is because when I did successfully build with one instance group, the subsequent apply failed because the attributes were not set. So on re-application those parameters seem not to be optional anymore, hence the null values. Thanks to the author of this comment for the example.
I also tried turning my input list into the format:
[ { group = URL}, {group = URL } ...]
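i.e., building that shape with a for expression, roughly along these lines (a sketch of the shape rather than my exact code):

backend_group_list = [
  for url in data.google_container_cluster.information.instance_group_urls : { group = url }
]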
References
Commits related to this issue
- add sensitive_params to bigquery_data_transfer_config (#3937) * suppress diff for secret_access_key on bigquery data transfer params * add sensitiveParams for secret access key * add customize diff... — committed to modular-magician/terraform-provider-google by modular-magician 4 years ago
- add sensitive_params to bigquery_data_transfer_config (#3937) (#7174) * suppress diff for secret_access_key on bigquery data transfer params * add sensitiveParams for secret access key * add custom... — committed to hashicorp/terraform-provider-google by modular-magician 4 years ago
- add two attempts for run-terraform - dynamic backend issue https://github.com/hashicorp/terraform-provider-google/issues/3937 — committed to pivotal/docs-platform-automation by nhsieh 3 years ago
Similar issue using dynamic over backend block
error after apply:
If I apply it just one more time, it does work.
We're seeing this issue with our integration tests; because idempotency is one of the things we test for, we'd prefer not to simply reapply. Interestingly, I don't think we'd been seeing this with the 2.17 provider, but we are getting it consistently with 2.20.1. Will try to investigate further.
I have the same issue:
I also use a dynamic block; the funny thing is that if I apply again, it works… nevertheless it's quite annoying.
Not data providers technically, but they are references to other blocks in the same module. Glad you have a solution though!
Actually, I think I have resolved my issue after some poking around.
tl;dr
The issue was passing the results of a data lookup (on the k8s cluster) as an output of one module (gcloud-k8s) and trying to use those as the input to another module (gcloud-lb-custom).
longer read
I had a setup as such:
What I was doing
In each environment, I'd call my module (gcloud-k8s) to build a cluster. At the end of said module I had a data lookup on the cluster which depended on all node pool creations. This would become the output K8S_INSTANCE_GROUP_URLS. Then I'd build the load balancer through my next module (gcloud-lb-custom), which would take an input variable backend_group_list. Obviously, when calling that module, I'd fill that input with the other module's output (roughly as sketched below). This has been erroring ever since upgrading to 0.12; it used to work in 0.11, hence raising this issue.
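The wiring in the environment configuration looked roughly like this (the module source paths are placeholders):

module "cluster" {
  source = "./modules/gcloud-k8s"
  # ... cluster inputs ...
}

module "cluster-lb" {
  source             = "./modules/gcloud-lb-custom"
  backend_group_list = module.cluster.K8S_INSTANCE_GROUP_URLS
  # ... other load balancer inputs ...
}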
What I changed to see if my loop was correct
I basically took the output from tf output and set that in variables.tf for the load balancer module (gcloud-lb-custom). When I ran tf plan, everything planned correctly. When I removed an instance group, the plan reconfigured the backends correctly, going from 3 backends to 2 in this instance. This made me think the issue was something to do with passing input to one module from the output of another.
What I’m now doing
I've moved that data lookup into the lb module (gcloud-lb-custom), and that lookup is configured via two other outputs from the cluster module (gcloud-k8s). Inside the cluster module:
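(A rough reconstruction of the new outputs; the output names and resource reference are placeholders, not the exact code.)

output "K8S_CLUSTER_NAME" {
  value       = google_container_cluster.cluster.name
  description = "Cluster name, for downstream data lookups"
}

output "K8S_CLUSTER_LOCATION" {
  value       = google_container_cluster.cluster.location
  description = "Cluster location, for downstream data lookups"
}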
And further down in the module, I use that lookup to pass in the list of instance_group_urls, so my dynamic backend now looks roughly like the sketch below. It seems to work fairly well so far.
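(Again a rough reconstruction; the variable names are placeholders and the rest mirrors the public backend service shown earlier.)

data "google_container_cluster" "cluster" {
  name     = var.cluster_name       # fed by the new cluster-module outputs
  location = var.cluster_location
  project  = var.project
}

resource "google_compute_backend_service" "public" {
  project     = var.project
  name        = "${var.name}-backend-public"
  port_name   = var.backend_public["port_name"]
  protocol    = "HTTP"
  timeout_sec = var.backend_public["timeout_seconds"]

  # The instance group URLs now come from a lookup inside this module,
  # rather than from another module's output.
  dynamic "backend" {
    for_each = data.google_container_cluster.cluster.instance_group_urls
    content {
      group = backend.value
    }
  }

  health_checks = list(google_compute_health_check.public.self_link)
}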
I did also upgrade the google provider to the latest version.
TIL
Probably not the first or last time I'll be bitten by passing things from one module to another. Arguably it's cleaner to fetch the URLs inside the load balancer module, but I would have thought the output would be stored in state and used during the plan (probably a misunderstanding of the internal workings of terraform plan on my part).
As a side effect, I have yet to see that error message again, but I will be doing lots of testing around this. If anyone else has the issue, hopefully the examples above will help you find a solution.
@paddycarver can you take a look? You’ve got the key + probably more context on dynamic than I do.