terraform-plugin-sdk: bug: rounding large int values from state
Terraform Version
Terraform v0.14.0
Terraform Configuration Files
Nothing here is particularly important, except that `google_compute_url_map` stores a large value in a `TypeInt`. The value is `Computed` and does not change between runs of `terraform plan`.
```hcl
resource "google_compute_url_map" "default" {
  name            = "url-map"
  description     = "Description"
  default_service = google_compute_backend_bucket.static.id

  default_route_action {
    url_rewrite {
      host_rewrite = "REDACTED"
    }
  }
}

resource "google_compute_backend_bucket" "static" {
  name        = "static-asset-backend-bucket"
  bucket_name = google_storage_bucket.static.name
  enable_cdn  = true
}

resource "google_storage_bucket" "static" {
  name     = "static-bucket${random_integer.random.result}"
  location = "US"
}

resource "random_integer" "random" {
  min = 10000
  max = 99999
}
```
Debug Output
Key line:
```
2020/12/08 14:05:12 [WARN] Provider "registry.terraform.io/hashicorp/google" produced an unexpected new value for google_compute_url_map.default during refresh.
      - .map_id: was cty.NumberIntVal(7.227701560655104e+18), but now cty.NumberIntVal(7.227701560655103598e+18)
```
This turns into an error at apply time.
Crash Output
Expected Behavior
The `TypeInt` value is read from state without rounding.
Actual Behavior
The `TypeInt` value is read from state with rounding.
Value in state:

```
"map_id": 7227701560655103598,
```

Value returned from `d.Get("map_id")`:

```
7227701560655104000
```
Steps to Reproduce
1. `terraform apply`
2. `terraform plan`
This shows the error in the debug logs. Changing the description of the URL map to force an update and then running `terraform apply` causes Terraform to error with:
```
Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_url_map.default to include new
values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.map_id: was cty.NumberIntVal(7.227701560655103598e+18), but now
cty.NumberIntVal(7.227701560655104e+18).
```
Additional Context
I have tracked this down to a change in state between two runs of `terraform apply`. The value is set correctly in state and is written to the state file correctly, but appears to be rounded when it is read back by the next invocation of `terraform plan` or `terraform apply`.
This behavior was introduced in 0.14.0; these resources did not have this problem previously.
References
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 6
- Comments: 21 (14 by maintainers)
Commits related to this issue
- Add test cases for ints too large for float64. See hashicorp/terraform-plugin-sdk#655. Essentially, test that integers that do not fit cleanly in float64s still retain their precision. These tests cu... — committed to hashicorp/terraform-provider-corner by paddycarver 4 years ago
- Allow opting into using json.Number in state. Go's JSON decoder, by default, uses a lossy conversion of JSON integers to float64s. For sufficiently large integers, this yields a loss of precision, an... — committed to hashicorp/terraform-plugin-sdk by paddycarver 4 years ago
- Allow opting into using json.Number in state. Go's JSON decoder, by default, uses a lossy conversion of JSON integers to float64s. For sufficiently large integers, this yields a loss of precision, an... — committed to hashicorp/terraform-plugin-sdk by paddycarver 4 years ago
- fix bigint precision bug Prior to this change, integers in state that do not fit cleanly in float64s lose their precision, leading to permadiffs. As of SDK v2.4.0, setting UseJSONNumber on the resour... — committed to hashicorp/terraform-provider-random by kmoe 3 years ago
OK so I’ve done a lot of digging into this.
So fundamentally, the cause of the issue here was the liberal use of JSON encoding and decoding that the SDK does. Why we do this is a longer story and requires more context, but fundamentally JSON is used as sort of a babel-representation of different systems and so when converting between, e.g., 0.11 and 0.12’s types, sometimes we just encode to JSON and decode back out.
Unfortunately, it is not always the case that our decoding uses the `UseNumber()` method; more frequently we just reach for the default `json.Unmarshal`… which decodes numbers into float64 when unmarshaling into an `interface{}`, which we do quite a bit of the time.

This is, usually, fine. However, for sufficiently large numbers (anything above 2^53, i.e. more than about 15–16 significant digits) float64 can't actually represent them with precision, so it… silently rounds them. Whoops.
This is all basically what Martin said a week ago.
The fix here is straightforward technically, but messy when people get involved. We can just switch to using a JSON decoder with `UseNumber()` called, and this specific problem goes away.

There are a couple of compatibility concerns, though. First, calling `ResourceData.Get` and then casting the result to a specific type is an unavoidable pattern that every provider utilises. This means that changing the underlying type breaks compatibility, because any code casting to the old type will panic. Oops. Now, the fact that you can use `ResourceData.Get` and cast to an integer or a float depending on whether you're using `schema.TypeInt` or `schema.TypeFloat` gives me hope that we could potentially smooth this over a bit, again as Martin suggested a week ago (this comment is largely just "Martin was right a week ago, but now I have the code and tests to prove it"). Inordinately large `TypeFloat`s would continue having a bad time, but large `TypeInt`s would have less of a bad time, until they hit 64 bits and start having a real bad time again. But I am still researching and investigating the `TypeFloat`/`TypeInt` conversion and casting logic to see if we can even do this. If we can't, a boolean may need to be set on the `schema.Resource`, opting into the new behavior, which would be sad but would still offer a workaround.

Second, the `helper/resource` test harness also has a lot of exposure to JSON, and also doesn't always use `UseNumber`. It, similarly, needs to be upgraded, or things will work but the tests will insist there's a problem. (I lost several hours to investigating why my fix wasn't working, only to realise I hadn't fixed the test harness. Whoops.) This also needs to be done in such a way that users don't have to change their code, and I'm still investigating the impact of the needed changes (we will also need to change terraform-json and terraform-exec) to determine whether they're backwards compatible.

All of this to say: the problem is known (I still don't particularly understand why 0.14 exacerbated this; or if it didn't, as seems plausible given we have multiple reproductions of it in 0.12 and 0.13, why it hasn't been an issue before now) and we're making progress on solutions. Fixing this just involves moving some pieces that our ecosystem relies directly on, and so, just like replacing a load-bearing pillar, we're going to need to move carefully, and that takes a little bit of time. But we are working on it.
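Based on the linked commits, the opt-in landed as a `UseJSONNumber` field on `schema.Resource` in SDK v2.4.0. A sketch of what a provider author would write (the resource and attribute names here are illustrative; only the `UseJSONNumber` field is taken from the commit messages):

```go
package provider

import (
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// resourceExample is a hypothetical resource whose computed ID can
// exceed 2^53 and therefore cannot round-trip through float64.
func resourceExample() *schema.Resource {
	return &schema.Resource{
		// Opt into decoding state through json.Number so large
		// TypeInt values keep full precision instead of being
		// rounded by the default float64 decoding (SDK v2.4.0+).
		UseJSONNumber: true,

		Schema: map[string]*schema.Schema{
			"map_id": {
				Type:     schema.TypeInt,
				Computed: true,
			},
		},
	}
}
```

This is a schema/config fragment rather than a runnable program; it requires the `terraform-plugin-sdk/v2` module to compile.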
I’m having the same issue with the `google_compute_url_map` resource since I upgraded to Terraform v0.14.0. I created the url_map in my project using Terraform v0.13.5. What I’m trying to do now is make the url_map point to a new backend service (previously an instance group, now a bucket), and then I get this error:
I also tried with Terraform v0.14.3; it does not work either.
This action (changing the backend) was working perfectly with Terraform v0.13.5.
I believe our current understanding is this is the 0.13 behavior, though I have no compelling answer as to why it suddenly became an issue.
I’m hoping to get a fix out in the next few days, if possible. Providers would then need to update their SDK versions and ship new releases.
I don’t know if it is the same issue, but with 0.14 I am getting these errors for a GKE cluster:
and plan/apply fails. 0.13.5 works fine.