terraform-plugin-sdk: bug: rounding large int values from state
Terraform Version
Terraform v0.14.0
Terraform Configuration Files
Nothing here is particularly important, except that `google_compute_url_map` stores a large value in a `TypeInt`. The value is `Computed` and does not change between runs of `terraform plan`.
```hcl
resource "google_compute_url_map" "default" {
  name            = "url-map"
  description     = "Description"
  default_service = google_compute_backend_bucket.static.id

  default_route_action {
    url_rewrite {
      host_rewrite = "REDACTED"
    }
  }
}

resource "google_compute_backend_bucket" "static" {
  name        = "static-asset-backend-bucket"
  bucket_name = google_storage_bucket.static.name
  enable_cdn  = true
}

resource "google_storage_bucket" "static" {
  name     = "static-bucket${random_integer.random.result}"
  location = "US"
}

resource "random_integer" "random" {
  min = 10000
  max = 99999
}
```
Debug Output
Key line:
```
2020/12/08 14:05:12 [WARN] Provider "registry.terraform.io/hashicorp/google" produced an unexpected new value for google_compute_url_map.default during refresh.
      - .map_id: was cty.NumberIntVal(7.227701560655104e+18), but now cty.NumberIntVal(7.227701560655103598e+18)
```
This turns into an error at apply time.
Crash Output
Expected Behavior
The `TypeInt` value is read from state without rounding.
Actual Behavior
The `TypeInt` value is read from state with rounding.
Value in state:

```
"map_id": 7227701560655103598,
```

Value returned from `d.Get("map_id")`:

```
7227701560655104000
```
Steps to Reproduce
1. `terraform apply`
2. `terraform plan`
This shows the error in the debug logs. Changing the description of the URL map to force an update and then running `terraform apply` causes Terraform to error with:
```
Error: Provider produced inconsistent final plan

When expanding the plan for google_compute_url_map.default to include new
values learned so far during apply, provider
"registry.terraform.io/hashicorp/google" produced an invalid new value for
.map_id: was cty.NumberIntVal(7.227701560655103598e+18), but now
cty.NumberIntVal(7.227701560655104e+18).
```
Additional Context
I have tracked this down to a change in state between two runs of `terraform apply`. The value is set correctly in state and is written to the state file correctly, but appears to be rounded when it is read back by the next invocation of `terraform plan` or `terraform apply`.
This behavior was introduced in 0.14.0; these resources did not have this problem previously.
References
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 6
- Comments: 21 (14 by maintainers)
Commits related to this issue
- Add test cases for ints too large for float64. See hashicorp/terraform-plugin-sdk#655. Essentially, test that integers that do not fit cleanly in float64s still retain their precision. These tests cu... — committed to hashicorp/terraform-provider-corner by paddycarver 4 years ago
- Allow opting into using json.Number in state. Go's JSON decoder, by default, uses a lossy conversion of JSON integers to float64s. For sufficiently large integers, this yields a loss of precision, an... — committed to hashicorp/terraform-plugin-sdk by paddycarver 4 years ago
- Allow opting into using json.Number in state. Go's JSON decoder, by default, uses a lossy conversion of JSON integers to float64s. For sufficiently large integers, this yields a loss of precision, an... — committed to hashicorp/terraform-plugin-sdk by paddycarver 4 years ago
- fix bigint precision bug Prior to this change, integers in state that do not fit cleanly in float64s lose their precision, leading to permadiffs. As of SDK v2.4.0, setting UseJSONNumber on the resour... — committed to hashicorp/terraform-provider-random by kmoe 3 years ago
OK so I’ve done a lot of digging into this.
So fundamentally, the cause of the issue here was the liberal use of JSON encoding and decoding that the SDK does. Why we do this is a longer story and requires more context, but fundamentally JSON is used as sort of a babel-representation of different systems and so when converting between, e.g., 0.11 and 0.12’s types, sometimes we just encode to JSON and decode back out.
Unfortunately, it is not always the case that our decoding uses the `UseNumber()` method; more frequently we just reach for the default `json.Unmarshal`… which decodes numbers into float64 when unmarshaling into an `interface{}`, which we do quite a bit of the time.

This is, usually, fine. However, for sufficiently large numbers (anything above 2^53, i.e. more than about 15–16 significant digits) float64 can't actually represent them with precision, so it… silently rounds them. Whoops.
This is all basically what Martin said a week ago.
The fix here is straightforward technically, but messy when people get involved. We can just switch to using a JSON decoder with `UseNumber()` called, and this specific problem goes away.

There are a couple of compatibility concerns, though. First, calling `ResourceData.Get` and then casting the result to a specific type is an unavoidable pattern that every provider utilises. This means that changing the underlying type breaks compatibility, because any code casting to the old type will panic. Oops. Now, the fact that you can use `ResourceData.Get` and cast to an integer or a float depending on whether you're using `schema.TypeInt` or `schema.TypeFloat` gives me hope that we could potentially smooth this over a bit, again as Martin suggested a week ago (this comment is largely just "Martin was right a week ago, but now I have the code and tests to prove it"). Inordinately large `TypeFloat`s would continue having a bad time, but large `TypeInt`s would have less of a bad time, until they hit 64 bits and start having a real bad time again. But I am still researching and investigating the `TypeFloat`/`TypeInt` conversion and casting logic to see if we can even do this. If we can't, a boolean may need to be set on the `schema.Resource`, opting into the new behavior, which would be sad but would still offer a workaround.

Second, the `helper/resource` test harness also has a lot of exposure to JSON, and also doesn't always use `UseNumber`. It, similarly, needs to be upgraded, or things will work but the tests will insist there's a problem. (I lost several hours to investigating why my fix wasn't working, only to realise I hadn't fixed the test harness. Whoops.) This also needs to be done in such a way that users don't have to change their code, and I'm still investigating the impact of the needed changes (we will also need to change terraform-json and terraform-exec) to determine whether they're backwards compatible.

All of this to say: the problem is known (I still don't particularly understand why 0.14 exacerbated this; or if it didn't, as seems plausible given we have multiple reproductions of it in 0.12 and 0.13, why it hasn't been an issue before now) and we're making progress on solutions. Fixing this just involves moving some pieces that our ecosystem relies directly on, and so, just like replacing a load-bearing pillar, we're going to need to move carefully, and that takes a little bit of time. But we are working on it.
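Based on the linked commits, the opt-in landed as a `UseJSONNumber` field on `schema.Resource` in SDK v2.4.0. A sketch of what a provider author would write (the resource and attribute names here are illustrative; only the `UseJSONNumber` field is taken from the commit messages):

```go
package provider

import (
	"github.com/hashicorp/terraform-plugin-sdk/v2/helper/schema"
)

// resourceExample is a hypothetical resource whose computed ID can
// exceed 2^53 and therefore cannot round-trip through float64.
func resourceExample() *schema.Resource {
	return &schema.Resource{
		// Opt into decoding state through json.Number so large
		// TypeInt values keep full precision instead of being
		// rounded by the default float64 decoding (SDK v2.4.0+).
		UseJSONNumber: true,

		Schema: map[string]*schema.Schema{
			"map_id": {
				Type:     schema.TypeInt,
				Computed: true,
			},
		},
	}
}
```

This is a schema/config fragment rather than a runnable program; it requires the `terraform-plugin-sdk/v2` module to compile.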
I’m having the same issue with the `google_compute_url_map` resource since I upgraded to Terraform v0.14.0. I created the url_map in my project using Terraform v0.13.5. What I’m trying to do now is make the url_map point to a new backend service (previously an instance group, now a bucket), and then I get this error:
I also tried with Terraform v0.14.3; it does not work either.
This action (changing the backend) was working perfectly with Terraform v0.13.5.
I believe our current understanding is this is the 0.13 behavior, though I have no compelling answer as to why it suddenly became an issue.
I’m hoping to get a fix out in the next few days, if possible. Providers would then need to update their SDK versions and ship new releases.
I don’t know if it is the same issue, but with 0.14 I am getting these errors for a GKE cluster:
and plan/apply fails. 0.13.5 works fine.