terraform-provider-aws: [Bug]: Provider produced inconsistent final plan for complex `aws_wafv2_web_acl` configurations

Terraform Core Version

1.3.1

AWS Provider Version

4.34.

Affected Resource(s)

  • aws_wafv2_web_acl

Expected Behavior

Running terraform apply should finish successfully.

Actual Behavior

Running terraform apply fails and outputs a ~2.5 MB Go stack trace.

Relevant Error/Panic Output Snippet

Plan: 0 to add, 1 to change, 0 to destroy.

Error: Provider produced inconsistent final plan

When expanding the plan for aws_wafv2_web_acl.main to include new values
learned so far during apply, provider "registry.terraform.io/hashicorp/aws"
produced an invalid new value for .rule: planned set element
cty.ObjectVal(map[string]cty.Value{"action":cty.ListValEmpty(cty.Object(map[string]cty.Type{"allow":cty.List(cty.Object(map[string]cty.Type{"custom_request_handling":cty.List(cty.Object(map[string]cty.Type{"insert_header":cty.Set(cty.Object(map[string]cty.Type{"name":cty.String,
"value":cty.String}))}))})),
"block":cty.List(cty.Object(map[string]cty.Type{"custom_response":cty.List(cty.Object(map[string]cty.Type{"custom_response_body_key":cty.String,
"response_code":cty.Number,

... (this continues for ~2.5 MB; see "_expected/out.log" in the attached ZIP file)

Terraform Configuration Files

See attached: tf-waf-custom-response-bug.zip

Steps to Reproduce

  1. terraform init
  2. terraform apply -var='v=1'
    This deploys all resources and provisions v1 of the Custom Response that we configure for the WAF WebACL to display a maintenance page. This step should succeed.
  3. terraform apply -var='v=2' --auto-approve > out.log 2>&1
    This mimics a modification to the HTML in the Custom Response (Terraform uses a different file to generate a change in the WebACL Custom Response configuration). You should see a very long exception logged.
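
For readers without the ZIP, a setup like the one described in steps 2 and 3 could look roughly like the sketch below. It is not the attached configuration: the file naming scheme (maintenance-v1.html, maintenance-v2.html), the rule, and all names other than aws_wafv2_web_acl.main and the variable v are assumptions made for illustration.

```hcl
# Hypothetical sketch: file names, rule contents, and metric names are
# assumptions, not the contents of the attached ZIP.
variable "v" {
  type        = string
  description = "Selects which maintenance page file is deployed"
}

resource "aws_wafv2_web_acl" "main" {
  name  = "custom-response-repro"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  # Changing var.v swaps the file, which changes this body's content.
  custom_response_body {
    key          = "maintenance"
    content_type = "TEXT_HTML"
    content      = file("${path.module}/maintenance-v${var.v}.html")
  }

  rule {
    name     = "website-maintenance"
    priority = 0

    action {
      block {
        custom_response {
          response_code            = 503
          custom_response_body_key = "maintenance"
        }
      }
    }

    # Matches every request path; a real rule would be narrower.
    statement {
      byte_match_statement {
        positional_constraint = "STARTS_WITH"
        search_string         = "/"

        field_to_match {
          uri_path {}
        }

        text_transformation {
          priority = 0
          type     = "NONE"
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "website-maintenance"
      sampled_requests_enabled   = false
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "custom-response-repro"
    sampled_requests_enabled   = false
  }
}
```

Note that, per the factoids below, a configuration this small was not enough to trigger the error on its own; it only shows the part that changes between v=1 and v=2.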

Debug Output

N/A

Panic Output

See _expected/out.log in attached ZIP file.

Important Factoids

  1. I initially thought the error was caused by the HTML in the Custom Response, but when I tried to reproduce it with a minimal configuration, I could only do so after adding all of the rules we use in production. For example, if you remove the last set of rules (the dynamic section operating on local.managed_rules), the error no longer occurs. This suggests that the error results from the overall complexity (or perhaps size?) of the change being applied, or from a combination of settings, rather than from a single setting. But that’s just my impression.
  2. I have tried several other changes to work around the issue, and it seems that trying to change any custom response in this particular setup causes the error.
    2.1. I eventually changed the website rule to return a 307 response with a Location header, and applying that change still failed.
    2.2. I also tried removing the custom response from the api rule, so that it too responds with only a status and headers, and terraform failed to remove the custom response in this case as well (here the response is just a simple JSON, so the content seems irrelevant).
  3. I also tried running this with TF_LOG=debug (unfortunately I don’t have that log anymore), and I remember seeing several messages along the lines of “Produced inconsistent plan, but we don’t care because it’s using legacy SDK” around all WAFv2 resources (not just WebACL). Perhaps that’s related?
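
To make factoid 1 more concrete, a "dynamic section operating on local.managed_rules" might be shaped like the sketch below. The actual list and rule settings are in the attached ZIP and are not reproduced here; the two managed rule groups, their priorities, and the resource name are assumptions chosen only to show the pattern.

```hcl
# Hypothetical sketch of a dynamic rule section; the entries in
# local.managed_rules are illustrative assumptions.
locals {
  managed_rules = [
    { name = "AWSManagedRulesCommonRuleSet", priority = 10 },
    { name = "AWSManagedRulesKnownBadInputsRuleSet", priority = 20 },
  ]
}

resource "aws_wafv2_web_acl" "managed" {
  name  = "managed-rules-sketch"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  # One rule block is generated per entry in local.managed_rules.
  dynamic "rule" {
    for_each = local.managed_rules

    content {
      name     = rule.value.name
      priority = rule.value.priority

      # Managed rule groups use override_action instead of action.
      override_action {
        none {}
      }

      statement {
        managed_rule_group_statement {
          name        = rule.value.name
          vendor_name = "AWS"
        }
      }

      visibility_config {
        cloudwatch_metrics_enabled = false
        metric_name                = rule.value.name
        sampled_requests_enabled   = false
      }
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "managed-rules-sketch"
    sampled_requests_enabled   = false
  }
}
```

Each generated rule becomes another element of the .rule set named in the error message, which is consistent with (but does not prove) the impression that the failure depends on how many rules the WebACL carries.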

References

No response

Would you like to implement a fix?

No

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 42
  • Comments: 16 (7 by maintainers)

Most upvoted comments

@YakDriver, thanks for looking into the issue.

I have tested some configurations this morning, mostly the ones I currently have + the ones I’m planning to migrate to:

1.4.2 + 4.67.0: worked!
1.4.6 + 4.67.0: worked!
1.5.3 + 5.8.0:  worked! # Needed to refactor excluded_rule => rule_action_override for this to work.

I can also say I’ve introduced some (sometimes significant) changes to my WAF deployment scripts since this ticket was raised, and all of them worked without issues. Looks like this is indeed resolved, at least for my case.
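
For anyone making the same move to the 5.x provider, the excluded_rule => rule_action_override refactor noted next to the 1.5.3 + 5.8.0 result could look roughly like this. It is a sketch under assumptions: the rule group, the overridden rule name (SizeRestrictions_BODY), and the resource name are illustrative and not taken from the reporter's configuration.

```hcl
# Hypothetical sketch of the 4.x -> 5.x refactor; names are assumptions.
resource "aws_wafv2_web_acl" "example" {
  name  = "rule-action-override-sketch"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  rule {
    name     = "common-rule-set"
    priority = 10

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"

        # AWS provider 4.x form, no longer accepted in 5.x:
        # excluded_rule {
        #   name = "SizeRestrictions_BODY"
        # }

        # AWS provider 5.x form: override the rule's action to count.
        rule_action_override {
          name = "SizeRestrictions_BODY"

          action_to_use {
            count {}
          }
        }
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "common-rule-set"
      sampled_requests_enabled   = false
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "rule-action-override-sketch"
    sampled_requests_enabled   = false
  }
}
```

Using count {} in action_to_use is the closest equivalent to an excluded rule; other actions can be chosen there if a different override is wanted.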