terraform-provider-aws: Significant slowdowns running terraform for WAF resources on AWS provider v2.69.0

Community Note

Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
Please do not leave “+1” or other comments that do not add relevant new information or questions, they generate extra noise for issue followers and do not help prioritize the request
If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

0.12.28

Affected Resource(s)

aws_wafv2_web_acl

Terraform Configuration Files

resource "aws_wafv2_web_acl" "demo-waf" {
  name        = "Demo-WAF"
  description = "Demo-WAF"
  scope       = "REGIONAL"

  default_action {
    block {}
  }

  rule {
    name     = "RateLimit"
    priority = 200

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 1000
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "demo_RateLimit"
      sampled_requests_enabled   = false
    }
  }

  rule {
    name     = "AWSManagedRulesCommonRuleSet"
    priority = 998

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "AzureAD-AWSManagedRulesCommonRuleSet"
      sampled_requests_enabled   = false
    }
  }

  rule {
    name     = "AWSManagedRulesKnownBadInputsRuleSet"
    priority = 999

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesKnownBadInputsRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      cloudwatch_metrics_enabled = false
      metric_name                = "AzureAD-AWSManagedRulesKnownBadInputsRuleSet"
      sampled_requests_enabled   = false
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = true
    metric_name                = "Demo-WAF"
    sampled_requests_enabled   = false
  }
}

Debug Output

Expected Behavior

Running terraform plan should produce a plan within a few seconds, and terraform validate should validate the code just as quickly. It should also only take up to a couple of minutes to run a terraform plan.

Actual Behavior

The plan and validate take a very long time to run. They work eventually, but the validate takes upwards of three minutes, and the plan takes at least five minutes and normally around ten. This is just for the one resource. Apply takes even longer than this, presumably because it’s running a plan on top of doing other things.

If I downgrade my provider version to v2.67.0, all of the actions are completed within a few seconds, as expected.

Steps to Reproduce

Set provider version to v2.69.0
terraform plan
Set provider version to v2.67.0
terraform plan
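For anyone reproducing this, one way to switch between the two provider versions on Terraform 0.12 is a version constraint in the provider block. This is a minimal sketch, not part of the original report: it assumes region and credentials come from the environment, and that terraform init is re-run after each change so the matching provider build is downloaded.

```hcl
# Sketch only: pin the AWS provider to an exact version (Terraform 0.12 syntax).
# Assumption: region and credentials are supplied via environment variables.
provider "aws" {
  # v2.67.0 is the version reported as fast in this issue;
  # change to "= 2.69.0" to reproduce the slowdown.
  version = "= 2.67.0"
}
```

Pinning to v2.67.0 is also the only workaround mentioned in the description, at the cost of losing whatever later provider releases added.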

Important Factoids

This only seems to affect WAF resources. I’ve tried the provider in other projects and haven’t seen any issues. I’m unsure if it’s limited to the web ACL resource specifically, but that’s the only one I’ve been able to reproduce it in.

References

N/A

About this issue

State: closed
Created 4 years ago
Reactions: 118
Comments: 31 (6 by maintainers)

Most upvoted comments

Hi all 👋. From initial review of this issue, I see it unfortunately stems from https://github.com/terraform-providers/terraform-provider-aws/pull/13961, which addressed #13862 and was introduced in v2.69.0 of the provider. By adding a needed nested level to match the API in these 4 statement types (AND, OR, NOT, and RATE_BASED), we’ve run into this run-time issue. Looking at the community following of this issue and related WAFv2 resources, we plan to prioritize them after the upcoming major release of v3.0.0 of the provider.
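To give a rough idea of the nesting involved, a rule that combines the logical statement types might look like the fragment below. This is an illustrative sketch only and is not taken from the linked PR: the rule name, priority, metric name and country codes are hypothetical, and the fragment is meant to sit inside an aws_wafv2_web_acl resource like the one in the description.

```hcl
# Hypothetical rule showing statements nested inside statements.
rule {
  name     = "NestedExample" # hypothetical name
  priority = 100             # hypothetical priority

  action {
    block {}
  }

  statement {
    and_statement {
      # First operand: requests originating from these countries.
      statement {
        geo_match_statement {
          country_codes = ["US", "CA"] # hypothetical values
        }
      }

      # Second operand: a NOT wrapping another statement, one level deeper.
      statement {
        not_statement {
          statement {
            geo_match_statement {
              country_codes = ["CA"] # hypothetical value
            }
          }
        }
      }
    }
  }

  visibility_config {
    cloudwatch_metrics_enabled = false
    metric_name                = "demo_NestedExample" # hypothetical
    sampled_requests_enabled   = false
  }
}
```

Each extra level that the schema allows has to be described for every statement type that can appear at that level, which is why supporting deeper nesting makes the resource schema so large.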

+26

anGie44 on Jul 28, 2020

Good news @samtarplee and those following this issue, the upstream terraform issue hashicorp/terraform#25889 has a PR to fix the slowdowns experienced here 🎉 I’ll provide an update here again when it lands in the forthcoming release of Terraform v0.13.5 (reference: https://github.com/hashicorp/terraform/blob/v0.13/CHANGELOG.md).

+21

anGie44 on Oct 15, 2020

Terraform 0.13.5 will be released today and include the speedup for deeply nested resources mentioned by @anGie44 😃 Thanks for your patience.

+14

pkolyvas on Oct 21, 2020

Hi all 👋 – first off, apologies for the silence here! This has been prioritized and we are investigating with the Terraform Core team to further debug the behavior imposed by this rather large schema in the WebACL resource. I will update here accordingly with a more detailed response as to what our next steps will be once we can narrow down where the time is being spent during the terraform plan calls.

Please note, from the provider perspective, there isn’t more we can do at the moment to lessen the burden of the slowness everyone is experiencing without directly reducing the number of supported statements, i.e. a breaking change that reverts the changes made to support #13862 in order to return to the runtimes previously seen in v2.67.0 of the provider.

anGie44 on Aug 17, 2020

Confirming this issue has been resolved when using v0.13.5+ of Terraform.
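For those who want to make sure a configuration is never run with an older Terraform, a version constraint along these lines could be added. It is a sketch under assumptions: the 0.13.5 floor comes from this thread, while the provider constraint is only an example based on the newest version tested below.

```hcl
# Sketch only: require the Terraform release that contains the upstream speedup.
terraform {
  required_version = ">= 0.13.5"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.11.0" # assumption: example constraint, adjust as needed
    }
  }
}
```

With this in place, Terraform refuses to run on anything older than 0.13.5 instead of silently taking minutes to plan.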

Given the example in the description (Regional WebACL w/3 rules: 2 ManagedRuleGroups, 1 Rate-based) in us-west-1:

With `provider[registry.terraform.io/hashicorp/aws] 3.11.0`

Plan output time (5s):

terraform plan -out=plan.out
2020/10/27 22:30:41 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:30:46 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

Apply output time (3s):

terraform apply plan.out
2020/10/27 22:32:45 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:32:48 [TRACE] eval: *terraform.evalCloseModule

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
...
2020/10/27 22:32:48 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

Destroy output time (5s):

terraform destroy --force
2020/10/27 22:34:01 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:34:06 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info
Destroy complete! Resources: 1 destroyed.
...
2020-10-27T22:34:06.400-0400 [DEBUG] plugin: plugin exited

With `provider[registry.terraform.io/hashicorp/aws] 3.0.0`

Plan output time (12s):

terraform plan -out=plan.out
2020/10/27 22:24:00 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:24:12 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

Apply output time (14s):

terraform apply plan.out
2020/10/27 22:25:30 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:25:44 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate

Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
...
2020/10/27 22:25:44 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

Destroy output time (27s):

terraform destroy --force
2020/10/27 22:27:54 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:28:21 [TRACE] statemgr.Filesystem: removing lock metadata file .terraform.tfstate.lock.info

Destroy complete! Resources: 1 destroyed.
...
2020-10-27T22:28:21.218-0400 [DEBUG] plugin: plugin exited

With `provider[registry.terraform.io/hashicorp/aws] 2.69.0`

Plan output time (15s):

terraform plan -out=plan.out
2020/10/27 22:15:15 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:15:30 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

Apply output time (14s):

terraform apply plan.out
2020/10/27 22:19:09 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:19:23 [TRACE] statemgr.Filesystem: have already backed up original terraform.tfstate to terraform.tfstate.backup on a previous write
Apply complete! Resources: 1 added, 0 changed, 0 destroyed.
...
2020/10/27 22:19:23 [TRACE] statemgr.Filesystem: unlocking terraform.tfstate using fcntl flock

Destroy output time (29s):

terraform destroy --force
2020/10/27 22:21:21 [INFO] Terraform version: 0.13.5
...
2020/10/27 22:21:50 [TRACE] statemgr.Filesystem: writing snapshot at terraform.tfstate

Destroy complete! Resources: 1 destroyed.
...
2020-10-27T22:21:50.722-0400 [DEBUG] plugin: plugin exited

anGie44 on Oct 28, 2020

I wonder how you could release this change in its current state; it speaks loudly about the quality controls of the project. Basically, it ruins the whole experience, and it seems no one feels the need to release a fix any quicker. The side effect of long-running terraform applies/plans is that the session token expires and the build constantly fails. Extending the session time is not an option, since role chaining has a limit of 1 hour. FYI @bflad

mohsen0 on Aug 11, 2020

It would be great if everybody who upvoted this specific issue could also upvote the corresponding upstream issue (mentioned above by anGie44). The ratio is 5:1 atm 😦.

dvishniakov on Aug 31, 2020

@anGie44 with v3.0.0 being released, do you have a target version and/or timeline for this being addressed?

tophercullen on Aug 11, 2020

Thank you for your quick response. I ask you to reconsider and release the fix you made in https://github.com/terraform-providers/terraform-provider-aws/pull/14073 before the major release, since the issue is very painful when there is a CI/CD system and multiple instances of WAFv2 (https://github.com/umotif-public/terraform-aws-waf-webaclv2). In our case, the terraform plan run time jumped from a 3-minute job to a 55-minute one. It would be a shame if users had to live with this because the fix is not going to be released any time soon.

mohsen0 on Jul 28, 2020

Also see https://github.com/terraform-providers/terraform-provider-aws/issues/5822

immo-huneke-zuhlke on Sep 23, 2020

Following up here: this new issue in terraform tracks the behavior we’re seeing with the nested statements and the significant slowdowns: https://github.com/hashicorp/terraform/issues/25889

Again, unfortunately there’s not much we can do within the provider code except for making the schema less nested in the webACL resource atm. The hope would be to have upstream terraform optimizations take place in order to keep the schema depth as-is.