terraform-provider-azurerm: azurerm_frontdoor with v2.24.0 breaks when the Azure Front Door is edited in the portal

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or “me too” comments; they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform (and AzureRM Provider) Version

Terraform v0.12.21
+ provider.azurerm v2.24.0

Affected Resource(s)

  • azurerm_frontdoor

Terraform Configuration Files

provider "azurerm" {
  version = "=2.24.0"
  features {} # https://www.terraform.io/docs/providers/azurerm/index.html#features
}

resource "azurerm_resource_group" "example" {
  name     = "andreastester"
  location = "norway east"
}

resource "azurerm_frontdoor" "example" {
  name                                         = "andreastester"
  resource_group_name                          = azurerm_resource_group.example.name
  enforce_backend_pools_certificate_name_check = false

  routing_rule {
    name               = "exampleRoutingRule1"
    accepted_protocols = ["Http", "Https"]
    patterns_to_match  = ["/*"]
    frontend_endpoints = ["exampleFrontendEndpoint1"]
    forwarding_configuration {
      forwarding_protocol = "MatchRequest"
      backend_pool_name   = "exampleBackendBing"
    }
  }

  backend_pool_load_balancing {
    name = "exampleLoadBalancingSettings1"
  }

  backend_pool_health_probe {
    name = "exampleHealthProbeSetting1"
  }

  backend_pool {
    name = "exampleBackendBing"
    backend {
      host_header = "www.bing.com"
      address     = "www.bing.com"
      http_port   = 80
      https_port  = 443
    }

    load_balancing_name = "exampleLoadBalancingSettings1"
    health_probe_name   = "exampleHealthProbeSetting1"
  }

  frontend_endpoint {
    name                              = "exampleFrontendEndpoint1"
    host_name                         = "andreastester.azurefd.net"
    custom_https_provisioning_enabled = false
  }
}

Debug Output

https://gist.github.com/andrstor/0aa07440e0a01befb23351db3257340f

Panic Output

Expected Behavior

Terraform identifies that no changes are required or tries to recover its state.

Actual Behavior

Error: flattening backend_pool: ID was missing the healthProbeSettings element

Steps to Reproduce

  1. terraform apply
  2. Do anything in the Azure portal that triggers a change, for instance adding a Rules Engine rule to the routing rule.
  3. terraform plan

Even if you undo the manual change, the resource is still broken with azurerm v2.24.0. The same steps work with v2.23.0.

Important Factoids

None

References

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 70
  • Comments: 92 (19 by maintainers)

Most upvoted comments

Hello, is there an update on ETA for this issue? We are not able to manage Frontdoor via Terraform while this issue persists.

Same issue here but on another level: Error: flattening frontend_endpoint: ID was missing the frontDoorWebApplicationFirewallPolicies element

@GarethOates @cpressland we’ve tried that historically and it leads to further cases of this, which ultimately ends up breaking other usages in subtle ways.

It’s worth calling out that the Azure Networking team get a lot of exceptions, covering:

  • Swagger (Azure/azure-rest-api-specs)
    • Where the linting is allowed to fail, due to APIs which don’t comply with the ARM Spec
  • Resource Manager APIs
    • Where breaking changes are frequently shipped in existing API versions, whereas the ARM Spec requires that breaking changes go into a new API version
    • Where the ordering of keys in the serialized JSON changes the behaviour of the API (e.g. the API behaves differently when foo is serialized before bar) - which the JSON spec disallows (which requires that maps/dictionaries are order-insensitive)
  • Terraform
    • Where we end up having to work around said breaking changes (which means API version upgrades take longer) and try to identify these serialization issues

Whilst I appreciate it’s frustrating to be blocked here, we can’t keep layering workarounds on top of workarounds; doing so is exacerbating the problem and leading to the issues we see today (and a bunch of more subtle, and harder to diagnose, issues like the JSON ordering issue mentioned above).

We’ve reached out to the Portal Team, who have committed to fixing this bug on their side (and we’ve chased them on it) - after which we can then work with the Networking Team to fix the root cause of these bugs.

As Microsoft have committed to fixing these bugs (and for the reasons outlined above), unfortunately we have no plans to introduce a hack for this API bug. In the interim, since this bug is only triggered when editing the resource in the Portal, I believe it should be possible to work around this using RBAC (or a Management Lock).

From our side, whilst we appreciate this isn’t ideal (and is frustrating to be blocked) - we’re working with Microsoft to fix this and will post an update as soon as we have one.

Thanks!

Ultimately this’ll be fixed via #9750, which ignores the casing returned from the Azure API and rewrites it to be consistent on Terraform’s side. Whilst there are downsides to that approach (and the Service Team ultimately need to fix these bugs in the API…), this should work around this series of API bugs for the moment.
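As a rough illustration of the general idea of normalizing casing on Terraform’s side, here is a minimal Go sketch. It is not the code from #9750: normalizeID and canonicalKeys are hypothetical names, and the list of keys is an assumption. Key segments are matched case-insensitively and rewritten to the expected casing, while value segments are left exactly as the API returned them.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalKeys lists the casing this sketch expects for each key segment.
// The list is an illustrative assumption, not the provider's real table.
var canonicalKeys = []string{
	"subscriptions", "resourceGroups", "providers",
	"frontDoors", "frontendEndpoints", "healthProbeSettings",
	"loadBalancingSettings", "backendPools", "routingRules",
	"frontDoorWebApplicationFirewallPolicies",
}

// normalizeID rewrites each key segment (every even position after the
// leading slash) to its canonical casing when it matches case-insensitively.
// Value segments such as resource names are returned untouched.
func normalizeID(id string) string {
	parts := strings.Split(strings.Trim(id, "/"), "/")
	for i := 0; i < len(parts); i += 2 {
		for _, key := range canonicalKeys {
			if strings.EqualFold(parts[i], key) {
				parts[i] = key
				break
			}
		}
	}
	return "/" + strings.Join(parts, "/")
}

func main() {
	// Mixed casing of the kind seen after a portal edit.
	fromPortal := "/subscriptions/s/resourcegroups/rg/providers/Microsoft.Network/Frontdoors/fd/FrontendEndpoints/my-custom-domain"
	fmt.Println(normalizeID(fromPortal))
}
```

This prints `/subscriptions/s/resourceGroups/rg/providers/Microsoft.Network/frontDoors/fd/frontendEndpoints/my-custom-domain`. Names such as the resource group and Front Door name (and, in this sketch, the provider namespace) keep whatever casing the API returned, which is the limitation the maintainers describe: the original casing of those values cannot be recovered.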

@WodansSon would you be able to get an ETA for a fix from the service team here?

@tombuildsstuff I will reach out to the Front Door service team and see how quickly we can get a fix in place for this issue, it will most likely include some cross team collaboration with the portal team to roll back their changes to get this issue totally fixed.

When reporting issues, please remain respectful and professional. I understand that this is frustrating and I am doing the best that I can to correct the issue; however, some aspects are out of my control. Since WAF is its own separate resource, I am going to assume the workaround will have to be applied to that resource as well (e.g. open the WAF in the portal, update, save) to correct the casing issues in that resource. Since it is its own resource, I will have to reach out to the service team again to verify that the fix was applied to that resource as well. I will open a new issue for the WAF specifically, since this issue for Front Door appears to be fixed.

@andrstor @tombuildsstuff The fix has been deployed to all regions, so I am going to go ahead and close this issue. Sorry it took so long for the turnaround. 🚀

@tombuildsstuff - fair enough, from a technical perspective I completely agree that this is Microsoft’s burden to fix; I just don’t see it actually happening in a sane timeframe. Failing that, could we mark my previous post as on-topic again? It does include potential workarounds to this issue for users currently blocked by this. I’m happy to edit it to make it more on-topic, or even put it somewhere else if you have a good suggestion.

@camallen Your AFD will be fine now since you corrected the casing that was introduced by the portal lowercase issue. You are safe to edit the Frontdoor via portal as well since they have reverted the changes they made that caused this issue in the first place.

UPDATE: I have just heard back from the portal team and they have confirmed that the lower casing issue also exists in the Frontdoor Web Application Firewall Policies UI layer and they are currently working on a fix for that. I have yet to hear back from the AFD team, but I will keep you posted on the progress of this issue once I receive more information. Thank you. 🚀

Did a little more testing and found that, on the brand-new AFD, I could modify a few things via https://resources.azure.com and bring it under TF management using TF v0.13.5 and azurerm provider v2.36.0.

Under "frontendEndpoints" / "webApplicationFirewallPolicyLink", I updated the id of each frontend endpoint, changing the all-lowercase segment to frontDoorWebApplicationFirewallPolicies.

The next plan failed on:

Error: flattening `routing_rules`: flattening `frontend_endpoints`: ID was missing the `frontendEndpoints` element

So, back in resources.azure.com, under "routingRules" / "frontendEndpoints", I updated each id in every routing rule from all lowercase to frontendEndpoints.

My next tf plan and apply worked.
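The manual fix above (rewriting the lower-cased segments in the Resource Explorer template) can be sketched in Go as a walk over the JSON that rewrites every "id" value. This is a hypothetical helper, not a tool from the thread: fixID, walk and fixTemplate are illustrative names, it only knows the two segments mentioned above, and the result would still have to be pasted back and sent with PUT through Resource Explorer by hand.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// fixes maps a lower-cased segment seen after a portal edit to the casing
// the provider expects; these are the two segments described above.
var fixes = map[string]string{
	"frontdoorwebapplicationfirewallpolicies": "frontDoorWebApplicationFirewallPolicies",
	"frontendendpoints":                       "frontendEndpoints",
}

// fixID rewrites matching path segments of a single resource ID.
func fixID(id string) string {
	segments := strings.Split(id, "/")
	for i, s := range segments {
		if canonical, ok := fixes[strings.ToLower(s)]; ok {
			segments[i] = canonical
		}
	}
	return strings.Join(segments, "/")
}

// walk visits every value in a decoded JSON document and rewrites string
// values stored under an "id" key; other fields are left alone.
func walk(v interface{}) {
	switch node := v.(type) {
	case map[string]interface{}:
		for k, child := range node {
			if s, ok := child.(string); ok && k == "id" {
				node[k] = fixID(s) // assigning to an existing key while ranging is allowed
				continue
			}
			walk(child)
		}
	case []interface{}:
		for _, child := range node {
			walk(child)
		}
	}
}

// fixTemplate decodes a JSON template, rewrites its "id" values and
// re-encodes it compactly (json.Marshal sorts object keys).
func fixTemplate(raw string) (string, error) {
	var doc interface{}
	if err := json.Unmarshal([]byte(raw), &doc); err != nil {
		return "", err
	}
	walk(doc)
	out, err := json.Marshal(doc)
	if err != nil {
		return "", err
	}
	return string(out), nil
}

func main() {
	template := `{"properties":{"routingRules":[{"properties":{"frontendEndpoints":[
		{"id":"/subscriptions/s/resourceGroups/rg/providers/Microsoft.Network/frontDoors/fd/frontendendpoints/ep1"}]}}]}}`
	fixed, err := fixTemplate(template)
	if err != nil {
		panic(err)
	}
	fmt.Println(fixed)
}
```

Because decoding into interface{} turns numbers into float64 and json.Marshal sorts object keys, the re-encoded template can differ cosmetically from the original. This sketch also rewrites any path segment that matches, so it is worth checking that no resource name happens to equal one of those segment names.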

We fought so hard with Azure Support during some previous Azure Front Door Terraform/API issues to get them to recognise that the Azure API was a bit of a mess, and provided multiple examples via Terraform, the Azure Portal, and the Azure CLI. The response was simply that this isn’t an issue because the Azure Portal still works. I kinda get it, but I equally don’t think it’s this project’s responsibility to constantly build workarounds for a buggy API. I’ll raise this issue with our Account Managers again and see if we can get any traction.

@KyMidd Thank you for your question. I am still attempting to get in contact with the Front Door service team about this and have begun to escalate the issue internally to force some action about getting this fixed. Unfortunately I am not able to provide an ETA at this time, but as soon as I have any new news I will promptly update the status here.

I’ve just had to upgrade to 2.33.0 (from 2.2!) because of this issue, and now we’re seeing this bug block our deployments. I know that you ultimately have to wait for Microsoft to fix the underlying problem, but could you at least in the meantime mark comments that have workarounds as “on topic” so they’re easy to find?

@kplantus Yes, that is exactly what I was suspecting was going on as well. I have done some digging myself and found that this appears to be an issue in the API rather than the Portal this time. I am already in contact with the service team to get an ETA for a high-pri fix and deployment; however, that is still being negotiated with the team. At the same time, I am also in contact with the portal team to ensure that the ID lower-casing issue isn’t also present in their UI layer.

NOTE: Please ensure that the provider you are using is at least v2.24.0, as that is the version of the provider where a substantial amount of casing normalization was added.

I have conferred with other team members and we have agreed that we will continue to track the problem in this issue instead of splitting it up across multiple issues. Thank you for your patience.

@camallen, this will go out with our weekly release this Thursday (tomorrow)

@tombuildsstuff

@WodansSon did you get an update from the Service/Portal Team here?

The last I heard from the service team (Nov. 4th 2020) is that the deployment of the fix is ongoing and has completed for several stages, but an Azure lockdown was blocking the complete rollout of the fix to all regions. They stated that they would continue to roll out the change to the rest of the regions after the lockdown was lifted. I will continue to put pressure on the portal team in an attempt to get this pushed out ASAP.

and as such we have no plans to merge #8046, as previous attempts at doing so caused subtle issues in other APIs.

I understand your desire to have the problem fixed at the root, but so much time has passed since this issue was raised, and there are clearly a lot of users who want to be able to manage Front Door through Terraform. Can this fix not be merged as a temporary workaround until a more permanent solution is offered by the Microsoft team? As far as I can see, it’s a localized change to a Front Door-specific file. So many features have shipped since provider 2.23.0 that many users will likely want to take advantage of them, but cannot right now due to this bug.

@kevinchabreck I’ve had the exact same issue with the same steps as you, but it started working once I changed the provider version to 2.23.0. You can try it.

@KyMidd, I am unable to repro the behavior you are reporting above. That would explain my results as well! Awesome! 🚀

I followed your steps with the same config file and when I execute step 6 I get:

Refresh

azurerm_resource_group.testing_kyler: Refreshing state... [id=/subscriptions/{subscription}/resourceGroups/XXXXXX-frontDoor-Repro]
azurerm_frontdoor_firewall_policy.frontdoor: Refreshing state... [id=/subscriptions/{subscription}/resourceGroups/XXXXXX-frontDoor-Repro/providers/Microsoft.Network/frontDoorWebApplicationFirewallPolicies/KylerTestingWafPolicy]
azurerm_frontdoor.example: Refreshing state... [id=/subscriptions/{subscription}/resourceGroups/XXXXXX-frontDoor-Repro/providers/Microsoft.Network/frontDoors/XXXXXX-testing-frontdoor]

Plan

Refreshing Terraform state in-memory prior to plan...
The refreshed state will be used to calculate this plan, but will not be
persisted to local or remote state storage.

azurerm_resource_group.testing_kyler: Refreshing state... [id=/subscriptions/{subscription}/resourceGroups/XXXXXX-frontDoor-Repro]
azurerm_frontdoor_firewall_policy.frontdoor: Refreshing state... [id=/subscriptions/{subscription}/resourceGroups/XXXXXX-frontDoor-Repro/providers/Microsoft.Network/frontDoorWebApplicationFirewallPolicies/KylerTestingWafPolicy]
azurerm_frontdoor.example: Refreshing state... [id=/subscriptions/{subscription}/resourceGroups/XXXXXX-frontDoor-Repro/providers/Microsoft.Network/frontDoors/XXXXXX-testing-frontdoor]

------------------------------------------------------------------------

No changes. Infrastructure is up-to-date.

This means that Terraform did not detect any differences between your
configuration and real physical resources that exist. As a result, no
actions need to be performed.

@ajklotz I don’t believe there is a public facing ticket or support request for this issue. That said, I have included a link to this issue in all internal communications for this problem so they are very much aware of the contention the changes in the API have caused. Thank you for asking!

@WodansSon and @tombuildsstuff : Sincere thank you from myself and on behalf of the community here for working on a workaround for this issue. I have several teams affected, and despite us pushing hard on Microsoft’s internal enterprise support, we haven’t made any progress at all. I think we’re all aware here that the real root cause of this issue is standard-breaking API implementation on Azure side, and from your user-base I want to say: Thank you.

I agree this ticket should be re-opened. I tried updating the provider to the latest 2.36.0 and have the same issue.

Error: flattening frontend_endpoint: ID was missing the frontDoorWebApplicationFirewallPolicies element

@camallen @Poil @alec-pinson @tombuildsstuff

I have done a bit of an experiment, and what I found is that the easiest way to fix a legacy resource that is stuck in this in-between state is to modify the resource in the Portal and save the changes. Once the changes are saved, you can then manage the resource via Terraform. When I was investigating this, I disabled one of my routing rules, updated, then saved. Once the modification to the service was complete via the portal, I was able to successfully manage it again with Terraform. You have to modify the configuration of the Front Door directly; adding or updating a tag will not trigger the code that rewrites the IDs with the correct casing.

NOTE: The provider version should be the latest version as there have been changes made to this resource to account for various other casing issues.

@scottzilla We already have a ton of subscriptions and use Terraform provider Aliases extensively. Unfortunately it doesn’t let us mix and match Provider versions though, I get the same error as @CoopCNIT.

@surlypants can you elaborate on this?

push that state file to the appropriate place

I’ll try…

The (remote) state file in the Terragrunt workspace containing our FD could not plan (with 2.25.0) after the resource was touched in the portal. Downgrading to 2.23.0 still would not pass a plan. I therefore imported the resource into a local state file, downloaded the remote state file, and replaced its Front Door module’s “instances” array with the one from the local state file. I then pushed the result back up to the remote backend with terraform state push. Finally, 2.23.0 would plan. We are now back to the timeouts workaround from:

https://github.com/terraform-providers/terraform-provider-azurerm/issues/7925
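The state surgery described above can be sketched in Go, assuming the version-4 JSON state format used by Terraform 0.12/0.13 (a top-level resources array whose entries carry module, mode, type, name and instances). copyInstances, findResource and resourceKey are hypothetical names. The remote file would come from terraform state pull and the output would be uploaded with terraform state push; keep a backup of the original remote state before trying anything like this.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// resourceKey identifies one managed resource in a version-4 state file.
// Module is "" for resources in the root module.
type resourceKey struct{ Module, Type, Name string }

// decodeState parses a state file into generic JSON values.
func decodeState(raw []byte) (map[string]interface{}, error) {
	var state map[string]interface{}
	if err := json.Unmarshal(raw, &state); err != nil {
		return nil, err
	}
	return state, nil
}

// findResource returns the managed-resource entry whose module/type/name match key.
func findResource(state map[string]interface{}, key resourceKey) (map[string]interface{}, error) {
	resources, _ := state["resources"].([]interface{})
	for _, r := range resources {
		res, ok := r.(map[string]interface{})
		if !ok {
			continue
		}
		mode, _ := res["mode"].(string)
		module, _ := res["module"].(string) // absent for root-module resources
		typ, _ := res["type"].(string)
		name, _ := res["name"].(string)
		if mode == "managed" && module == key.Module && typ == key.Type && name == key.Name {
			return res, nil
		}
	}
	return nil, fmt.Errorf("resource %+v not found in state", key)
}

// copyInstances keeps everything in the remote state (lineage, other
// resources) but swaps in the "instances" array from the locally imported
// state, then increments "serial" before the result is pushed back.
func copyInstances(remoteRaw, localRaw []byte, dst, src resourceKey) ([]byte, error) {
	remote, err := decodeState(remoteRaw)
	if err != nil {
		return nil, err
	}
	local, err := decodeState(localRaw)
	if err != nil {
		return nil, err
	}
	to, err := findResource(remote, dst)
	if err != nil {
		return nil, err
	}
	from, err := findResource(local, src)
	if err != nil {
		return nil, err
	}
	to["instances"] = from["instances"] // to is shared with remote, so remote changes too
	serial, ok := remote["serial"].(float64) // JSON numbers decode as float64
	if !ok {
		return nil, errors.New("remote state has no numeric serial")
	}
	remote["serial"] = serial + 1
	return json.MarshalIndent(remote, "", "  ")
}

func main() {
	remote := []byte(`{"version":4,"serial":7,"lineage":"remote-lineage","resources":[
		{"module":"module.front_door","mode":"managed","type":"azurerm_frontdoor","name":"frontdoor","instances":[{"attributes":{"id":"stale"}}]}]}`)
	local := []byte(`{"version":4,"serial":1,"lineage":"local-lineage","resources":[
		{"mode":"managed","type":"azurerm_frontdoor","name":"frontdoor","instances":[{"attributes":{"id":"fresh"}}]}]}`)
	out, err := copyInstances(remote, local,
		resourceKey{"module.front_door", "azurerm_frontdoor", "frontdoor"},
		resourceKey{"", "azurerm_frontdoor", "frontdoor"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

Copying only the instances array keeps the remote file’s lineage, so terraform state push should not reject it as an unrelated state. The serial is bumped as a precaution so the pushed state is newer than the copy already in the backend.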

Side note: I tried setting required_providers: azurerm = “>= 2.23.0, <=2.25.0” at the top level workspace and scoped 2.23.0 specifically to the frontdoor workspace; but it seems that when providing a range, the lowest always wins. So our entire infra is back to 2.23.0

hope this helps / is understandable

@WodansSon : Update, I have destroyed and recreated this same config several times now to make sure the same error is generated, and I’m unable to replicate it. I’m not sure what happened there. I am now seeing FrontDoor reliably managed by terraform!

@subesokun I am sorry to hear that, but please be patient. @tombuildsstuff and I have been working on this issue, and I believe we have a solution in the provider which should fix 99.999% of the issues that are currently being hit… We are still testing the solution, but so far it all looks good… Again, I am sorry for this pain, but we are doing all we can to correct this issue.

@WodansSon I’m throwing my hat in as another who still has an issue with the FD WAF policy.

I even tried to create a brand new AFD. After it created, I modified a value via the portal to see if the rewrite workaround would work for a new resource. Still get:

Error: flattening `frontend_endpoint`: ID was missing the `frontDoorWebApplicationFirewallPolicies` element

@robselway

just to clarify - is the only resolution here to wait for Azure to fix the issue?

Based on what I can see, unfortunately yes.

The Azure API Specification states that values should be returned in the casing they’re submitted in (although the HTTP Specification states URIs should be case-sensitive, but I digress), so unfortunately this is an API bug which needs to be fixed, since this should be returned in the same casing we’re submitting it in here.

For what it’s worth, we’ve also raised this on our end. Unfortunately the Networking APIs differ from every other Azure API here, so I don’t think we can easily work around this (unless perhaps we can find the original casing from the specific sub-element, but that assumes the API doesn’t change to break the casing there too).

@JeffreyRichter this is a good example of the Networking APIs returning URIs in a case-insensitive manner, which differs from the recommendation in the ARM Spec (which unfortunately differs from the HTTP Specification, where the entire URI is case-sensitive; see this StackOverflow answer for more details).

Whilst it would be possible to work around this bug if the “resource type” segment could be parsed case-insensitively, the entire URI is lower-cased in some responses but not in others (see this comment for example responses), so there’s not much we can do here, since we can’t guarantee these are the correct casing (or that the Networking APIs won’t change this casing in a future update).

Whilst this GitHub issue is the wrong place for this discussion, I feel that perhaps the current ambiguous behaviour defined in the ARM Specification (“return in the casing the user passed it in as”) is the cause of this confusion. Perhaps it’d be clearer if the ARM Specification stated that the entire URI must (to use the language of RFC 7230) be treated as case-sensitive in responses. WDYT?

Thanks!

@lyubomirr downgrading to azurerm v2.23.0 seemed to work! Additionally, I had previously modified my resource ID string during the import from .../providers/Microsoft.Network/frontdoors/... to .../providers/Microsoft.Network/frontDoors/... due to an error I got when attempting a previous import. Reverting this and importing the resource ID exactly as it is shown in the Azure console (ie. the one with the lowercase D in frontdoors) fixed this issue. Thanks for the tip!

@KyMidd Sorry that you are hitting that issue, however the casing should not matter anymore, I believe you are hitting a validation rule. Do you have a repro for this? If I can get a clear repro for this issue I might be able to get a fix in before the next release. Thank you.

I just looked at the code. What appears to be going on is that you have selector = "" in the config file, which would trigger the validation rule below. Can you confirm whether this is the case in your config file?

// Schema entry for the `selector` argument: it may be omitted entirely,
// but if it is set, validation.StringIsNotEmpty rejects an empty string.
"selector": {
	Type:         schema.TypeString,
	Optional:     true,
	ValidateFunc: validation.StringIsNotEmpty,
},
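To illustrate the validation rule quoted above, here is a self-contained Go sketch of a function with the same shape as the Plugin SDK’s SchemaValidateFunc (the raw value and attribute name in, warnings and errors out). stringIsNotEmpty is a hypothetical stand-in for validation.StringIsNotEmpty, and its error wording is illustrative and may differ from the SDK’s. It shows why an explicitly empty selector is rejected, which suggests leaving the optional attribute out rather than setting it to "".

```go
package main

import "fmt"

// validateFunc mirrors the shape of the SDK's SchemaValidateFunc.
type validateFunc func(i interface{}, k string) ([]string, []error)

// stringIsNotEmpty is a stand-in for validation.StringIsNotEmpty: it rejects
// non-string values and empty strings, and accepts anything else.
func stringIsNotEmpty(i interface{}, k string) ([]string, []error) {
	v, ok := i.(string)
	if !ok {
		return nil, []error{fmt.Errorf("expected type of %q to be string", k)}
	}
	if v == "" {
		return nil, []error{fmt.Errorf("expected %q to not be an empty string", k)}
	}
	return nil, nil
}

func main() {
	var validate validateFunc = stringIsNotEmpty
	// An explicit empty selector fails; a real header name passes.
	for _, selector := range []interface{}{"", "User-Agent"} {
		_, errs := validate(selector, "selector")
		fmt.Printf("selector=%q errors=%v\n", selector, errs)
	}
}
```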

Somehow I’m now in a kind of deadlock. I need to downgrade to 2.23.0 because of this bug, but when I do so I run into another Front Door-related TF bug (https://github.com/terraform-providers/terraform-provider-azurerm/issues/8036) which was solved in 2.24.0. Very frustrating.

Sadly, after editing our FD resources in the Azure portal we now have the same old error and broken TF

Error: flattening `routing_rules`: flattening `frontend_endpoints`: ID was missing the `frontendEndpoints` element

Looks like the portal is still changing the casing of frontendEndpoints in the resource definition to one the AzureRM provider doesn’t expect.

E.g. in the https://resources.azure.com/ portal this is what I see (with redactions) for one of the broken FD resource routing rules, note the different casing on different routing rules

[
  {
    "id": "/subscriptions/.../resourcegroups/.../providers/Microsoft.Network/Frontdoors/.../FrontendEndpoints/my-custom-domain-org-azurefd-net"
  },
  {
    "id": "/subscriptions/.../resourcegroups/.../providers/Microsoft.Network/frontdoors/.../frontendendpoints/my-custom-domain"
  }
]

I can confirm the steps mentioned by @kplantus (thank you 👍) do work for us on our existing AFD resources. We do not have any WAF configured.

Tested with v2.36.0 of the AzureRM provider and Terraform v0.13.5

It’s not ideal to edit the resources directly in the Azure portal, and I’m not sure what will happen if we edit the AFD resources in the portal again; I assume we might re-break the AFD resource definitions.

Hopefully this is useful for the Azure portal team and helps someone else get TF working again.

Thanks for the update @WodansSon. I can confirm the steps mentioned by @kplantus do not work with an existing AFD. I also tried deleting the WAF policies linked to the AFD and then running terraform apply, but got the same error. Will wait for the fix.

Hi @naikajah, what version of the provider are you using? I updated my comment above to state that the provider should be at least v2.24.0.

I tested it with v2.36.0

Is the remaining issue perhaps just in frontDoorWebApplicationFirewallPolicies?

The last 3 reports mention that element, whereas the initial report was about healthProbeSettings.

After editing in the portal, I still have a problem with frontDoorWebApplicationFirewallPolicies (Error: flattening `frontend_endpoint`: ID was missing the `frontDoorWebApplicationFirewallPolicies` element); the segment is still frontdoorWebApplicationFirewallPolicies in Resource Explorer.

Same here, I still have the problem. Do we need a new version of the provider to match the change?

Hi, I think I’m still getting the same issue, is anyone able to confirm this is fixed for them?
We’re in the North Europe region.

When trying to import, I get the following:

Error: Error parsing Resource ID "/subscriptions/0000000-0000-000-0000-000/resourceGroups/NEUR-RG/providers/Microsoft.Network/frontdoors/my-frontdoor": ID was missing the `frontDoors` element

After moving it in from another state I already had, I get:

Error: flattening `frontend_endpoint`: ID was missing the `frontDoorWebApplicationFirewallPolicies` element

Did you ever hear back from the Microsoft Portal team about this issue? What’s the ETA on a fix from their end?

I tried this provider setup to work around it:

provider "azurerm" { 
  version         = "2.15.0"
  features {}
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
}
provider "azurerm" { 
  alias           = "temp"
  version         = "2.30.0"
  features {}
  subscription_id = var.subscription_id
  client_id       = var.client_id
  client_secret   = var.client_secret
  tenant_id       = var.tenant_id
}

…but it fails with this error:

Error: Failed to query available provider packages

Could not retrieve the list of available versions for provider
hashicorp/azurerm: no available releases match the given constraints 2.15.0,
2.30.0

@GarethOates @Lahiri you should be able to bring in a provider of a different version using provider aliases, but you’ll need to explicitly reference them where appropriate.

See: https://www.terraform.io/docs/configuration/providers.html#alias-multiple-provider-configurations

Yeah I tried that but I think as someone mentioned before it just reverts to the oldest version. It didn’t work for me anyway even though on init it downloaded both provider versions.
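The two version problems above (the alias setup failing with "no available releases match the given constraints", and the earlier report that a range ended up on 2.23.0) are consistent with Terraform selecting a single version of each provider that satisfies every constraint in the configuration, aliases included. The Go sketch below is a simplified model of that selection under that assumption. It only understands >=, <= and exact pins (no ~>), and selectVersion and the list of available versions are hypothetical.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// version is a plain MAJOR.MINOR.PATCH triple (no pre-release tags).
type version [3]int

func parseVersion(s string) (version, error) {
	var v version
	parts := strings.Split(strings.TrimSpace(s), ".")
	if len(parts) != 3 {
		return v, fmt.Errorf("invalid version %q", s)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return v, fmt.Errorf("invalid version %q: %w", s, err)
		}
		v[i] = n
	}
	return v, nil
}

// compare returns -1, 0 or 1 when a is older than, equal to or newer than b.
func compare(a, b version) int {
	for i := range a {
		if a[i] != b[i] {
			if a[i] < b[i] {
				return -1
			}
			return 1
		}
	}
	return 0
}

// satisfies checks one constraint: ">=X", "<=X", "=X" or a bare "X" (exact).
func satisfies(v version, constraint string) (bool, error) {
	c, op := strings.TrimSpace(constraint), "="
	for _, candidate := range []string{">=", "<=", "="} {
		if strings.HasPrefix(c, candidate) {
			op, c = candidate, strings.TrimPrefix(c, candidate)
			break
		}
	}
	target, err := parseVersion(c)
	if err != nil {
		return false, err
	}
	switch cmp := compare(v, target); op {
	case ">=":
		return cmp >= 0, nil
	case "<=":
		return cmp <= 0, nil
	default:
		return cmp == 0, nil
	}
}

// selectVersion returns the newest available version that satisfies every
// constraint gathered from all provider blocks, aliased or not.
func selectVersion(available, constraints []string) (string, error) {
	best, bestV := "", version{}
	for _, a := range available {
		v, err := parseVersion(a)
		if err != nil {
			return "", err
		}
		ok := true
		for _, c := range constraints {
			match, err := satisfies(v, c)
			if err != nil {
				return "", err
			}
			ok = ok && match
		}
		if ok && (best == "" || compare(v, bestV) > 0) {
			best, bestV = a, v
		}
	}
	if best == "" {
		return "", fmt.Errorf("no available releases match the given constraints %s", strings.Join(constraints, ", "))
	}
	return best, nil
}

func main() {
	available := []string{"2.15.0", "2.23.0", "2.24.0", "2.25.0", "2.30.0"}
	_, err := selectVersion(available, []string{"2.15.0", "2.30.0"}) // two exact pins
	fmt.Println(err)
	v, _ := selectVersion(available, []string{">= 2.23.0", "<=2.25.0", "=2.23.0"}) // range plus a pin
	fmt.Println(v)
}
```

With exact pins of 2.15.0 and 2.30.0 no version satisfies both, so selection fails. With ">= 2.23.0, <=2.25.0" plus an exact 2.23.0 pin elsewhere, 2.23.0 is the only candidate left, which would explain why the whole configuration ended up on 2.23.0.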

Is there anything that can be done once you face this error with a provider greater than 2.24? I’m getting:

Error: Resource instance managed by newer provider version

The current state of module.front_door.azurerm_frontdoor.frontdoor was created
by a newer provider version than is currently selected. Upgrade the azurerm
provider to work with this state.


Error: Resource instance managed by newer provider version

The current state of
module.front_door.azurerm_frontdoor_firewall_policy.policy was created by a
newer provider version than is currently selected. Upgrade the azurerm
provider to work with this state.

Can you delete the backend settings and reset the front door? Can you delete the front door and have it recreated?

Is there a viable workaround on the TF side that we can use until it’s fixed?

I could not simply just downgrade. I had to import the portal-modified resource into a temporary workspace then push that state file to the appropriate place and then downgrade.

@tombuildsstuff, do you have a (GitHub?) issue we can use to track this?

Hi @tombuildsstuff - just to clarify - is the only resolution here to wait for Azure to fix the issue? I’m trying to figure out whether it’s worth replacing this resource with another script until it’s resolved.

@tombuildsstuff I don’t know if this is what you are asking, but I have noticed this behaviour:

After creating the frontdoor resource with azurerm v2.24 it looks like this in https://resources.azure.com/.

After editing it manually in the portal (whatever change), it looks like this. Notice how many of the IDs have all become lowercase.

When running terraform plan again now, it fails. You can actually edit the resource in https://resources.azure.com/ manually (edit the template and use PUT), and terraform will start working again (provided you manage to correctly adjust all the lowercase IDs).
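To make the failure mode concrete, here is a minimal Go sketch, assuming a simplified parser that splits an ID into key/value segments and looks keys up with an exact, case-sensitive match. parseSegments and requireSegment are hypothetical names, not the provider’s actual functions, but the error they return has the same shape as the one in this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSegments splits an ARM resource ID into key/value pairs, e.g.
// /subscriptions/{sub}/resourceGroups/{rg}/providers/Microsoft.Network/frontDoors/{fd}
// Keys are stored exactly as returned, so later lookups are case-sensitive.
func parseSegments(id string) (map[string]string, error) {
	parts := strings.Split(strings.Trim(id, "/"), "/")
	if len(parts)%2 != 0 {
		return nil, fmt.Errorf("ID %q has an odd number of segments", id)
	}
	segments := make(map[string]string, len(parts)/2)
	for i := 0; i < len(parts); i += 2 {
		segments[parts[i]] = parts[i+1]
	}
	return segments, nil
}

// requireSegment returns the value for an exact-case key, or an error in the
// same shape as the one reported in this issue.
func requireSegment(segments map[string]string, key string) (string, error) {
	value, ok := segments[key]
	if !ok {
		return "", fmt.Errorf("ID was missing the %s element", key)
	}
	return value, nil
}

func main() {
	// Casing as written by the provider: the lookup succeeds.
	created := "/subscriptions/s/resourceGroups/rg/providers/Microsoft.Network/frontDoors/fd/healthProbeSettings/probe1"
	// Casing as returned after a portal edit: every segment is lower-cased.
	edited := strings.ToLower(created)

	for _, id := range []string{created, edited} {
		segments, err := parseSegments(id)
		if err != nil {
			fmt.Println(err)
			continue
		}
		if name, err := requireSegment(segments, "healthProbeSettings"); err != nil {
			fmt.Println("error:", err)
		} else {
			fmt.Println("health probe:", name)
		}
	}
}
```

Running it prints the probe name for the ID in the casing Terraform wrote, and "ID was missing the healthProbeSettings element" for the lower-cased copy, which mirrors what happens on the next plan after a portal edit.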

Presumably the issue was introduced in https://github.com/terraform-providers/terraform-provider-azurerm/pull/8146 as part of the rewriting around IDs.

Did another bit of testing - realised that we were actually using 2.24, so I’ve also downgraded to 2.23 and it appears to be working again, so looks like the regression is in provider version 2.24.

Seeing the same issue as @eliasgrueninger on the Firewall Policies element, but we’re running 2.21, which implies that it’s not necessarily a change in the most recent version of the provider, but could be a change at the Azure end?