terraform-provider-azurerm: Inconsistent behaviour when specifying `subscription_id` attribute in provider block for new Subscriptions

Is there an existing issue for this?

  • I have searched the existing issues

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave “+1” or “me too” comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Terraform Version

1.2.7

AzureRM Provider Version

3.18.0

Affected Resource(s)/Data Source(s)

azurerm_subscription

Terraform Configuration Files

# Base provider, used to read the billing account and create the new Subscription
provider "azurerm" {
  features {}
}

# Data resource to get the billing_scope value
data "azurerm_billing_enrollment_account_scope" "example" {
  billing_account_name    = local.billing_account_name
  enrollment_account_name = local.enrollment_account_name
}

# Create the new Subscription (uses the Microsoft.Subscription/Aliases provider)
resource "azurerm_subscription" "example" {
  subscription_name = "My Test Subscription"
  alias             = "new-sub-test-001"
  billing_scope_id  = data.azurerm_billing_enrollment_account_scope.example.id
}

# Define a secondary provider with an alias, configured to target the newly created Subscription
provider "azurerm" {
  alias           = "new_sub"
  subscription_id = azurerm_subscription.example.subscription_id
  features {}
}

# Create a Resource Group in the newly created Subscription
resource "azurerm_resource_group" "example" {
  provider = azurerm.new_sub
  name     = "new-sub-test"
  location = "West Europe"
}

Debug Output/Panic Output

# see below

Expected Behaviour

When trying to create a new Azure Subscription and then cross-referencing the Subscription ID of the new Subscription into a provider block we expect to see consistent behaviour regardless of the authentication method used.

Actual Behaviour

When authenticating with a Service Principal the initial terraform plan throws the following error:

â•·
│ Error: building AzureRM Client: 1 error occurred:
│       * A Subscription ID must be configured when authenticating as a Service Principal using a Client Secret.
│
│
│
│   with provider["registry.terraform.io/hashicorp/azurerm"].new_sub,
│   on main.tf line 20, in provider "azurerm":
│   20: provider "azurerm" {
│
╵

We believe this to be correct, as at this point the new Subscription ID is unknown and therefore cannot be correctly set on the provider.

When we authenticate using the Azure CLI the initial terraform plan is successful, but we get the following error during terraform apply:

â•·
│ Error: building AzureRM Client: obtain subscription(00e960fd-76a5-410b-9391-877383a5974f) from Azure CLI: parsing json result from the Azure CLI: waiting for the Azure CLI: exit status 1: ERROR: Subscription '00e960fd-76a5-410b-9391-877383a5974f' not found. Check the spelling and casing and try again.
│ 
│   with provider["registry.terraform.io/hashicorp/azurerm"].new_sub,
│   on main.tf line 20, in provider "azurerm":
│   20: provider "azurerm" {
│
╵

We believe in this case that the provider is incorrectly accepting a null or empty string input on the subscription_id attribute of the provider block and instead defaulting to the current default Subscription associated with the Azure CLI authenticated user.

This is incorrectly giving customers the illusion that they can create a new Subscription and then deploy resources straight into it within a single apply of a single root module.

Steps to Reproduce

  1. Create a configuration matching the example, providing valid values for access to the billing account for Subscription create
  2. Authenticate using a Service Principal with suitable permissions
  3. terraform init (should work)
  4. terraform plan (should throw error)
  5. Authenticate using Azure CLI with suitable permissions
  6. terraform init (should work)
  7. terraform plan (should work)
  8. terraform apply (should create new Subscription and then throw error) image
  9. terraform apply (should throw the following error) image
  10. az account list --refresh (to refresh the list of available Subscriptions)
  11. May need to run az account set to select a subscription
  12. terraform apply (should work the second time as the Subscription now exists!) image

Important Factoids

n/a

References

I believe this issue identifies the underlying reason why customers are trying to create a new Subscription and then deploy resources to it, only to face issues which surface as issues such as:

We have successfully worked around this situation with our lz-vending module, but this was only made possible as the new AzAPI provider has an option to set the parent_id for any resource being created,

We believe the addition of a new optional parent_id argument for all resources in the azurerm provider would allow customers to deploy resources across multiple Subscriptions without hitting this issue (or the need to provide multiple aliased providers).

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Reactions: 5
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@manicminer yeah agreed, I think we should probably look to make subscription_id required in 4.0 to alleviate the confusion.

There’s been a number of issues where folks have been confused about this behaviour (by having resources provisioned in the wrong subscription or otherwise) - and whilst this behaviour made sense within CloudShell (for learning purposes) - it doesn’t really make sense outside of that environment, so I think it’s worth making this field required going forward.

@manicminer I think you’ve made it clear, I will happily admit I don’t understand all the interactions you describe - they sound like technical debt, and perhaps you could consider this something to address in some future major version bump. It would be fantastic to bring the azure provider up to parity with e.g. the google provider, or the kubernetes provider, in this regard.

Re this bit though:

It’s also not practical to make the assumption that every resource needs creating and proceed anyway, because that might not be true in the event the subscription does exist but isn’t in state. If that’s the case for you, then the subscription is known to the practitioner and should be imported, or the subscription ID should be set in the configuration (making the problem irrelevant).

This is an XY problem I think: out of state subscriptions should be imported; there are existing facilities (now much better since 1.5) for doing declarative imports, and indeed an imported subscription would not show as having an Unknown ID, so should plan and operate entirely fine today. It is only the wholly new case that any putative change would need to solve.

@manicminer, Thank you for your response. This might not be recommended, and yes, we can split the deployment into separate deployments even if that would complicate our use case. Having a single deployment is really something we would like to be supported, a single deployment make sense, and will ease deployments and our use case! Hopefully this is functionality that will have greater support in coming versions, and not less.

As stated above, the deployment runs successfully even with service principal authentication, as long as you run without a remote backend. If I run with local backend, the plan and apply stage runs successfully. If I specify a remote backend, it fails.

I have done some additional testing, it seems that the plan is not checking against the current/default subscription, there is no signs for this in the output (or in the trace log):

terraform plan
data.azurerm_billing_mca_account_scope.this: Reading...
data.azurerm_billing_mca_account_scope.this: Read complete after 0s [id=/providers/Microsoft.Billing/billingAccounts/bxxx/billingProfiles/xxx/invoiceSections/xxx]

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # azurerm_resource_group.rg will be created
  + resource "azurerm_resource_group" "rg" {
      + id       = (known after apply)
      + location = "norwayeast"
      + name     = "test"
    }

  # azurerm_subscription.this will be created
  + resource "azurerm_subscription" "this" {
      + alias             = "tf-test-1"
      + billing_scope_id  = "/providers/Microsoft.Billing/billingAccounts/xxx/billingProfiles/xxx/invoiceSections/xxx"
      + id                = (known after apply)
      + subscription_id   = (known after apply)
      + subscription_name = "tf-test-1"
      + tenant_id         = (known after apply)
    }

  # null_resource.refresh_subs will be created
  + resource "null_resource" "refresh_subs" {
      + id = (known after apply)
    }

  # time_sleep.sub_create will be created
  + resource "time_sleep" "sub_create" {
      + create_duration = "10s"
      + id              = (known after apply)
    }

Plan: 4 to add, 0 to change, 0 to destroy.

Just as an additional verification, I have tried to add the resource group (test) into the default subscription just to see if that changed this behaviour, it did not – this is as expected since I can’t see any trace towards the default subscription that terraform is checking if that resource exists.

So, based on this testing, it seems like the verification of the resource group is not happening towards the default subscription. The issue only occurs if I specify a remote backend in combination with service principal authentication, so I assume there must be an issue related to this combination.

For your reference, I’m using this code to illustrate the issue:


terraform {
  required_version = ">=1.3.0"
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=3.46.0"
    }
    time = {
      source  = "hashicorp/time"
      version = "0.9.1"
    }
  }
  # If I add a backend config (unncomment the below), and specify credentials for the backend, the plan and apply will fail
  # backend config:
  # backend "azurerm" {
  #   resource_group_name  = ""  # Fill in required info
  #   storage_account_name = ""  # Fill in required info
  #   container_name       = ""  # Fill in required info
  #   key                  = ""  # Fill in required info
  #   subscription_id      = ""  # Fill in required info
  #   tenant_id            = ""  # Fill in required info
  # }

  #Set these environment variables to enable authentication to Azure backend using service principal 
  # export ARM_SUBSCRIPTION_ID=""
  # export ARM_CLIENT_SECRET=""
  # export ARM_TENANT_ID=""
  # export ARM_CLIENT_ID=""
}
provider "azurerm" {
  alias           = "newsub"
  subscription_id = azurerm_subscription.this.subscription_id
  features {
  }
}

provider "azurerm" {
  features {
  }
}

variable "billing_account_name" {
  description = "Billing Account Name for MCA account (guid)"
  type        = string
}
variable "billing_profile_name" {
  description = "Billing Profile ID"
  type        = string
}
variable "invoice_section_name" {
  description = "Invoce Section ID"
  type        = string
}
variable "subscription_name" {
  description = "Name of the new subscriptoin to create"
  type        = string
}


data "azurerm_billing_mca_account_scope" "this" {
  billing_account_name = var.billing_account_name
  billing_profile_name = var.billing_profile_name
  invoice_section_name = var.invoice_section_name
}

resource "azurerm_subscription" "this" {
  subscription_name = var.subscription_name
  billing_scope_id  = data.azurerm_billing_mca_account_scope.this.id
  alias             = var.subscription_name
}

resource "azurerm_resource_group" "rg" {
  depends_on = [
    null_resource.refresh_subs
  ]
  provider = azurerm.newsub
  name     = "test"
  location = "norwayeast"
}

resource "time_sleep" "sub_create" {
  depends_on = [
    azurerm_subscription.this
  ]
  create_duration = "10s"
}

resource "null_resource" "refresh_subs" {
  depends_on = [
    time_sleep.sub_create
  ]
  provisioner "local-exec" {
    command = "az account list --refresh"
  }
}

Thanks for raising this @krowlandson and thanks both for the nuanced conversation. I think there’s merit in the idea of requiring a subscription_id to always be set regardless of the authentication method. If this were to be considered, it would naturally be considered a breaking change and could only reasonably be enforced in the next major version of the provider.

That aside, I’m not sure we can reliably compare between an unset subscription ID and a blank subscription ID - mainly because we also source this from an environment variable and so the unset value is nearly always an empty string AFAICT.

As mentioned above, given an empty subscription_id the provider will hit error when authenticated using SP during the plan stage, but not when using the CLI auth (as it will use the default subscription in this case). Ideally, either case should work in @krowlandson’s config, as the second provider is referencing a known-after-apply attribute, which Terraform is able to handle it correctly:

provider "azurerm" {
  alias           = "new_sub"
  subscription_id = azurerm_subscription.example.subscription_id
  features {}
}

The behavior about how provider behaves when referencing such attribute is explained here (though this is for the plugin framework, but it should also apply to the plugin sdk v2). As is stated in the document:

You can choose how your provider handles this. If some resources or data sources can be used without knowing that value, it may be worthwhile to emit a warning and check whether the value is set in resources and data sources before attempting to use it. If resources and data sources can’t provide any functionality without knowing that value, it’s often better to return an error, which will halt the apply.

For this provider, IMO it should behave correctly without a subscription id during the plan stage: For existing resources, the subscription id can already be deduced by the resource id; For new resources, there is nothing to be done during plan stage.

Unfortunately, it is a non-trivial work to make the unknown value work for the azurerm provider at this moment. That is because:

  1. The provider is currently heavily based on the Azure SDK track1. In this SDK, each client has its subscription defined (example). The provider will initialize all the clients (each one corresponds to a certain resource type) at the configuration phase, then each TF resource will reference the corresponding client in their CRUD functions. So each resource type client plays as a singleton in the provider wide, this brings a constraint that these singleton client should be stable regarding the embeded subscription id.
  2. The provider configuration phase requires to make use of some certain clients. E.g. it will use the resource client to register the RPs. This requires the subscription id has to be known at plan stage.

For the point 1, I think after we totally migrate to the hashicorp/go-azure-sdk, it should be resolved. For the point 2, I think we can further balance whether we want the dynamic provider config more than those additional features during provider configuration phase.