terraform-provider-acme: Bug: DNS provider configuration not passed correctly to lego when renewing multiple certs in parallel

Hi,

We are experiencing an issue which, after some thorough digging, we believe is specifically related to how this provider interacts with lego.

When we renew more than one cert in parallel in a single run, one of the DNS challenge configurations “wins” and is used for all certs being renewed. As a result, renewal fails for all but one cert.

We are using the Azure DNS provider, but as far as I can tell this should happen in any project that renews multiple certs at once where the lego DNS provider configuration differs between the certs.

In our specific case it is the AZURE_ZONE_NAME configuration that differs between certs. An example of how this fails is our last run in our DEV environment (logs from this run further down):

  • Terraform logged that both saksbehandler.dev.eksplosiver.no and dev.farligeprodukter.no were about to be renewed.
  • saksbehandler.dev.eksplosiver.no completed in a little more than 23 seconds.
  • dev.farligeprodukter.no timed out with an error after a little more than 4 minutes.
  • The Azure audit log shows that two TXT records were written:
    • saksbehandler.dev.eksplosiver.no/TXT/_acme-challenge
    • saksbehandler.dev.eksplosiver.no/TXT/_acme-challenge.dev.farligeprodukter.no
      • Notice the mixup between domains here 🔼
      • In the log at the bottom you can observe that the TXT record lego is correctly polling for is _acme-challenge.dev.farligeprodukter.no and not _acme-challenge.dev.farligeprodukter.no.saksbehandler.dev.eksplosiver.no

Normally one would suspect a bug in the DNS provider, but after reviewing the debug log and reading up on how to use the lego CLI (https://go-acme.github.io/lego/usage/cli/), I’m leaning towards this issue being caused by how lego is invoked from the Terraform ACME Provider. I was about to attempt to reproduce our issue with the CLI when it occurred to me that a single environment variable cannot hold two values at once. The lego CLI accepts multiple --domains values on the command line, but DNS provider configuration is expected to be defined in environment variables. Thus the only way of running two (or more) renewals in parallel from the command line would be to spawn two (or more) individual lego processes, each with its own set of DNS provider configuration.
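To make the "one process per renewal" idea concrete, here is a minimal Go sketch of giving each child process its own AZURE_ZONE_NAME. The helper name runWithZone is hypothetical, and the child is a stand-in shell command rather than the real lego binary, so this only shows the environment isolation and not an actual renewal. It assumes a POSIX sh is available.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
)

// runWithZone starts one child process whose environment carries its own
// AZURE_ZONE_NAME. The child is a stand-in shell command (an assumption for
// illustration); a real run would start the lego binary instead.
func runWithZone(zone string) (string, error) {
	cmd := exec.Command("sh", "-c", `printf %s "$AZURE_ZONE_NAME"`)
	// The variable is set on this child only, never on the parent process.
	cmd.Env = append(os.Environ(), "AZURE_ZONE_NAME="+zone)
	out, err := cmd.Output()
	return string(out), err
}

func main() {
	zones := []string{"saksbehandler.dev.eksplosiver.no", "dev.farligeprodukter.no"}
	for _, zone := range zones {
		seen, err := runWithZone(zone)
		if err != nil {
			fmt.Println("child failed:", err)
			continue
		}
		fmt.Printf("child for %s saw AZURE_ZONE_NAME=%s\n", zone, seen)
	}
}
```

The children run one after another here for simplicity. Because each child gets its own copy of the environment, starting them concurrently would not let one zone name leak into the other.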

Based on the above, my suspicion is that lego is not invoked with “enough separation” by the Terraform ACME Provider. I haven’t looked into exactly how the lego library is invoked, but I wouldn’t be surprised if some data structures or environment variables are shared between the two lego invocations…
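To illustrate the kind of sharing I suspect, here is a minimal Go sketch. It is not the provider's or lego's actual code: the function names are hypothetical, and it simply assumes that per-cert settings end up in process-wide environment variables that are read back later. The interleaving is simulated sequentially so the outcome is deterministic.

```go
package main

import (
	"fmt"
	"os"
)

// exportChallengeConfig mimics a caller that publishes per-cert DNS settings
// as process-wide environment variables (hypothetical, for illustration).
func exportChallengeConfig(zone string) {
	os.Setenv("AZURE_ZONE_NAME", zone)
}

// zoneSeenByDNSProvider mimics a DNS provider that reads its configuration
// from the environment when it is constructed.
func zoneSeenByDNSProvider() string {
	return os.Getenv("AZURE_ZONE_NAME")
}

// simulateSharedEnv replays one possible interleaving: every cert exports its
// configuration first, and only afterwards does each cert's DNS provider read
// the environment. The result maps each cert to the zone it actually used.
func simulateSharedEnv(certs []string) map[string]string {
	for _, cert := range certs {
		exportChallengeConfig(cert)
	}
	seen := make(map[string]string, len(certs))
	for _, cert := range certs {
		seen[cert] = zoneSeenByDNSProvider()
	}
	return seen
}

func main() {
	certs := []string{"dev.farligeprodukter.no", "saksbehandler.dev.eksplosiver.no"}
	for cert, zone := range simulateSharedEnv(certs) {
		fmt.Printf("cert %s used zone %s\n", cert, zone)
	}
}
```

With this interleaving both certs end up using the zone that was exported last, which matches the pattern in our audit log where the challenge record for dev.farligeprodukter.no was written into the saksbehandler.dev.eksplosiver.no zone.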

I’ve enabled debug logging for the provider and Terraform locally and manually verified that the dns_challenge configuration actually differs between our various certs.

Please let me know if there is any more information I can supply to help you debug this issue 😉

Our Terraform cert resource loop:

resource "acme_certificate" "certificate" {
  for_each                  = local.dns_zones_with_gateway_config
  account_key_pem           = tls_private_key.private_key.private_key_pem
  common_name               = each.key
  subject_alternative_names = ["*.${each.key}"]
  key_type                  = "4096" # RSA, 4096 bits
  min_days_remaining        = 60

  dns_challenge {
    provider = "azure"
    config = {
      AZURE_RESOURCE_GROUP  = azurerm_resource_group.common_network_rg.name
      AZURE_CLIENT_ID       = data.azuread_service_principal.terraform_user.application_id
      AZURE_CLIENT_SECRET   = data.azurerm_key_vault_secret.terraform_user_password.value
      AZURE_SUBSCRIPTION_ID = data.azurerm_client_config.current.subscription_id
      AZURE_TENANT_ID       = data.azurerm_client_config.current.tenant_id

      AZURE_ZONE_NAME = each.key
    }
  }
}

Relevant Terraform output from the failure in our DEV environment:

module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["dev.farligeprodukter.no"]: Destroying... [id=https://REDACTED.vault.azure.net/certificates/cert-dev-farligeprodukter-no-9f0fc806-REDACTED/REDACTED]
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["saksbehandler.dev.eksplosiver.no"]: Destroying... [id=https://REDACTED.vault.azure.net/certificates/cert-saksbehandler-dev-eksplosiver-no-62672dce-REDACTED/REDACTED]
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["dev.farligeprodukter.no"]: Still destroying... [id=https://REDACTED.vault.a...82680/REDACTED, 10s elapsed]
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["saksbehandler.dev.eksplosiver.no"]: Still destroying... [id=https://REDACTED.vault.a...75efb/REDACTED, 10s elapsed]
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["dev.farligeprodukter.no"]: Still destroying... [id=https://REDACTED.vault.a...82680/REDACTED, 20s elapsed]
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["saksbehandler.dev.eksplosiver.no"]: Still destroying... [id=https://REDACTED.vault.a...75efb/REDACTED, 20s elapsed]
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["dev.farligeprodukter.no"]: Destruction complete after 22s
module.common_network.azurerm_key_vault_certificate.key_vault_web_certificate["saksbehandler.dev.eksplosiver.no"]: Destruction complete after 22s
module.common_network.acme_certificate.certificate["saksbehandler.dev.eksplosiver.no"]: Modifying... [id=62672dce-REDACTED]
module.common_network.acme_certificate.certificate["dev.farligeprodukter.no"]: Modifying... [id=9f0fc806-REDACTED]
module.common_network.acme_certificate.certificate["saksbehandler.dev.eksplosiver.no"]: Still modifying... [id=62672dce-REDACTED, 10s elapsed]
module.common_network.acme_certificate.certificate["dev.farligeprodukter.no"]: Still modifying... [id=9f0fc806-REDACTED, 10s elapsed]
module.common_network.acme_certificate.certificate["dev.farligeprodukter.no"]: Still modifying... [id=9f0fc806-REDACTED, 20s elapsed]
module.common_network.acme_certificate.certificate["saksbehandler.dev.eksplosiver.no"]: Still modifying... [id=62672dce-REDACTED, 20s elapsed]
module.common_network.acme_certificate.certificate["saksbehandler.dev.eksplosiver.no"]: Modifications complete after 23s [id=62672dce-REDACTED]
module.common_network.acme_certificate.certificate["dev.farligeprodukter.no"]: Still modifying... [id=9f0fc806-REDACTED, 30s elapsed]
...
module.common_network.acme_certificate.certificate["dev.farligeprodukter.no"]: Still modifying... [id=9f0fc806-REDACTED, 4m10s elapsed]
╷
│ Error: error: one or more domains had a problem:
│ [*.dev.farligeprodukter.no] time limit exceeded: last error: NS ns4-04.azure-dns.info. did not return the expected TXT record [fqdn: _acme-challenge.dev.farligeprodukter.no., value: REDACTED]: 
│ [dev.farligeprodukter.no] time limit exceeded: last error: NS ns4-04.azure-dns.info. did not return the expected TXT record [fqdn: _acme-challenge.dev.farligeprodukter.no., value: REDACTED]: 
│ 
│ 
│   with module.common_network.acme_certificate.certificate["dev.farligeprodukter.no"],
│   on common/common-network/certificates.tf line 21, in resource "acme_certificate" "certificate":
│   21: resource "acme_certificate" "certificate" {
│ 
╵
Error: Process completed with exit code 1.

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 17 (9 by maintainers)

Most upvoted comments

@plemelin thanks for the data, knowing that more and more people are having to apply the workaround is giving me more ammo to work on addressing it. I think now that most of the smaller issues are worked out (save imports) I will shift my attention to it. Again it’s a bit of a big lift as we’re talking about making fundamental changes to lego, but let’s see what happens.