cert-manager: Multi-domain wildcard certificate validation fails when using DNS delegation

Describe the bug:

I have two domains (contoso.com and example.com) and would like to manage DNS01 challenges using a single DNS zone (acme.example.com, see https://cert-manager.io/docs/configuration/acme/dns01/#delegated-domains-for-dns01). In the example.com and contoso.com zones I have created CNAME records that point to _acme-challenge.acme.example.com (I am only interested in wildcard certificates for those domains, like *.example.com). If I request a certificate for only one CN, for example *.example.com or *.contoso.com, everything works fine: the _acme-challenge TXT record is created in the acme.example.com zone and the challenge is solved. However, if I request a single certificate that includes both CNs at the same time (*.example.com and *.contoso.com), only one challenge is solved, while the second one remains in the state: Waiting for DNS-01 challenge propagation: DNS record for "example.com" not yet propagated
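To illustrate why the two challenges end up sharing one record, here is a minimal sketch of the name resolution, assuming the CNAME delegation described above. The `CNAMES` table and the `challenge_record` helper are hypothetical names made up for this example; they are not part of cert-manager.

```python
# Hypothetical CNAME table mirroring the delegation described in this report.
CNAMES = {
    "_acme-challenge.example.com": "_acme-challenge.acme.example.com",
    "_acme-challenge.contoso.com": "_acme-challenge.acme.example.com",
}


def challenge_record(dns_name: str) -> str:
    """Return the TXT record name a DNS-01 solver would write for dns_name,
    following CNAMEs (a toy stand-in for cnameStrategy: Follow)."""
    # A wildcard name is validated at the base domain.
    base = dns_name[2:] if dns_name.startswith("*.") else dns_name
    name = "_acme-challenge." + base
    seen = set()  # guard against CNAME loops
    while name in CNAMES and name not in seen:
        seen.add(name)
        name = CNAMES[name]
    return name


targets = {d: challenge_record(d) for d in ("*.example.com", "*.contoso.com")}
for dns_name, record in targets.items():
    print(dns_name, "->", record)
```

Both names resolve to the same TXT record name, so the two challenge values have to coexist in one record.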

I am using Route53, where you cannot have two TXT records with the same name; instead you have one _acme-challenge record that may hold multiple values. There seems to be a race condition during the challenge:

  1. The TXT record with the first challenge value is provisioned.
  2. Once the first challenge is solved, the TXT record is updated by appending the value for the second challenge.
  3. However, the whole TXT record is then deleted immediately, without waiting for the second challenge to be solved, instead of removing only the first challenge's value and leaving the second one in place.
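The sequence above can be modelled with a toy in-memory zone. This is only a sketch of the behaviour as I observe it, not the actual cert-manager or Route53 code; `TxtZone` and the token strings are made up for illustration.

```python
class TxtZone:
    """Toy model of a zone where one record name holds a set of TXT values."""

    def __init__(self):
        self.records = {}  # record name -> set of TXT values

    def upsert_value(self, name, value):
        # Add one value to the record, creating the record if needed.
        self.records.setdefault(name, set()).add(value)

    def delete_record(self, name):
        # Models the cleanup described in step 3: the whole record is
        # dropped, including values that belong to other challenges.
        self.records.pop(name, None)

    def lookup(self, name):
        return set(self.records.get(name, set()))


NAME = "_acme-challenge.acme.example.com"
zone = TxtZone()
zone.upsert_value(NAME, "token-for-contoso")  # step 1: first challenge
zone.upsert_value(NAME, "token-for-example")  # step 2: second value appended
zone.delete_record(NAME)                      # step 3: cleanup after first challenge

# The second challenge can no longer find its value.
second_visible = "token-for-example" in zone.lookup(NAME)
print(second_visible)
```

In this model the second challenge's value is gone before it can be validated, which matches the "not yet propagated" state I see.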

Expected behaviour:

The certificate should be provisioned and both challenges should be solved. I expect a single _acme-challenge TXT record to exist with multiple values assigned. Upon solving one challenge, only that challenge's value should be removed, instead of deleting the whole TXT record. The TXT record should be deleted only if there are no other challenges pending.
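A minimal sketch of the cleanup I would expect, assuming the record can be modelled as a set of values per name. The `cleanup_value` helper is hypothetical and only illustrates the intended semantics; a real fix would have to go through the DNS provider's API.

```python
def cleanup_value(records, name, value):
    """Remove a single TXT value; delete the record only when it is empty."""
    values = records.get(name, set())
    values.discard(value)
    if not values:
        # No other challenge is pending on this name, so drop the record.
        records.pop(name, None)


NAME = "_acme-challenge.acme.example.com"
records = {NAME: {"token-for-contoso", "token-for-example"}}

cleanup_value(records, NAME, "token-for-contoso")
after_first = {k: set(v) for k, v in records.items()}  # snapshot for inspection

cleanup_value(records, NAME, "token-for-example")
after_second = dict(records)

print(after_first)
print(after_second)
```

After the first cleanup the record still carries the value for the pending challenge; it disappears only once the last value is removed.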

Steps to reproduce the bug:

  • All DNS zones are in Route53
  • example.com zone delegates acme challenges to acme.example.com zone
  • contoso.com zone delegates acme challenges to acme.example.com zone
  • ClusterIssuer:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: le-issuer
spec:
  acme:
    email: john.doe@gmail.com
    server: https://acme-v02.api.letsencrypt.org/directory
    privateKeySecretRef:
      name: letsencrypt-account-key
    solvers:
    - dns01:
        cnameStrategy: Follow
        route53:
          region: *******************
          accessKeyID: *******************
          hostedZoneID: *******************
          secretAccessKeySecretRef:
            name: letsencrypt-aws
            key: secret-access-key
      selector:
        dnsNames:
        - '*.example.com'
        - '*.contoso.com'
  • Certificate:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: wildcard
spec:
  dnsNames:
  - '*.example.com'
  - '*.contoso.com'
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: le-issuer
  secretName: wildcard

Anything else we need to know?:

Slack discussion: https://kubernetes.slack.com/archives/C4NV3DWUC/p1605512221087200
Related code: https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/route53/route53.go#L173-L175

Environment details:

  • Kubernetes version: v1.16.13
  • Cloud-provider/provisioner: AWS
  • cert-manager version: 1.0.4
  • Install method: helm

/kind bug

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 11
  • Comments: 16 (5 by maintainers)

Most upvoted comments

I think I might be facing this exact same issue.

However I see minor differences:

  1. My certificate holds two non-wildcard DNS names.
  2. DNS01 challenge is solved successfully when the certificate is first issued.
  3. DNS01 challenge hits the DNS propagation issue only for renewal.
  4. cert-manager is v1.0.3

Another similar report is https://github.com/jetstack/cert-manager/issues/3608

/remove-lifecycle stale

Running into the same problem (when delegating with multiple CNAME records pointing to the same DNS name) with GCP's Cloud DNS, so it seems to affect not only AWS's Route53.

I would guess the cleanup at https://github.com/jetstack/cert-manager/blob/master/pkg/issuer/acme/dns/clouddns/clouddns.go#L162-L170 is the problem, but I'm not very well versed in Go or the cert-manager code base.