cert-manager: DNS01 CNAME support breaks wildcard support for nginx ingress

Describe the bug:

We have a wildcard domain pointing at an nginx ingress controller, which basically means that the wildcard domain resolves to an Elastic Load Balancer.

When cert-manager tries to create the _acme-challenge record in the wildcarded domain, it sees the wildcard CNAME to the ELB and then tries to update DNS in the ELB’s domain (us-west-2.elb.amazonaws.com) instead.

I0817 02:04:31.075401       1 logger.go:73] Calling GetAuthorization
I0817 02:04:31.203160       1 logger.go:98] Calling DNS01ChallengeRecord
I0817 02:04:31.203193       1 prepare.go:279] Cleaning up old/expired challenges for Certificate staging/staging-phoenix-my-tls
I0817 02:04:31.203206       1 logger.go:68] Calling GetChallenge
I0817 02:04:31.436572       1 wait.go:66] Updating FQDN: _acme-challenge.example.com. with it's CNAME: ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
I0817 02:04:31.075401       1 logger.go:73] Calling GetAuthorization
I0817 02:04:31.203160       1 logger.go:98] Calling DNS01ChallengeRecord
I0817 02:04:31.203193       1 prepare.go:279] Cleaning up old/expired challenges for Certificate staging/staging-wildcard-tls
I0817 02:04:31.203206       1 logger.go:68] Calling GetChallenge
I0817 02:04:31.436572       1 wait.go:66] Updating FQDN: _acme-challenge.example.com. with it's CNAME: ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
I0817 02:04:31.436589       1 dns.go:93] Checking DNS propagation for "example.com" using name servers: [100.64.0.10:53]
I0817 02:04:31.472333       1 dns.go:100] DNS record for "example.com" not yet propagated
I0817 02:04:31.472460       1 dns.go:83] Presenting DNS01 challenge for domain "example.com"
I0817 02:04:31.481949       1 wait.go:66] Updating FQDN: _acme-challenge.example.com. with it's CNAME: ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
I0817 02:04:31.841210       1 helpers.go:201] Found status change for Certificate "staging-wildcard-tls" condition "Ready": "False" -> "False"; setting lastTransitionTime to 2018-08-17 02:04:31.841201695 +0000 UTC m=+9743.366825451
I0817 02:04:31.841235       1 sync.go:276] Error preparing issuer for certificate staging/staging-wildcard-tls: Failed to determine Route 53 hosted zone ID: Zone us-west-2.elb.amazonaws.com. not found in Route 53 for domain ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.
E0817 02:04:31.841254       1 sync.go:197] [staging/staging-wildcard-tls] Error getting certificate 'staging-wildcard-tls': secret "staging-wildcard-tls" not found 
E0817 02:04:31.854121       1 controller.go:180] certificates controller: Re-queuing item "staging/staging-wildcard-tls" due to error processing: Failed to determine Route 53 hosted zone ID: Zone us-west-2.elb.amazonaws.com. not found in Route 53 for domain ab06d0c81742111e8b745062d6efc4d9-1815477658.us-west-2.elb.amazonaws.com.

Expected behaviour: The _acme-challenge TXT record is created in the wildcarded domain (example.com in the above)
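
To make the failure mode concrete, here is a minimal sketch of what the log lines above suggest is happening. It is not cert-manager's code: the `records` type, `hostedZoneFor`, and the ELB hostname `my-elb.us-west-2.elb.amazonaws.com.` are all hypothetical, and the wildcard matching is simplified to a single extra label. It shows how a wildcard CNAME also answers for the _acme-challenge name, so following it moves the TXT record into a zone the Route 53 account does not host.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// records is a toy stand-in for DNS: owner name -> CNAME target.
// A key beginning with "*." acts as a wildcard for one extra label
// (a simplification of real wildcard semantics).
type records map[string]string

// cname returns the CNAME a resolver would answer for fqdn, including a
// wildcard-synthesised answer when no exact record exists.
func (r records) cname(fqdn string) (string, bool) {
	if target, ok := r[fqdn]; ok {
		return target, true
	}
	if i := strings.Index(fqdn, "."); i >= 0 {
		target, ok := r["*."+fqdn[i+1:]]
		return target, ok
	}
	return "", false
}

// hostedZoneFor picks the longest hosted zone that is a suffix of fqdn,
// or returns an error when none of the account's zones match.
func hostedZoneFor(fqdn string, zones []string) (string, error) {
	best := ""
	for _, z := range zones {
		if (fqdn == z || strings.HasSuffix(fqdn, "."+z)) && len(z) > len(best) {
			best = z
		}
	}
	if best == "" {
		return "", errors.New("no hosted zone found for " + fqdn)
	}
	return best, nil
}

func main() {
	// Hypothetical data: one wildcard CNAME and one hosted zone.
	dns := records{"*.example.com.": "my-elb.us-west-2.elb.amazonaws.com."}
	zones := []string{"example.com."}

	fqdn := "_acme-challenge.example.com."
	if target, ok := dns.cname(fqdn); ok {
		fqdn = target // the CNAME-following step that causes the problem
	}
	_, err := hostedZoneFor(fqdn, zones)
	fmt.Println(fqdn, err)
}
```

Without the CNAME-following step, `hostedZoneFor` is called with `_acme-challenge.example.com.` and finds `example.com.`, which is the expected behaviour described above.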

Steps to reproduce the bug:

apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  annotations:
    certmanager.k8s.io/acme-challenge-type: dns01
    certmanager.k8s.io/acme-dns01-provider: route53
    certmanager.k8s.io/cluster-issuer: letsencrypt-staging
    kubernetes.io/ingress.class: nginx-external
  name: staging-frontend
spec:
  rules:
  - host: '*.example.com'
    http:
      paths:
      - backend:
          serviceName: staging-frontend
          servicePort: http
  tls:
  - hosts:
    - '*.example.com'
    secretName: staging-wildcard-tls

Creating the Ingress above, with a suitable nginx ingress controller pointing at an AWS ELB, should do the trick.

Anything else we need to know?:

The CNAME behaviour was introduced in #670. The commit message is sufficient to understand the motivation behind the change, and there’s plenty of support for it within #670. As such, I don’t know how best to fix this so that my use case is supported without breaking the use case that motivated #670.

cc: @gurvindersingh

Environment details:

  • Kubernetes version (e.g. v1.10.2): 1.9.7
  • Cloud-provider/provisioner (e.g. GKE, kops AWS, etc): AWS
  • cert-manager version (e.g. v0.4.0): quay.io/jetstack/cert-manager-controller:canary
  • Install method (e.g. helm or static manifests): static manifest

/kind bug

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 1
  • Comments: 19 (13 by maintainers)

Most upvoted comments

I was thinking of keeping things simple. Since the earlier behavior is what people are used to in some setups, we can add a config option, e.g. enable-acme-cname, which users enable if they want CNAME replacement for the _acme-challenge part; otherwise the earlier behavior is kept.

If at a later stage people want more granular control, with different behavior for different domains, then we can think about adding domain-specific CNAME logic.

#670 feels like a major breaking change. I had installed 0.5.0 via helm on a new cluster and was rolling merrily along until somebody added a wildcard entry in DNS for my top-level domain. Then suddenly no certs could be issued. Luckily I noticed before I blew my LE API limit (not sure if that would be an issue, but I didn’t want to find out). It took me forever to figure out what the issue was. Removing the wildcard for now from DNS allowed the certs to be issued.

Just to be clear, I am using the nginx ingress controller for now, but I’m not even using wildcards in my ingresses yet. I feel like people with a good working setup are going to either upgrade or, if already on 0.5.0, have someone add a DNS entry that causes their certs not to be re-issued.

Please let me know if I’m way off on this.

I tried v0.5.0 and canary (master-5602) and it didn’t work in either of those. I guess, as mentioned above, we have to wait for #670 to make it into a release. I’ve reverted to v0.4.1 using the same helm chart, changing only the tag and doing a helm upgrade (fingers crossed I’ve not put myself in a world of hurt longer term). I am no longer seeing the CNAME-related “cannot find ZoneID for xyzxyz.elb.amazon.com domain” error message.

@willthames I think we can use a config option to enable or disable the CNAME support. It can default to disabled to keep the behavior the same as before, and the CNAME code can be placed behind that check.
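
As a rough illustration of that proposal, the sketch below gates CNAME following behind a boolean option that defaults to off. Everything here is an assumption made for illustration: the flag name is taken from the enable-acme-cname suggestion earlier in the thread, while `options`, `parseOptions`, `challengeName`, and the stub resolver are hypothetical and not part of cert-manager.

```go
package main

import (
	"flag"
	"fmt"
)

// options holds the hypothetical controller setting discussed in the thread.
type options struct {
	enableACMECNAME bool
}

// parseOptions reads the hypothetical --enable-acme-cname flag.
// It defaults to false so that the pre-#670 behaviour is kept unless
// the user opts in.
func parseOptions(args []string) (options, error) {
	var o options
	fs := flag.NewFlagSet("controller", flag.ContinueOnError)
	fs.BoolVar(&o.enableACMECNAME, "enable-acme-cname", false,
		"follow a CNAME found at the _acme-challenge name")
	err := fs.Parse(args)
	return o, err
}

// challengeName returns the name where the TXT record should be written.
// lookupCNAME is injected so the sketch does not depend on real DNS.
func challengeName(o options, fqdn string, lookupCNAME func(string) (string, bool)) string {
	if !o.enableACMECNAME {
		return fqdn // earlier behaviour: always use the original name
	}
	if target, ok := lookupCNAME(fqdn); ok {
		return target
	}
	return fqdn
}

func main() {
	// Stub resolver that always reports a CNAME to a hypothetical ELB name.
	lookup := func(string) (string, bool) {
		return "my-elb.us-west-2.elb.amazonaws.com.", true
	}
	for _, args := range [][]string{nil, {"--enable-acme-cname"}} {
		o, err := parseOptions(args)
		if err != nil {
			panic(err)
		}
		fmt.Println(o.enableACMECNAME, challengeName(o, "_acme-challenge.example.com.", lookup))
	}
}
```

With the option off, a wildcard CNAME in the zone has no effect on where the challenge record is written, which covers the setups reported as broken in this issue. Users who delegate _acme-challenge on purpose would opt in.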