cert-manager: Wrong SOA record while updating delegated _acme-challenge zone.

Describe the bug:

We have delegated our _acme-challenge domains to a BIND DNS server.

Delegation records:

_acme-challenge.kube.example.com IN NS dnsle-01.example.com.
_acme-challenge.kube.example.com IN NS dnsle-02.example.com.

Delegated Zone:

$ORIGIN _acme-challenge.kube.example.com.
$TTL 300

@ IN SOA dnsle-01.example.com. noc.example.com. (
  1605083984 ; serial
  7200 ; refresh
  3600 ; retry
  1209600 ; expire
  3600 ; min
)

@ IN NS dnsle-01.example.com.
@ IN NS dnsle-02.example.com.

Updating via nsupdate or getting a cert using certbot works fine.

tcpdump output:

# cert-manager
85.10.233.254.54466 > 10.200.4.41.domain: [udp sum ok] 171 update [2n] [1au] SOA? example.com. ns: _acme-challenge.kube.example.com. ANY [0s] TXT, _acme-challenge.kube.example.com. [1m] TXT "QZzPbDQhXIfSbMN_InFWeLOYWd5owMLrlwC6gF-mD0A" ar: acme-key. ANY [0s] TSIG hmac-sha512. fudge=300 maclen=64 origid=171 error=0 otherlen=0 (278)

# nsupdate (manual)
85.10.233.254.39730 > 10.200.4.41.domain: [udp sum ok] 41562 update [2n] [1au] SOA? _acme-challenge.kube.example.com. ns: _acme-challenge.kube.example.com. [5m] TXT "test", _acme-challenge.kube.example.com. [5m] TXT "hallo2" ar: acme-key. ANY [0s] TSIG hmac-sha512. fudge=300 maclen=64 origid=41562 error=0 otherlen=0 (203)

It seems the zone in the update (the "SOA?" part) is incorrectly set to example.com instead of _acme-challenge.kube.example.com. BIND then tries to update the wrong zone, which causes "error"="DNS update failed: dns: bad authentication" in cert-manager's logs. (I checked the authentication using --v=5, and it is correct.)
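
To see how an update can end up addressed to example.com., it helps to recall how such a zone name is usually found. The search starts at the challenge name and walks up one label at a time until some name has an SOA record. The following Go sketch models that search with the standard library only. The real SOA query is replaced by a hypothetical hasSOA predicate, so this illustrates the general technique, not cert-manager's actual code. It shows that the same walk returns example.com. as soon as the delegated zone's SOA is not visible to whichever nameserver answers the lookups.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// candidateZones lists every suffix of fqdn from the most specific name
// down to the TLD, e.g. "a.b.c." -> ["a.b.c.", "b.c.", "c."].
func candidateZones(fqdn string) []string {
	fqdn = strings.TrimSuffix(fqdn, ".") + "."
	var out []string
	for name := fqdn; name != ""; {
		out = append(out, name)
		name = name[strings.Index(name, ".")+1:]
	}
	return out
}

// findZone returns the first candidate for which hasSOA reports an SOA
// record at exactly that name. hasSOA is a hypothetical stand-in for an
// SOA query sent to some nameserver.
func findZone(fqdn string, hasSOA func(name string) bool) (string, error) {
	for _, name := range candidateZones(fqdn) {
		if hasSOA(name) {
			return name, nil
		}
	}
	return "", errors.New("no SOA found for " + fqdn)
}

func main() {
	fqdn := "_acme-challenge.kube.example.com."
	// A nameserver view in which the delegated zone's SOA is visible.
	full := map[string]bool{"example.com.": true, "_acme-challenge.kube.example.com.": true}
	// A nameserver view in which only the parent zone's SOA is visible.
	parentOnly := map[string]bool{"example.com.": true}

	z1, _ := findZone(fqdn, func(n string) bool { return full[n] })
	z2, _ := findZone(fqdn, func(n string) bool { return parentOnly[n] })
	fmt.Println(z1) // _acme-challenge.kube.example.com.
	fmt.Println(z2) // example.com.
}
```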

I already tried setting the recursive nameservers (see below), which did not change the behavior:

--dns01-recursive-nameservers-only
--dns01-recursive-nameservers="1.1.1.1:53"

Sidenote: I changed the domain for privacy reasons.

Expected behaviour:

The TXT record is set in the correct zone (_acme-challenge.kube.example.com), using the correct SOA entry.

Steps to reproduce the bug:

It's possible to reproduce this using the examples above as the configuration.

Anything else we need to know?:

For now we can't switch to CNAME-based verification (which only works if I use a complete domain; otherwise I get the same issue), since certbot does not support that directly.

Environment details:

  • Kubernetes version: 4.5.0-0.okd-2020-10-15-235428 (v1.18.3)
  • Cloud-provider/provisioner: Bind
  • cert-manager version: 1.0.4
  • Install method (e.g. helm/static manifests): static

/kind bug

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 39 (21 by maintainers)

Most upvoted comments

Hi @maelvls,

I will start on this at the beginning of next week and keep you posted 😄


; <<>> DiG 9.16.1-Ubuntu <<>> _acme-challenge.mydomain.com soa
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46114
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_acme-challenge.mydomain.com.                IN      SOA

;; ANSWER SECTION:
_acme-challenge.mydomain.com. 0       IN      SOA     letsencrypt.mydomain.com. hostmaster._acme-challenge.mydomain.com. 24 10800 3600 604800 3600

;; Query time: 100 msec
;; SERVER: 172.22.0.1#53(172.22.0.1)
;; WHEN: Mon Jun 13 17:06:16 CEST 2022
;; MSG SIZE  rcvd: 149

Nope, and as previously stated, it works perfectly with certbot and dig.

With a subdelegated setup like this:

mydomain.com main DNS zone (SOA is mydomain.com):

_acme-challenge IN NS letsencrypt.mydomain.com.

and an _acme-challenge.mydomain.com nameserver (SOA is _acme-challenge.mydomain.com, as it should be), the subdelegated zone on the letsencrypt.mydomain.com DNS server contains:

@ IN TXT "mysupertxtrecord"

If I run:

dig _acme-challenge.mydomain.com txt

; <<>> DiG 9.16.1-Ubuntu <<>> _acme-challenge.mydomain.com txt
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49274
;; flags: qr rd ad; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
;; WARNING: recursion requested but not available

;; QUESTION SECTION:
;_acme-challenge.mydomain.com.                IN      TXT

;; ANSWER SECTION:
_acme-challenge.mydomain.com. 0       IN      TXT     "mysupertxtrecord"

;; Query time: 90 msec
;; SERVER: 172.24.144.1#53(172.24.144.1)
;; WHEN: Sat Jan 22 03:13:37 CET 2022
;; MSG SIZE  rcvd: 118

@maelvls I agree with the reasoning, and I would like to see this fixed ASAP too. So I will get back to it and write a first implementation of the followNS function. I will keep you posted on the progress.

Thanks,
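
The thread does not show the followNS function mentioned above, so what follows is only a hypothetical illustration of the idea. It models each nameserver as one hosted zone plus its NS delegations and follows referrals downward until no further delegation covers the challenge name. The server type, the followNS signature and the parent nameserver ns1.mydomain.com. are all invented for this sketch. A real implementation would send queries over the network and read the NS records from referral responses.

```go
package main

import (
	"fmt"
	"strings"
)

// server is a toy authoritative nameserver (hypothetical): it hosts one
// zone and knows which child zones it delegates (child zone -> NS host).
type server struct {
	zone        string
	delegations map[string]string
}

// inZone reports whether name equals zone or lies below it on a label boundary.
func inZone(name, zone string) bool {
	return zone == "." || name == zone || strings.HasSuffix(name, "."+zone)
}

// followNS starts at nameserver start and follows the most specific NS
// delegation that contains fqdn until none applies. It returns the zone
// and nameserver that are finally authoritative for fqdn.
func followNS(servers map[string]server, start, fqdn string) (zone, ns string, err error) {
	ns = start
	for hops := 0; hops < 16; hops++ { // guard against delegation loops
		srv, ok := servers[ns]
		if !ok {
			return "", "", fmt.Errorf("unknown nameserver %q", ns)
		}
		best, next := "", ""
		for child, host := range srv.delegations {
			if inZone(fqdn, child) && len(child) > len(best) {
				best, next = child, host
			}
		}
		if next == "" {
			return srv.zone, ns, nil // no referral: this server is authoritative
		}
		ns = next
	}
	return "", "", fmt.Errorf("too many delegations for %q", fqdn)
}

func main() {
	servers := map[string]server{
		"ns1.mydomain.com.": { // hypothetical parent-zone nameserver
			zone:        "mydomain.com.",
			delegations: map[string]string{"_acme-challenge.mydomain.com.": "letsencrypt.mydomain.com."},
		},
		"letsencrypt.mydomain.com.": {zone: "_acme-challenge.mydomain.com."},
	}
	zone, ns, err := followNS(servers, "ns1.mydomain.com.", "_acme-challenge.mydomain.com.")
	fmt.Println(zone, ns, err)
	// prints: _acme-challenge.mydomain.com. letsencrypt.mydomain.com. <nil>
}
```

Choosing the longest matching delegation keeps the walk deterministic even if a toy configuration lists nested child zones. The hop limit stops a misconfigured delegation loop from spinning forever.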

Hi, so I made a custom modification to a webhook to check that the TXT record could be updated. It works pretty well.

However, the DNS propagation check on the cert-manager side is failing because of this SOA search.

If I query with dig, the record is properly updated, so DNS validation should succeed.

I wonder why the propagation check has to go through recursive queries. Couldn't we just "trust" standard DNS resolution for this? A simple net.LookupTXT("_acme-challenge.mydomain.com") gives me the right answer.