lego: Possible bad propagation check with dns-01 challenge
Welcome
- Yes, I’m using a binary release within 2 latest releases.
- Yes, I’ve searched similar issues on GitHub and didn’t find any.
- Yes, I’ve included all information below (version, config, etc).
What did you expect to see?
A successful certificate generation.
What did you see instead?
We had an error message from Let’s Encrypt:
2022/11/30 10:24:59 error: one or more domains had a problem: [cloud.syseleven.de] acme: error: 400 :: urn:ietf:params:acme:error:dns :: DNS problem: SERVFAIL looking up TXT for _acme-challenge.cloud.syseleven.de - the domain's nameservers may be malfunctioning
but the propagation check with 2 name servers apparently passed:
2022/11/30 10:23:40 [INFO] [cloud.syseleven.de] acme: Could not find solver for: tls-alpn-01
2022/11/30 10:23:40 [INFO] [cloud.syseleven.de] acme: Could not find solver for: http-01
2022/11/30 10:23:40 [INFO] [cloud.syseleven.de] acme: use dns-01 solver
2022/11/30 10:23:40 [INFO] [cloud.syseleven.de] acme: Preparing to solve DNS-01
2022/11/30 10:23:50 [INFO] [cloud.syseleven.de] acme: Trying to solve DNS-01
2022/11/30 10:24:00 [INFO] [cloud.syseleven.de] acme: Checking DNS record propagation using [8.8.8.8:53 4.4.4.4:53]
2022/11/30 10:24:10 [INFO] Wait for propagation [timeout: 10m0s, interval: 10s]
2022/11/30 10:24:10 [INFO] [cloud.syseleven.de] acme: Waiting for DNS record propagation.
2022/11/30 10:24:20 [INFO] [cloud.syseleven.de] acme: Waiting for DNS record propagation.
2022/11/30 10:24:30 [INFO] [cloud.syseleven.de] acme: Waiting for DNS record propagation.
2022/11/30 10:24:47 [INFO] [cloud.syseleven.de] acme: Cleaning DNS-01 challenge
With a successful result, before the last line, there would have been a message like
[cloud.syseleven.de] The server validated our request.
Note that we used 4.4.4.4, which was a working public name server in the past but apparently no longer is. It must have stopped working relatively recently; we discovered this while debugging the problem. As of now, it does not respond to any query.
However, lego gave no indication of a problem. It appears to have accepted the broken server as working and continued as if everything were fine.
Even when we replaced 4.4.4.4 with another server, the next attempt failed in the same way.
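A resolver that never answers, as described above for 4.4.4.4, can be detected independently of lego. The following is a minimal Python sketch, not lego's code: it builds a TXT query by hand and reports each resolver's response code. The helper names (`build_query`, `parse_rcode`, `probe_resolver`) are made up for illustration, and only the standard library is used.

```python
import socket
import struct

# Response codes from the DNS header that matter for this issue.
RCODE_NAMES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name: str, qtype: int = 16, txid: int = 0x1234) -> bytes:
    """Build a minimal DNS query packet (qtype 16 = TXT, class IN)."""
    # Header: ID, flags (recursion desired), QDCOUNT=1, other counts 0.
    header = struct.pack("!HHHHHH", txid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)


def parse_rcode(response: bytes) -> str:
    """Return the response code name from a DNS response header."""
    if len(response) < 12:
        raise ValueError("truncated DNS header")
    flags = struct.unpack("!H", response[2:4])[0]
    rcode = flags & 0x000F  # the low four bits of the flags field
    return RCODE_NAMES.get(rcode, f"RCODE{rcode}")


def probe_resolver(ip: str, name: str, timeout: float = 3.0) -> str:
    """Send one TXT query over UDP; return the rcode name or 'NO_RESPONSE'.

    A more careful check would also compare the transaction ID of the
    response with the one that was sent.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sock.sendto(build_query(name), (ip, 53))
            response, _ = sock.recvfrom(4096)
        except OSError:  # includes timeouts and unreachable hosts
            return "NO_RESPONSE"
    return parse_rcode(response)
```

Calling `probe_resolver("4.4.4.4", "_acme-challenge.cloud.syseleven.de")` once per `--dns.resolvers` value before a run would separate a silent resolver from one that answers with SERVFAIL or NXDOMAIN.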
This makes me think that the propagation check doesn’t really work. How else could an arbitrary name server serve the correct TXT record (I certainly hope that comparing the record's value is part of the check), while the query fails when Let’s Encrypt performs it? I noticed that you also get the SERVFAIL error if the TXT record is simply missing. It seems extremely unlikely that the name servers worked long enough for a query via 8.8.8.8 to succeed and then suddenly broke when Let’s Encrypt queried them.
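For reference, RFC 8555 (section 8.4) defines the value the `_acme-challenge` TXT record must hold: the unpadded base64url encoding of the SHA-256 digest of the key authorization. A strict propagation check would compare that value against what every resolver returns. The sketch below illustrates the idea in Python. It is an assumption about what such a check could look like, not a description of lego's implementation, and `dns01_txt_value` and `record_is_propagated` are hypothetical names.

```python
import base64
import hashlib
from typing import Dict, List


def dns01_txt_value(key_authorization: str) -> str:
    """Compute the TXT value for a dns-01 challenge (RFC 8555, section 8.4)."""
    digest = hashlib.sha256(key_authorization.encode("ascii")).digest()
    # base64url without the trailing "=" padding
    return base64.urlsafe_b64encode(digest).rstrip(b"=").decode("ascii")


def record_is_propagated(answers_by_resolver: Dict[str, List[str]],
                         expected: str) -> bool:
    """Return True only if every resolver returned the expected TXT value.

    `answers_by_resolver` maps a resolver address to the TXT strings it
    returned; a resolver that failed or timed out should map to an empty list.
    """
    if not answers_by_resolver:
        return False  # no resolver was asked, so nothing was verified
    return all(expected in answers for answers in answers_by_resolver.values())
```

With this shape, a resolver that does not respond contributes an empty answer list and makes the check fail instead of being skipped silently.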
How do you use lego?
Docker image
Reproduction steps
We use a gitlab CI pipeline to run this command periodically:
lego --accept-tos --dns designate --path /tmp/lego --dns.resolvers 8.8.8.8 --dns.resolvers 4.4.4.4 --server=https://acme-v02.api.letsencrypt.org/directory --email noreply@syseleven.de --key-type rsa4096 -d "*.cloud.syseleven.net" -d "*.infra.sys11cloud.net" -d "*.infrabk.sys11cloud.net" -d "*.infrabl.sys11cloud.net" -d "*.infrafe.sys11cloud.net" -d "cloud.syseleven.de" renew --preferred-chain "ISRG Root X1"
Version of lego
Our docker image is based on
`FROM goacme/lego:v4.9.1`
Logs
See above
Go environment (if applicable)
No response
About this issue
- Original URL
- State: open
- Created 2 years ago
- Reactions: 1
- Comments: 18 (8 by maintainers)
@oseiberts11, this should help you debug:
Dockerfile
BuildKit required!
The patch can be found here: https://gist.github.com/dmke/f2d31407cc17d7801a0f32ebbe6cd283.
To build a drop-in replacement for the `goacme/lego:v4.9.1` image, copy the Dockerfile to your system and run:
You probably don’t want to distribute the image, as it skips the cleanup procedure entirely.
I also started encountering this behavior, out of nowhere, after a year+ of certs renewing automatically without any issue.
I’m using the Docker image, running the command:
Logs:
I have solid IPv4 and IPv6 connectivity. I’ll wait a couple hours to make sure I don’t run into cached NXDOMAIN and set `DESIGNATE_POLLING_INTERVAL=60`, to see what happens. My certs expire in 99 hours, hopefully it’ll work again before that. Not thrilled about the situation.
Thanks for the offer. I think the Dockerfile would work fine.