boulder: HTTP-01 IPv6 to IPv4 fallback not working properly

A user in IRC noticed that they were suffering HTTP-01 validation failures for a domain that previously worked. On investigation, it appears the domain had both an AAAA record and an A record, but the AAAA address wasn’t working. I expected the IPv6 to IPv4 fallback code to mask this issue, but the validation record shows it did not. addressesTried is null, and addressUsed is the v6 address:

    "validationRecord": [
      {
        "url": "http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX",
        "hostname": "XXXXX",
        "port": "80",
        "addressesResolved": [
          "92.XXX.XXX.XXX",
          "2001:XXXX:XXXX:XXXX::111"
        ],
        "addressUsed": "2001:XXXX:XXXX:XXXX::111",
        "addressesTried": null
      }
    ]

The VA logged:

    HTTP request to http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX failed. err=[&url.Error{Op:"Get", URL:"http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX", Err:(*http.httpError)(0xc420c89260)}] errStr=[Get http://xxxxx/.well-known/acme-challenge/XXXXXXXXXXXXX: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)]

The root cause is that the VA’s HTTP-01 dialer wrapper re-uses the same underlying net.Dialer for the fallback connection, and by then its timeout has already been used up by the initial connection attempt.

About this issue

State: closed
Created 7 years ago
Reactions: 6
Comments: 24 (13 by maintainers)

Commits related to this issue

Fix HTTP-01 IPv6 to IPv4 fallback with fresh dialer per conn. (#2852) The implementation of the dialer used by the HTTP01 challenge, constructed with `resolveAndConstructDialer`, used the same wrappe... — committed to letsencrypt/boulder by cpu 7 years ago

Most upvoted comments

Just another “false positive” https://community.letsencrypt.org/t/404-on-well-known-acme-challenge-but-accessable-from-browser/34730

I can understand the decision to prefer AAAA if both records are available for a domain, but sadly we are still living in an IPv4 world. The use case for IPv6 is very limited, as the majority of domestic ISPs don’t provide IPv6 to their customers. Also, a lot of people get dedicated, VPS, or shared hosting where the hoster auto-configures DNS with both IPs (IPv4 and IPv6), but they don’t care about IPv6 (yet) and don’t configure their services to use it properly, so I’m afraid we will see a lot of cases of this “false positive” issue 😉.

sahsanu on May 23, 2017

Regarding this ipv6 preference https://community.letsencrypt.org/t/certbot-ipv6-address-on-domain-misconfigured-and-challenges-fail-prefer-ipv6/34626

In this case the user has a domain with both records, A and AAAA, but the web server is only configured for IPv4. The IPv6 request reaches the web server but not the right virtual host, so the challenge obviously fails. I don’t know whether it is worth falling back to IPv4 in this case.

sahsanu on May 22, 2017

I just ran into this issue as well. The error messages in the main client were not descriptive enough to even clue me in to why I was receiving timeouts, and only on one of my domains (I have several with shared IP addresses). After an entire day of investigating, it became apparent that this was the only domain with dual-stack listed in DNS, and there were some routing issues upstream with IPv6 between Let’s Encrypt and my servers. In this particular case, because no TCP connection could even be made and the attempt simply timed out, shouldn’t that qualify as a downgrade-to-IPv4 condition? This is EXACTLY how web browsers handle this situation. Sadly, even today in 2018, there are still routing issues with the IPv6 global network at the backbone/BGP level. Because of this, my production web site was literally taken offline: I could not renew certs through Let’s Encrypt and simply hit the rate limit (only 5?), when it really seems like an IPv4 fallback should have kicked in.

darkain on Mar 28, 2018

Hi, does this require a change on the client, or is this a server change? I just started hitting this issue myself. My IPv6 service is “broken” so, even though I have both A and AAAA records, only IPv4 will successfully reach my server. I’m getting this timeout when I try to renew. I just upgraded certbot to certbot-0.19.0-1.fc25.noarch but it didn’t seem to fix the problem. If it requires a change to the service, has this change been pushed to the Let’s Encrypt service?

derekatkins on Oct 30, 2017

Another thumbs up for this problem.

We do have two domains with IPv6 on port 443 enabled and those update certificates correctly. Remaining domains are for our use however, not published to clients, so without IPv4 (no need for it, only universities use IPv6 there). When the server connects it reaches one of those public domains and fails the check without reverting to the IPv4 version of the site, where the file is created and accessible - I see the letsencrypt record in one of their logs.

I can symlink all challenge dirs into one, but option -ipv4only for certbot would be cooler…

navara on Jul 21, 2018

Hi @derekatkins - the fallback behaviour is a server-side change, and has been deployed to production already.

The catch is that it’s not a complete solution for 100% of all broken IPv6 configurations. In practice there are a handful of cases where IPv6 will not validate for ACME and is broken, but in which the actual IPv6 connectivity works enough to prevent a fallback from occurring. At this point we’ve decided that we can’t invest any more resources in improving the fallback and are not pursuing additional improvements to the server-side code.

> My IPv6 service is “broken” so, even though I have both A and AAAA records, only IPv4 will successfully reach my server. I’m getting this timeout when I try to renew.

@derekatkins I recommend that you resolve the IPv6 connectivity or remove the AAAA record entirely. Unfortunately these are the only two options that will be able to fix your problem.

If you need further help diagnosing the problem I recommend starting a new forum topic in the Let’s Encrypt Community Forum. Thanks!

cpu on Oct 30, 2017

@sahsanu - Thanks for commenting. That thread is the same one I mentioned earlier in this thread as a false positive (I should have linked to it, apologies). In this case I don’t expect a fallback and everything appears to be working as intended.

cpu on May 22, 2017

One false-positive for this issue I’ve seen so far is a host with an A and AAAA record failing an HTTP-01 challenge because the webserver on the AAAA IP returned a 404 while the A webserver had the correct webroot configured. This doesn’t meet the conditions for the retry because the failure is at the HTTP challenge validation level and not the IP connectivity level.

cpu on May 22, 2017