go: x/crypto/acme: ACME client's internal retry implementation results in hanging retries on 429s
What version of Go are you using (go version)?
$ go version
go version go1.14.4 darwin/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env)?
go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN="/Users/viola/go/bin"
GOCACHE="/Users/viola/Library/Caches/go-build"
GOENV="/Users/viola/Library/Application Support/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOINSECURE=""
GONOPROXY=""
GONOSUMDB=""
GOOS="darwin"
GOPATH="/Users/viola/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/Cellar/go/1.14.4/libexec"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/Cellar/go/1.14.4/libexec/pkg/tool/darwin_amd64"
GCCGO="gccgo"
AR="ar"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD="/Users/viola/crypto/go.mod"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/0s/nyt41p_j69d8vzq1qfkg5nrh0000gn/T/go-build384678003=/tmp/go-build -gno-record-gcc-switches -fno-common"
What did you do?
Start by triggering a rate limit to get a 429 from Let’s Encrypt. Here’s how I triggered one by exceeding Let’s Encrypt’s default duplicate certificate limit of 5 per week:
- Call AuthorizeOrder with a valid AuthzID identifier.
- Repeat the request 6 times.
- On the 6th attempt, the request hangs and runs for much longer. It ended up hitting a 45-second timeout set by an upstream caller.
- Unless the context is canceled or times out, or a “non-retriable error status” is received, the client retries indefinitely. In this case the client never returns, because 429 is treated as a retriable error code and the rate limit is effectively a terminal state (for the rest of the week).
- Here’s a look at the 429 that the client was receiving and retrying on in this example.
429 urn:ietf:params:acme:error:rateLimited: Error creating new order :: too many certificates already issued for exact set of domains: violababola.best: see https://letsencrypt.org/docs/rate-limits/
What did you expect to see?
- Fail fast and do not retry on the client-side error http.StatusTooManyRequests, since this is not a recoverable response error code. At the very least, provide a way to configure this.
- Also, there should be some bound on the number of retries.
What did you see instead?
Retries on the client side error => http.StatusTooManyRequests
It seems like http.StatusTooManyRequests should not be considered a retriable response error code, as it’s not recoverable.
Related PR Fix
- This PR introduces a custom ShouldRetry func option that can be set on the ACME client to allow the default set of retriable response error codes to be overridden. This keeps the current behaviour backwards compatible while providing more flexible retry configuration.
About this issue
- State: open
- Created 4 years ago
- Reactions: 6
- Comments: 15 (4 by maintainers)
I also hit this issue. Could the fix be prioritized? 😃
Hey @viola, sorry it has taken so long to address this. I agree this isn’t the correct behavior. As far as I am aware, there is only one class of 429 returned from most ACME servers that is likely to be retryable in the short term (an overall req/s limit). Rather than expanding the API surface of the client, I think it makes sense to just remove the behavior of retrying on 429 responses in general.
I would love to see this fixed since 429 is being returned in a case where you will need to retry for days before it would succeed. That doesn’t seem to match the purpose of 429.
@icholy @alicethorne-ab @ZhiminXiang thanks for letting me know you’ve hit the same issue. This further validates that our golang/crypto#149 fix would be really nice to bring over to crypto. @FiloSottile @x1ddos @cagedmantis folks, any eyes on that PR would be greatly appreciated! Please let me know if there is anything I can do to help. ❤️
cc @FiloSottile if you have some time, would love to pick this one up. Especially since there is more interest now ☝️
Hi all. I’m a developer at @1password and we’re currently experiencing this exact issue with the library. It’d be a great help if we could get golang/crypto#149 moved along and merged before the upcoming code freeze. Please let me know if any testing is needed to assist with that.