cert-manager: Controller can't handle hitting request rate limits of zerossl ACME API
Describe the bug:
We’ve been using cert-manager with zerossl as the ACME provider and http01 challenges very successfully for several months now.
However, a couple of weeks ago zerossl must have changed their ACME API: they introduced a quite strict request rate limit.
Whenever we issue a new certificate containing 3 or more domains using the http01 challenge, we run into 429 responses from their API, which completely bricks the certificate issuance flow.
Note: The problem does not occur when issuing a cert containing <=2 domains.
Expected behaviour: The controller should respect 429 responses and try again later. In my case, retrying 2-3 seconds later would already solve the issue.
Steps to reproduce the bug: This is the certificate resource:
apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  annotations:
    service: tls-cert
  labels:
    service: tls-cert
  name: tls-cert
spec:
  dnsNames:
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  - xxx
  issuerRef:
    group: cert-manager.io
    kind: ClusterIssuer
    name: zerossl
  secretName: tls-cert
  usages:
  - digital signature
  - key encipherment
And this is the ClusterIssuer resource:
apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: zerossl
spec:
  acme:
    externalAccountBinding:
      keyID: xxxxx
      keySecretRef:
        key: eab-hmac-key
        name: zerossl
    privateKeySecretRef:
      name: zerossl-account
    server: https://acme.zerossl.com/v2/DV90
    solvers:
    - http01:
        ingress:
          class: nginx
After applying the certificate to the cluster, the corresponding CertificateRequest, Order, and Challenge resources are created as expected.
However, during processing of the challenges, the ACME client hits the request limit of the zerossl API:

# failed challenge status:
status:
  presented: false
  processing: false
  reason: 'Failed to retrieve Order resource: 429 : 429 Too Many Requests'
  state: errored
Once the first challenge fails, the error state is propagated to the Order and Certificate resources:
# Order status:
status:
  authorizations:
  ....
  failureTime: "2023-03-16T10:26:15Z"
  finalizeURL: https://acme.zerossl.com/v2/DV90/order/xxxxx/finalize
  reason: "Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429 Too
    Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
  state: errored
  url: https://acme.zerossl.com/v2/DV90/order/xxxxx
# Certificate status:
status:
  conditions:
  - lastTransitionTime: "2023-03-16T10:26:08Z"
    message: Issuing certificate as Secret does not exist
    observedGeneration: 1
    reason: DoesNotExist
    status: "False"
    type: Ready
  - lastTransitionTime: "2023-03-16T10:26:15Z"
    message: "The certificate request has failed to complete and will be retried:
      Failed to wait for order resource \"tls-cert-twhmq-1698200363\" to become ready:
      order is in \"errored\" state: Failed to retrieve Order resource: 429 : <html>\r\n<head><title>429
      Too Many Requests</title></head>\r\n<body>\r\n<center><h1>429 Too Many Requests</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n"
    observedGeneration: 1
    reason: Failed
    status: "False"
    type: Issuing
  failedIssuanceAttempts: 1
  lastFailureTime: "2023-03-16T10:26:15Z"
Anything else we need to know?:
It seems that for every challenge, the order is retrieved from the ACME API. The more domains in the certificate, the more challenges are spawned, and thus the more requests are made to fetch the order object.
I see two technical issues here:
- upon receiving a 429 response code, the controller should retry instead of giving up immediately
- to ease the pressure on the ACME API, the order response should be cached
I have informed the technical support of zerossl about this issue. Their suggestion was to throttle the requests and/or implement a retry.
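The first point in the list above (retry on 429 instead of giving up) could be sketched roughly like this. It is an illustration in Python only, not cert-manager's actual Go code: `fetch_with_retry`, the `Response` tuple shape, and the delay values are hypothetical names and numbers chosen for the example. The sketch prefers a Retry-After hint when the server sends one and otherwise backs off exponentially.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape for this sketch:
# (status_code, retry_after_seconds_or_None, body)
Response = Tuple[int, Optional[float], str]


def fetch_with_retry(
    fetch: Callable[[], Response],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call fetch() until it returns something other than HTTP 429.

    Between attempts, wait for the server's Retry-After value if present,
    otherwise use exponential backoff (base_delay, 2 * base_delay, ...),
    capped at max_delay. If every attempt hits 429, the last response is
    returned so the caller can decide what to do.
    """
    response = fetch()
    for attempt in range(1, max_attempts):
        status, retry_after, _ = response
        if status != 429:
            return response
        if retry_after is not None:
            delay = retry_after
        else:
            delay = base_delay * 2 ** (attempt - 1)
        sleep(min(delay, max_delay))
        response = fetch()
    return response


# Simulated endpoint: two 429 responses, then success.
replies = iter([(429, None, ""), (429, 2.0, ""), (200, None, "order")])
waits = []  # record the delays instead of actually sleeping
result = fetch_with_retry(lambda: next(replies), sleep=waits.append)
print(result[0], waits)
```

With the 2-3 second recovery window described in this issue, a base delay of about one second and a handful of attempts would probably be enough, but that is an assumption rather than a measured value.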
Environment details:
- Kubernetes version: 1.24
- Cloud-provider/provisioner: GKE
- cert-manager version: 1.11.0
- Install method: helm 1.11.0
/kind bug
About this issue
- State: open
- Created a year ago
- Reactions: 15
- Comments: 42 (1 by maintainers)
Commits related to this issue
- Draft of tutorial for Google's Public CA Solicited here: https://github.com/cert-manager/cert-manager/issues/5867#issuecomment-1506847108 — committed to axisofentropy/cert-manager-website by axisofentropy a year ago
- Draft of tutorial for Google's Public CA Solicited here: https://github.com/cert-manager/cert-manager/issues/5867#issuecomment-1506847108 Signed-off-by: Adam Vollrath <adam.d.vollrath@gmail.com> — committed to axisofentropy/cert-manager-website by axisofentropy a year ago
- * Updated the gateway-api crds to the latest version. (see code for source urls) Found that ZeroSSL now has some stricter rate limits in place; in addition to the issue hit earlier, we are now also h... — committed to debate-map/app by Venryx a year ago
@baszalmstra, here is an example of how it worked for me:
Update from zerossl support: They are looking deeper into this. It might be a technical issue on their side, after all.
I’ve just published an article showing how we set up cert-manager to use Google’s Public CA: https://www.uffizzi.com/blog/ditching-zerossl-for-google-public-certificate-authority-for-ssl-certificates-via-cert-manager-and-acme-protocol
cc @baszalmstra
https://github.com/cert-manager/cert-manager/pull/5901 should also hopefully help.
I’ve been writing back and forth with the zerossl support. Unfortunately, it looks like they are interested neither in understanding the problem nor in helping me with the issue.
They keep telling me to take a look at their documentation. The only related information given is not specific, so it doesn’t help at all:
I’ve been specifically asking for more information about rate limiting.
The gist of their answer, from oldest to newest:
At this point, I have given up on the zerossl support. I don’t think they will fix the issue on their side. It’s a shame though: zerossl is otherwise a perfect match for cert-manager. I wanted it to be the go-to provider whenever the cert rate limits of letsencrypt don’t suffice. In case anyone knows a viable alternative to zerossl, please let me know.
I’m wondering whether using DNS challenges instead of HTTP challenges would help. Does anyone know if using DNS challenges would send fewer requests to the order endpoint?
I suspect that zerossl is using an overall rate limiter, because I sometimes observed 429 Too Many Requests even when there were only 1-2 certificate requests. If true, that is not fair to everyone, because some can request a lot of certificates while others are unable to request any.
Just adding a datapoint: we migrated away from ZeroSSL for this very reason. We’re in Google Cloud, so we’re now using the GCP Public CA with the ACME issuer and have had no problems since: https://cloud.google.com/certificate-manager/docs/public-ca
Thank you for looking into the issue.
I didn’t know the --max-concurrent-challenges flag existed. This sounds like it could really help.
I have deployed cert-manager with --max-concurrent-challenges=1. The first 6 challenges succeed. However, the last two fail due to 429.
At least this time, the order is not in state failed but still in pending. This makes it easier to ‘reset’ the challenges. I’ve written a workaround script which sets the status of the challenges back to pending. cert-manager then picks up processing the challenges again. In case someone has use for this:
We might be able to cut down the number of HTTP01ChallengeResponse calls to be the same as the number of required authorizations (basically the number of DNS names) for the success path. I’ll give that a go, though it’s not guaranteed to make it work with ZeroSSL.
Thank you for your response. I am well aware of the letsencrypt rate limits. They are unsuitable for our use case, hence we moved to zerossl. In their basic plan (which we use), the number of certificates is not limited in any way.
I have reached out to the zerossl support; they have confirmed that a) our account is by no means limited in terms of certificates and b) they have implemented general rate limiting on their API.
Therefore I am 100% sure the error message is not about the number of certificates in general, but about the number of requests being sent per second. FYI, as mentioned earlier, I have no problem issuing new certs with <=2 domains, even directly after hitting the rate limit with a certificate which has >2 domains.
I have played around with the failed resources and manually reset their .status.state field to pending. Directly afterwards, the challenge completed successfully.
I have invested some time in building a workaround using shell-operator to reset the .status.state fields automatically whenever a resource has errored and the message contains 429. While this works most of the time, monkey-patching the cert-manager resources seems to be bad practice. I failed to get it working smoothly, since cert-manager should obviously be the only controller altering the resource state. However, I conclude from this experiment that my assumption is right: if cert-manager simply retried fetching the order resource, issued the requests in a slightly staggered fashion, or used a cached response, the problem would be solved. The new request rate limit in the zerossl API seems to be set up in a way that blocks request spikes to the same resource over a short period of time (i.e., a couple of seconds).