cert-manager: Controller fails to process new certs when there are a large number of pending ones
This issue is a duplicate of https://github.com/jetstack/cert-manager/issues/3772. I thought it more appropriate to present this as a bug. My concern is not being able to adjust the control loop size. It is preventing certificates from getting stuck in the issuing process.
Describe the bug:
In one of my clusters, I have about 60 challenge objects that remain in the pending state because their DNS records are incorrect (this is known and intentional). The cm-acme-http-solver- pods and ingresses are created and left hanging. When the number of such challenge objects grows large enough (I don't know the exact threshold), new challenges hit the following error:
E0315 12:33:57.042284 1 controller.go:158] cert-manager/controller/CertificateKeyManager "msg"="re-queuing item due to error processing" "error"="Operation cannot be fulfilled on certificates.cert-manager.io \"INGRESS_NAME-tls\": the object has been modified; please apply your changes to the latest version and try again" "key"="NAMESPACE/INGRESS_NAME-tls"
The issue is immediately resolved when I remove just a few of the ingresses causing pending challenges.
Expected behaviour: certificates that would normally be issued should not get stuck merely because there are many pending challenges.
Steps to reproduce the bug:
- Create enough ingress objects containing hosts with incorrect DNS records that their corresponding challenge objects remain pending.
- Create a normal ingress.
- Watch it fail to be processed by the control loop.
Environment details:
- Kubernetes version: 1.18
- cert-manager version: 1.1.0
- Install method: helm chart v1.1.0
/kind bug
About this issue
- State: closed
- Created 3 years ago
- Reactions: 2
- Comments: 27 (7 by maintainers)
I think we could look into expiring challenges that still haven't succeeded after a certain period of time.
A short-term solution would be to document this potential pitfall.
@maelvls Any news regarding this issue?
I’ll take a look tomorrow morning.
/assign
Update 10 Feb 2022: I wasn’t able to investigate yet.