cert-manager: requestmanager_controller gets stuck in a loop and stops generating new certificates afterwards
Describe the bug: At some point, communication between cert-manager-cainjector and the Kubernetes API server seems to have stopped working (we received a few EOF log lines, followed by “Successfully Reconciled” log lines in cert-manager-cainjector). However, after communication resumed, we started receiving:
1 controller.go:158] cert-manager/controller/CertificateRequestManager "msg"="re-queuing item due to error processing" "error"="failed whilst waiting for CertificateRequest to exist - this may indicate an apiserver running slowly. Request will be retried" "key"="default/stan-client-tls"
About 10 seconds later, the controller progressed further in processing its queue, but emitted the following log for every item that had previously been re-queued:
I0112 15:48:53.799058 1 requestmanager_controller.go:196] cert-manager/controller/CertificateRequestManager "msg"="Multiple matching CertificateRequest resources exist, delete one of them. This is likely an error and should be reported on the issue tracker!" "key"="default/stan-client-tls"
Afterwards, issuance of this certificate stopped altogether.
In the Kubernetes cluster, we could see that multiple CertificateRequest objects had been created for the stan-client-tls Certificate, all with the same revision number. Our guess is that the client interface (https://github.com/jetstack/cert-manager/blob/cdc53b65cbd344dbef64f0c5c22e6070e79c5b5c/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L339) kept working and creating new instances, while certificateRequestLister (https://github.com/jetstack/cert-manager/blob/cdc53b65cbd344dbef64f0c5c22e6070e79c5b5c/pkg/controller/certificates/requestmanager/requestmanager_controller.go#L165) was unable to return an up-to-date view of the current state.
Expected behaviour: The controller should probably delete the unused CertificateRequest objects and keep creating new ones until one of them succeeds.
Environment details:
- Kubernetes version: 1.17.9
- Cloud-provider/provisioner: Azure
- cert-manager version: 1.0.3
- Install method: Helm
/kind bug
About this issue
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 24 (1 by maintainers)
For the record, for other OVH users who have this issue and are looking for a workaround/quick fix:
I’m facing the same issue on OVH: multiple CertificateRequests are created, and there are multiple entries like this in the cert-manager pod log:
Same issue on the OVH cloud provider: the cert-manager controller keeps spawning new CertificateRequest objects without ever detecting them.
Is there any progress?
Same issue for me: CertificateRequests are created endlessly, one every 30-40 seconds, and no Orders or Challenges are created. Every CertificateRequest has zero events and no status. A few days earlier, I created three certificates successfully.
This is a sample log from cert-manager:
The cainjector is not involved; there is nothing unusual in its logs.