cert-manager: Webhook errors with cert-manager v1.10.0 on OpenShift Container Platform (OCP)
Describe the bug:
After installing cert-manager 1.10 on OCP, no certificates can be created because the following webhook error is produced:
Error from server (InternalError): error when creating "cert.yml": Internal error occurred: failed calling webhook "mutate.webhooks.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook-service.openshift-operators.svc:443/mutate?timeout=10s": x509: certificate is valid for cert-manager-webhook, cert-manager-webhook.openshift-operators, cert-manager-webhook.openshift-operators.svc, not cert-manager-webhook-service.openshift-operators.svc
Expected behaviour:
Creating the Certificate should succeed.
Steps to reproduce the bug: Install cert-manager 1.10, then create a Certificate:
kind: Certificate
apiVersion: cert-manager.io/v1
metadata:
  name: example-certificate
  namespace: openshift-operators
spec:
  dnsNames:
    - example.com
  issuerRef:
    name: example-issuer
  secretName: example-certificate-tls
Anything else we need to know?:
The upgrade seems to have created a new service:
cert-manager                   ClusterIP   172.30.243.52    <none>   9402/TCP   237d
cert-manager-webhook           ClusterIP   172.30.14.57     <none>   443/TCP    237d
cert-manager-webhook-service   ClusterIP   172.30.108.134   <none>   443/TCP    131m
and a new service cert, issued by OpenShift:
cert-manager-webhook-service-cert kubernetes.io/tls 3 3h46m
So the webhook seems to be using the new service, but the pod is still using the old certificate, issued by cert-manager-webhook-ca, hence the error.
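To make the mismatch concrete, here is a minimal shell sketch that is not part of the original report. The function name `host_in_sans` and the variable `SANS` are hypothetical, and the DNS names are copied from the error message above. It only does an exact string comparison, which is enough here because none of the names involve wildcards; it is not a full implementation of x509 hostname verification.

```shell
# Hypothetical helper: succeeds when the first argument (the host the API
# server dialed) exactly equals one of the remaining arguments (the DNS
# names listed in the serving certificate). No wildcard handling.
host_in_sans() {
  host="$1"
  shift
  for san in "$@"; do
    [ "$san" = "$host" ] && return 0
  done
  return 1
}

# DNS names copied from the x509 error in this report.
SANS="cert-manager-webhook cert-manager-webhook.openshift-operators cert-manager-webhook.openshift-operators.svc"
```

Called as `host_in_sans cert-manager-webhook-service.openshift-operators.svc $SANS`, the helper returns a non-zero status, which mirrors the failure above. Called with `cert-manager-webhook.openshift-operators.svc`, the old service name, it returns zero.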
Environment details:
- Kubernetes version: 1.23
- Cloud-provider/provisioner: OpenShift 4.10
- cert-manager version: 1.10
- Install method: OperatorHub
/kind bug
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (8 by maintainers)
The patched cert-manager OperatorHub packages are now available on operatorhub.io and in the OpenShift OperatorHub community catalog, with package version v1.10.1. The underlying cert-manager remains v1.10.0.
I have tested on Kubernetes 1.25 with operatorhub.io and on OpenShift v4.11 and confirmed that the K8S API server can connect to the cert-manager webhook.
Apologies to everyone who was affected by this bug.
For Kubernetes Users
For people using packages from operatorhub.io on Kubernetes, I’ve published updated packages containing the fix for this problem in the release channel called “candidate”:
I’ve tested the installation on a Kind Kubernetes 1.25 cluster. If there are no objections I will publish v1.10.1 tomorrow in the “stable” channel.
/cc @pbaity
For Red Hat OpenShift OperatorHub users
I’ve submitted v1.10.1-rc1 packages for you too, but I am still waiting for Red Hat’s CI to finish testing the package.
It will hopefully pass the tests and be automatically merged and published in the next hour. And then within about 30 minutes of it merging, you should be able to find the v1.10.1-rc1 package in the OperatorHub in the “candidate” channel.
Thanks @wallrj for the hard work! I just tested with the release candidate and confirmed it’s fixed.
(I tested this with Kubernetes - EKS specifically - with OLM and installed from the candidate channel on OperatorHub)
For anyone else who’s using OLM on a Kubernetes cluster rather than OCP (I’m using EKS), the workaround of editing the ClusterServiceVersion resource and changing the webhook command worked for me.
kubectl equivalent of the relevant commands: