cert-manager: Webhook errors with cert-manager v1.10.0 on OpenShift Container Platform (OCP)

Describe the bug:

After installing cert-manager 1.10 on OCP, no certificates can be created because the following webhook error is produced:

Error from server (InternalError): error when creating "cert.yml": Internal error occurred: failed calling webhook "mutate.webhooks.cert-manager.io": failed to call webhook: Post "https://cert-manager-webhook-service.openshift-operators.svc:443/mutate?timeout=10s": x509: certificate is valid for cert-manager-webhook, cert-manager-webhook.openshift-operators, cert-manager-webhook.openshift-operators.svc, not cert-manager-webhook-service.openshift-operators.svc
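
To make the mismatch in this error explicit, here is a minimal shell sketch that checks whether the host name the API server dialed appears among the names the certificate is valid for. The helper name `san_covers` is hypothetical, and it only does exact string comparison (no wildcard handling), so treat it as an illustration rather than a real x509 name check.

```shell
# Hypothetical helper: succeed if host ($1) exactly matches one of the
# comma-separated names in $2. Wildcard names are NOT handled.
san_covers() {
  san_host=$1
  # Turn commas into spaces and let word splitting separate the names.
  for san_name in $(printf '%s' "$2" | tr ',' ' '); do
    [ "$san_name" = "$san_host" ] && return 0
  done
  return 1
}

# Names copied from the error message above.
valid_names="cert-manager-webhook, cert-manager-webhook.openshift-operators, cert-manager-webhook.openshift-operators.svc"

if san_covers "cert-manager-webhook-service.openshift-operators.svc" "$valid_names"; then
  echo "covered"
else
  echo "not covered"   # this branch runs: the -service name is not in the list
fi
```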

Expected behaviour:

Creating the Certificate should succeed.

Steps to reproduce the bug:

  • Install cert-manager 1.10
  • Create a Certificate:

kind: Certificate
apiVersion: cert-manager.io/v1
metadata:
  name: example-certificate
  namespace: openshift-operators
spec:
  dnsNames:
    - example.com
  issuerRef:
    name: example-issuer
  secretName: example-certificate-tls

Anything else we need to know?:

The upgrade seems to have created a new service:

cert-manager                   ClusterIP   172.30.243.52    <none>        9402/TCP   237d
cert-manager-webhook           ClusterIP   172.30.14.57     <none>        443/TCP    237d
cert-manager-webhook-service   ClusterIP   172.30.108.134   <none>        443/TCP    131m

and a new service cert, issued by OpenShift:

cert-manager-webhook-service-cert              kubernetes.io/tls                     3      3h46m

So the webhook seems to be using the new service, but the pod is still using the old certificate, issued by cert-manager-webhook-ca, hence the error.
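
One way to check this diagnosis is to compare the Service that the webhook configurations point at with the names in a serving certificate. The sketch below assumes `kubectl` access (on OCP, `oc` accepts the same arguments), `openssl` and `base64` on the PATH, and that the certificate is stored in a `kubernetes.io/tls` Secret. The function names are hypothetical, and the Secret actually mounted by the webhook pod in your install may have a different name.

```shell
# Hypothetical helper: list each MutatingWebhookConfiguration together with
# the namespace/name of the Service(s) its webhooks call.
webhook_services() {
  kubectl get mutatingwebhookconfigurations \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{range .webhooks[*]}{.clientConfig.service.namespace}{"/"}{.clientConfig.service.name}{" "}{end}{"\n"}{end}'
}

# Hypothetical helper: print the Subject Alternative Names of the certificate
# stored in a TLS Secret.  $1 = namespace, $2 = Secret name
secret_sans() {
  kubectl get secret "$2" -n "$1" -o jsonpath='{.data.tls\.crt}' \
    | base64 -d \
    | openssl x509 -noout -text \
    | grep -A1 'Subject Alternative Name'
}
```

For example, `webhook_services | grep cert-manager` shows which Service the API server will dial, and `secret_sans openshift-operators cert-manager-webhook-service-cert` shows the names covered by the new OpenShift-issued certificate.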

Environment details:

  • Kubernetes version: 1.23
  • Cloud-provider/provisioner: OpenShift 4.10
  • cert-manager version: 1.10
  • Install method: OperatorHub

/kind bug

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 17 (8 by maintainers)

Most upvoted comments

The patched cert-manager OperatorHub packages are now on operatorhub.io and on OpenShift OperatorHub community catalog, with package version v1.10.1. The underlying cert-manager remains v1.10.0.

I have tested on Kubernetes 1.25 with operatorhub.io and on OpenShift v4.11 and confirmed that the Kubernetes API server can connect to the cert-manager webhook.

Apologies to everyone who was affected by this bug.

For Kubernetes Users

For people using packages from operatorhub.io on Kubernetes, I’ve published updated packages containing the fix for this problem in the release channel called “candidate”:

I’ve tested the installation on a Kind Kubernetes 1.25 cluster. If there are no objections I will publish v1.10.1 tomorrow in the “stable” channel.

/cc @pbaity

For Red Hat OpenShift OperatorHub users

I’ve submitted v1.10.1-rc1 packages for you too, but I am still waiting for Red Hat’s CI to finish testing the package.

It will hopefully pass the tests and be automatically merged and published in the next hour. And then within about 30 minutes of it merging, you should be able to find the v1.10.1-rc1 package in the OperatorHub in the “candidate” channel.

Thanks @wallrj for the hard work! I just tested with the release candidate and confirmed it’s fixed.

(I tested this with Kubernetes - EKS specifically - with OLM and installed from the candidate channel on OperatorHub)

For anyone else who’s using OLM on a Kubernetes cluster, not OCP (I’m using EKS), the workaround of editing the ClusterServiceVersion resource and changing the webhook command worked for me. The kubectl equivalents of the relevant commands:

kubectl get csv -n operators -o yaml cert-manager.v1.10.0 > cert-manager.v1.10.0.yaml
cp cert-manager.v1.10.0.yaml cert-manager.v1.10.0.yaml.backup
vi cert-manager.v1.10.0.yaml
kubectl apply -f cert-manager.v1.10.0.yaml