cert-manager: Failed to install " failed post-install: timed out waiting for the condition"

Describe the bug:

create namespace cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml
helm install my-release --namespace cert-manager --version v1.6.1 jetstack/cert-manager

Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition

–>

Expected behaviour: For it to install successfully.

Steps to reproduce the bug:

create namespace cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml
helm install my-release --namespace cert-manager --version v1.6.1 jetstack/cert-manager

Anything else we need to know?: I’m unsure how to diagnose further. After installation failed, I ran a few commands to see the outputs:

kubectl get all -n cert-manager
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/my-release-cert-manager-789d899b78-j6cwp              1/1     Running   0          15m
pod/my-release-cert-manager-cainjector-68d95bb6c5-96q7l   1/1     Running   0          15m
pod/my-release-cert-manager-webhook-c9c57558d-wt22v       1/1     Running   0          15m

NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/my-release-cert-manager           ClusterIP   10.103.243.246   <none>        9402/TCP   15m
service/my-release-cert-manager-webhook   ClusterIP   10.103.131.32    <none>        443/TCP    15m

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/my-release-cert-manager              1/1     1            1           15m
deployment.apps/my-release-cert-manager-cainjector   1/1     1            1           15m
deployment.apps/my-release-cert-manager-webhook      1/1     1            1           15m

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/my-release-cert-manager-789d899b78              1         1         1       15m
replicaset.apps/my-release-cert-manager-cainjector-68d95bb6c5   1         1         1       15m
replicaset.apps/my-release-cert-manager-webhook-c9c57558d       1         1         1       15m

NAME                                                COMPLETIONS   DURATION   AGE
job.batch/my-release-cert-manager-startupapicheck   0/1           15m        15


kubectl describe job.batch/my-release-cert-manager-startupapicheck -n cert-manager
Name:             my-release-cert-manager-startupapicheck
Namespace:        cert-manager
Selector:         controller-uid=b585c7a9-3ea5-428e-98d4-2cf732b10701
Labels:           app=startupapicheck
                  app.kubernetes.io/component=startupapicheck
                  app.kubernetes.io/instance=my-release
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=startupapicheck
                  app.kubernetes.io/version=v1.6.1
                  helm.sh/chart=cert-manager-v1.6.1
Annotations:      helm.sh/hook: post-install
                  helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
                  helm.sh/hook-weight: 1
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Sun, 05 Dec 2021 12:12:46 +0000
Pods Statuses:    0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:           app=startupapicheck
                    app.kubernetes.io/component=startupapicheck
                    app.kubernetes.io/instance=my-release
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=startupapicheck
                    app.kubernetes.io/version=v1.6.1
                    controller-uid=b585c7a9-3ea5-428e-98d4-2cf732b10701
                    helm.sh/chart=cert-manager-v1.6.1
                    job-name=my-release-cert-manager-startupapicheck
  Service Account:  my-release-cert-manager-startupapicheck
  Containers:
   cert-manager:
    Image:      quay.io/jetstack/cert-manager-ctl:v1.6.1
    Port:       <none>
    Host Port:  <none>
    Args:
      check
      api
      --wait=1m
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age    From            Message
  ----     ------                ----   ----            -------
  Normal   SuccessfulCreate      15m    job-controller  Created pod: my-release-cert-manager-startupapicheck--1-sqqxd
  Normal   SuccessfulDelete      9m29s  job-controller  Deleted pod: my-release-cert-manager-startupapicheck--1-sqqxd
  Warning  BackoffLimitExceeded  9m29s  job-controller  Job has reached the specified backoff limit

Environment details::

  • Kubernetes version: v1.22.1
  • Cloud-provider/provisioner: bare-metal
  • cert-manager version: v1.6.1
  • Install method: helm

/kind bug

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 9
  • Comments: 49 (8 by maintainers)

Commits related to this issue

Most upvoted comments

Seeing this with v1.5.1 as well. I had to clean up resources manually and then the install worked with these flags:

helm uninstall cert-manager -n cert-manager
kubectl delete roles cert-manager-startupapicheck:create-cert -n cert-manager;
kubectl delete serviceaccount cert-manager-startupapicheck -n cert-manager;
kubectl delete serviceaccount default -n cert-manager;
kubectl delete jobs cert-manager-startupapicheck -n cert-manager;
kubectl delete rolebindings cert-manager-startupapicheck:create-cert -n cert-manager;

Finally: helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.5.1 --set startupapicheck.timeout=5m --set installCRDs=true

In case it helps someone else: I got this same error and was using a GKE autopilot Kubernetes cluster, and this had to be set when using helm to install: --set global.leaderElection.namespace=cert-manager got this info here: https://cert-manager.io/docs/installation/compatibility/

Since curling the webhook seems to work, maybe the timeout for the post-install job is too short.

By default, the timeout is 1 minute, which may be too short when downloading the image container itself takes a full minute:

https://github.com/jetstack/cert-manager/blob/7923823cff0afe5af49282bde3a5a1cc6f5abc84/deploy/charts/cert-manager/values.yaml#L424-L425

You can try a longer timeout:

helm install my-release --namespace cert-manager --version v1.6.1 jetstack/cert-manager --set startupapicheck.timeout=5m

it seems that installCRDs=true resolves this issue for me (using le v1.6.1)

In case someone sees this error message failed post-install: timed out waiting for the condition when using AKS + windows nodepool, you need to set nodeSelector to linux for the container images used by cert-manager.

helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --version v1.8 \
  --set installCRDs=true \
  --set nodeSelector."kubernetes\.io/os"=linux \
  --set webhook.nodeSelector."kubernetes\.io/os"=linux \
  --set cainjector.nodeSelector."kubernetes\.io/os"=linux \
  --set startupapicheck.nodeSelector."kubernetes\.io/os"=linux

I tried to clean uninstall and then run this on a GKE autopilot Kubernetes cluster

helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --set startupapicheck.timeout=5m --set installCRDs=true --set global.leaderElection.namespace=cert-manager

and finally got success

NAME: cert-manager
LAST DEPLOYED: Fri Jun  2 04:40:45 2023
NAMESPACE: cert-manager
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
cert-manager v1.12.1 has been deployed successfully!

The solution with startupapicheck.timeout solved it for me aswell.

If anyone using terraform this was my code:

resource "helm_release" "cert_manager" {
  name  = "cert-manager"
  repository = "https://charts.jetstack.io"
  chart = "cert-manager"
  create_namespace = true
  version    = "v1.9.1"
  namespace  = "cert-manager"

  set {
    name  = "startupapicheck.timeout"
    value = "5m"
  }
  set {
    name  = "installCRDs"
    value = true 
  }
}

Do we have any update on this ? I’m facing the same issue, any thoughts ?

Disabling IPsec encryption in flannel fixed it for me!

Thank you @def1nity: The version via kubectl seems to work.

I am facing the same problem. Running curl like so from another pod works (at least no timeouts): curl -v -k -X POST -d ‘{“test”: “test”}’ https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s

Seeing this with v1.5.1 as well. I had to clean up resources manually and then the install worked with these flags:

helm uninstall cert-manager -n cert-manager
kubectl delete roles cert-manager-startupapicheck:create-cert -n cert-manager;
kubectl delete serviceaccount cert-manager-startupapicheck -n cert-manager;
kubectl delete serviceaccount default -n cert-manager;
kubectl delete jobs cert-manager-startupapicheck -n cert-manager;
kubectl delete rolebindings cert-manager-startupapicheck:create-cert -n cert-manager;

Finally: helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.5.1 --set startupapicheck.timeout=5m --set installCRDs=true

Solved the issue!

In my case, the startupapicheck Job was stuck because of Istio’s sidecar auto-injection.

Simplest way to fix the problem is to add sidecar.istio.io/inject=false label to the target pod.

We can leverage startupapicheck.podLabels Helm Chart Value to configure it.

This issue is usually caused by the Kubernetes API server being unable to reach the webhook pod. There are a vast number of different reason for this, but they are all platform, networking or CNI specific. We have documented some of the solutions on the website: https://cert-manager.io/docs/concepts/webhook/#known-problems-and-solutions .

I’m going to close this issue because it’s becoming a catch-all for this error message, but if you are sure you have a unique issue feel free to open another.

Also @inteon found out my issue is with flanneld running on vxlan mode. For my environment switching to host-gw solved my issues. I’ve also create this issue on flannel repo: https://github.com/flannel-io/flannel/issues/1511

@inteon I’m running Debian 11 with flannel 1.15.1 This is a bare metal system. No network policies whatsoever.

So I did two things: first added the label cert-manager.io/disable-validation: “true” to cert-manager namespace and deleted the webhook with: kubectl delete validatingwebhookconfigurations/cert-manager-webhook with those my certificate is issued with no side effects (for now).