cert-manager: Failed to install: "failed post-install: timed out waiting for the condition"
Describe the bug:
kubectl create namespace cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml
helm install my-release --namespace cert-manager --version v1.6.1 jetstack/cert-manager
Error: INSTALLATION FAILED: failed post-install: timed out waiting for the condition
Expected behaviour: For it to install successfully.
Steps to reproduce the bug:
kubectl create namespace cert-manager
kubectl apply -f https://github.com/jetstack/cert-manager/releases/download/v1.6.1/cert-manager.crds.yaml
helm install my-release --namespace cert-manager --version v1.6.1 jetstack/cert-manager
Anything else we need to know?: I’m unsure how to diagnose further. After the installation failed, I ran a few commands to see the outputs:
kubectl get all -n cert-manager
NAME                                                      READY   STATUS    RESTARTS   AGE
pod/my-release-cert-manager-789d899b78-j6cwp              1/1     Running   0          15m
pod/my-release-cert-manager-cainjector-68d95bb6c5-96q7l   1/1     Running   0          15m
pod/my-release-cert-manager-webhook-c9c57558d-wt22v       1/1     Running   0          15m

NAME                                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/my-release-cert-manager           ClusterIP   10.103.243.246   <none>        9402/TCP   15m
service/my-release-cert-manager-webhook   ClusterIP   10.103.131.32    <none>        443/TCP    15m

NAME                                                 READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/my-release-cert-manager              1/1     1            1           15m
deployment.apps/my-release-cert-manager-cainjector   1/1     1            1           15m
deployment.apps/my-release-cert-manager-webhook      1/1     1            1           15m

NAME                                                            DESIRED   CURRENT   READY   AGE
replicaset.apps/my-release-cert-manager-789d899b78              1         1         1       15m
replicaset.apps/my-release-cert-manager-cainjector-68d95bb6c5   1         1         1       15m
replicaset.apps/my-release-cert-manager-webhook-c9c57558d       1         1         1       15m

NAME                                                COMPLETIONS   DURATION   AGE
job.batch/my-release-cert-manager-startupapicheck   0/1           15m        15m
kubectl describe job.batch/my-release-cert-manager-startupapicheck -n cert-manager
Name:             my-release-cert-manager-startupapicheck
Namespace:        cert-manager
Selector:         controller-uid=b585c7a9-3ea5-428e-98d4-2cf732b10701
Labels:           app=startupapicheck
                  app.kubernetes.io/component=startupapicheck
                  app.kubernetes.io/instance=my-release
                  app.kubernetes.io/managed-by=Helm
                  app.kubernetes.io/name=startupapicheck
                  app.kubernetes.io/version=v1.6.1
                  helm.sh/chart=cert-manager-v1.6.1
Annotations:      helm.sh/hook: post-install
                  helm.sh/hook-delete-policy: before-hook-creation,hook-succeeded
                  helm.sh/hook-weight: 1
Parallelism:      1
Completions:      1
Completion Mode:  NonIndexed
Start Time:       Sun, 05 Dec 2021 12:12:46 +0000
Pods Statuses:    0 Running / 0 Succeeded / 1 Failed
Pod Template:
  Labels:           app=startupapicheck
                    app.kubernetes.io/component=startupapicheck
                    app.kubernetes.io/instance=my-release
                    app.kubernetes.io/managed-by=Helm
                    app.kubernetes.io/name=startupapicheck
                    app.kubernetes.io/version=v1.6.1
                    controller-uid=b585c7a9-3ea5-428e-98d4-2cf732b10701
                    helm.sh/chart=cert-manager-v1.6.1
                    job-name=my-release-cert-manager-startupapicheck
  Service Account:  my-release-cert-manager-startupapicheck
  Containers:
   cert-manager:
    Image:        quay.io/jetstack/cert-manager-ctl:v1.6.1
    Port:         <none>
    Host Port:    <none>
    Args:
      check
      api
      --wait=1m
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>
Events:
  Type     Reason                Age    From            Message
  ----     ------                ----   ----            -------
  Normal   SuccessfulCreate      15m    job-controller  Created pod: my-release-cert-manager-startupapicheck--1-sqqxd
  Normal   SuccessfulDelete      9m29s  job-controller  Deleted pod: my-release-cert-manager-startupapicheck--1-sqqxd
  Warning  BackoffLimitExceeded  9m29s  job-controller  Job has reached the specified backoff limit
Environment details:
- Kubernetes version: v1.22.1
- Cloud-provider/provisioner: bare-metal
- cert-manager version: v1.6.1
- Install method: helm
/kind bug
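A generic next step for digging further in cases like this (not specific to this report) is to look at the namespace events and at the logs of the startupapicheck job and the webhook, using the release and namespace names from above:
kubectl -n cert-manager get events --sort-by=.lastTimestamp
kubectl -n cert-manager logs job/my-release-cert-manager-startupapicheck
kubectl -n cert-manager logs deploy/my-release-cert-manager-webhook
Note that, as the events above show, the job controller deletes the failed pod once the backoff limit is reached, so the job's pod logs may only be available while the check is still retrying.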
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 9
- Comments: 49 (8 by maintainers)
Seeing this with v1.5.1 as well. I had to clean up resources manually and then the install worked with these flags:
Finally:
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.5.1 --set startupapicheck.timeout=5m --set installCRDs=true
In case it helps someone else: I got this same error while using a GKE Autopilot Kubernetes cluster, and this flag had to be set when installing with Helm:
--set global.leaderElection.namespace=cert-manager
I got this info here: https://cert-manager.io/docs/installation/compatibility/
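For reference, a full install command with that flag, reusing the chart version and flags mentioned elsewhere in this thread (adjust the release name and values to your setup), could look like:
helm install cert-manager jetstack/cert-manager --namespace cert-manager --create-namespace --version v1.6.1 --set installCRDs=true --set global.leaderElection.namespace=cert-manager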
Since curling the webhook seems to work, maybe the timeout for the post-install job is too short.
By default, the timeout is 1 minute, which may be too short when downloading the container image itself takes a full minute:
https://github.com/jetstack/cert-manager/blob/7923823cff0afe5af49282bde3a5a1cc6f5abc84/deploy/charts/cert-manager/values.yaml#L424-L425
You can try a longer timeout:
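For example, using the startupapicheck.timeout value referenced above (the exact duration is arbitrary; this is an illustration rather than the command from the original comment):
helm install my-release jetstack/cert-manager --namespace cert-manager --version v1.6.1 --set startupapicheck.timeout=5m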
It seems that installCRDs=true resolves this issue for me (using v1.6.1).
In case someone sees the error message "failed post-install: timed out waiting for the condition" when using AKS + a Windows node pool: you need to set a nodeSelector to linux for the container images used by cert-manager.
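A sketch of what that could look like via Helm values, assuming the chart exposes nodeSelector, webhook.nodeSelector, cainjector.nodeSelector and startupapicheck.nodeSelector (check the values.yaml of your chart version):
helm install my-release jetstack/cert-manager --namespace cert-manager --version v1.6.1 \
  --set installCRDs=true \
  --set nodeSelector."kubernetes\.io/os"=linux \
  --set webhook.nodeSelector."kubernetes\.io/os"=linux \
  --set cainjector.nodeSelector."kubernetes\.io/os"=linux \
  --set startupapicheck.nodeSelector."kubernetes\.io/os"=linux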
I tried a clean uninstall and then ran this on a GKE Autopilot Kubernetes cluster, and it finally succeeded.
The solution with startupapicheck.timeout solved it for me as well.
If anyone is using Terraform, this was my code:
Do we have any update on this? I'm facing the same issue, any thoughts?
Disabling IPsec encryption in flannel fixed it for me!
Thank you @def1nity: the version via kubectl seems to work.
I am facing the same problem. Running curl like so from another pod works (at least no timeouts): curl -v -k -X POST -d '{"test": "test"}' https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=10s
Solved the issue!
In my case, the startupapicheck Job was stuck because of Istio's sidecar auto-injection. The simplest way to fix the problem is to add the sidecar.istio.io/inject=false label to the target pod. We can leverage the startupapicheck.podLabels Helm chart value to configure it.
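Assuming the chart exposes startupapicheck.podLabels as described, a sketch of setting that label at install time (--set-string keeps "false" as a string label value) would be:
helm install my-release jetstack/cert-manager --namespace cert-manager --version v1.6.1 --set installCRDs=true --set-string startupapicheck.podLabels."sidecar\.istio\.io/inject"=false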
This issue is usually caused by the Kubernetes API server being unable to reach the webhook pod. There are a vast number of different reasons for this, but they are all platform-, networking- or CNI-specific. We have documented some of the solutions on the website: https://cert-manager.io/docs/concepts/webhook/#known-problems-and-solutions
I’m going to close this issue because it’s becoming a catch-all for this error message, but if you are sure you have a unique issue feel free to open another.
Also, @inteon found out that my issue is with flanneld running in vxlan mode. For my environment, switching to host-gw solved my issues. I've also created this issue on the flannel repo: https://github.com/flannel-io/flannel/issues/1511
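If you want to try the same switch, the backend type is normally set in flannel's net-conf.json, which lives in a ConfigMap (the name and namespace depend on how flannel was deployed; kube-flannel-cfg in kube-system is common):
kubectl -n kube-system edit configmap kube-flannel-cfg
Change the Backend Type from vxlan to host-gw, then restart the flannel DaemonSet pods so they pick up the new configuration. Note that host-gw requires all nodes to be on the same layer-2 network.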
@inteon I'm running Debian 11 with flannel 1.15.1. This is a bare-metal system. No network policies whatsoever.
So I did two things: first I added the label cert-manager.io/disable-validation: "true" to the cert-manager namespace, and then I deleted the webhook with kubectl delete validatingwebhookconfigurations/cert-manager-webhook. With those, my certificate is issued with no side effects (for now).
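For anyone wanting to apply the same workaround, the corresponding commands would be roughly the following (note that this disables cert-manager's resource validation for that namespace, so treat it as a temporary measure):
kubectl label namespace cert-manager cert-manager.io/disable-validation=true
kubectl delete validatingwebhookconfigurations cert-manager-webhook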