cert-manager: Documenting "context deadline exceeded" errors relating to the webhook
📢 Update from the cert-manager maintainers: For those of you encountering problems with the cert-manager webhook, please read @maelvls's Definitive Debugging Guide for the cert-manager Webhook Pod.
Describe the bug:
When I try to create a ClusterIssuer I get the following error
kubectl apply -f cert-issuer-letsencrypt-dev.yml
Error from server (InternalError): error when creating "cert-issuer-letsencrypt-dev.yml":
Internal error occurred: failed calling webhook "webhook.certmanager.k8s.io":
Post https://kubernetes.default.svc:443/apis/webhook.certmanager.k8s.io/v1beta1/mutations?timeout=30s:
context deadline exceeded
Expected behaviour:
Creation of ClusterIssuer works without errors
Steps to reproduce the bug:
Install cert-manager as follows
kubectl apply -f https://raw.githubusercontent.com/jetstack/cert-manager/release-0.10/deploy/manifests/00-crds.yaml
kubectl create namespace cert-manager
kubectl label namespace cert-manager certmanager.k8s.io/disable-validation=true
helm repo add jetstack https://charts.jetstack.io
helm repo update
helm install \
--name cert-manager \
--namespace cert-manager \
--version v0.10.1 \
jetstack/cert-manager
Then I apply the following manifest:
apiVersion: certmanager.k8s.io/v1alpha1
kind: ClusterIssuer
metadata:
  name: letsencrypt-dev
  namespace: cert-manager
spec:
  acme:
    # The ACME server URL
    server: https://acme-staging-v02.api.letsencrypt.org/directory
    # Email address used for ACME registration
    email: xxx@xxxx.com
    # Name of a secret used to store the ACME account private key
    privateKeySecretRef:
      name: letsencrypt-dev
    # Enable the HTTP-01 challenge provider
    # http01: {}
    solvers:
    - dns01:
        cloudflare:
          email: xxxx@xxxx.com
          apiKeySecretRef:
            name: cloudflare-api-key-secret
            key: api-key
Anything else we need to know?:
Environment details:
- Kubernetes version: 1.15
- Cloud-provider/provisioner: baremetal
- cert-manager version: 0.10.1
- Install method: helm
/kind bug
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 25
- Comments: 83 (4 by maintainers)
Nope, still stuck, and this sucks.
I can confirm that I have exactly the same issue. My environment:
- EKS v1.16.8
- CNI: Calico
- cert-manager: v0.15.1, installed using Helm
I'm getting errors like:
Error from server (InternalError): error when creating "ClusterIssuerDns.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded
Error from server (InternalError): error when creating "ClusterIssuerDns.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: Address is not allowed
I'm not 100% sure, but I suspect an issue with the connection from the API server to the webhook (Calico creates a new subnet, and I'm not sure the API server is able to reach it)…
Why is this issue closed? I am facing the same issue with the Helm chart of cert-manager v1.1.0.
main.go:38] cert-manager "msg"="error executing command" "error"="listen tcp :10250: bind: address already in use"
@mostafa8026 seems to have corrected the issue by changing the port 10250. Why is this even a port issue?
any updates on this issue?
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: context deadline exceeded
And my solution was making these changes in the YAML file:
Adding hostNetwork: true to the spec of the webhook, and changing securePort and its related ports to something other than 10250 (like 10666, which I chose; also don't forget to change the related Service). Here are the changes:

Hello,
Just wanted to let everyone know that I have it working now. Some information on our cluster:
What worked for me is the following guide here: https://docs.cert-manager.io/en/release-0.11/getting-started/install/kubernetes.html
It is absolutely important that nothing is lingering around from your old deploy:
Run kubectl get crd and delete all (new and old) cert-manager CRDs.
Run kubectl get apiservice and make sure there is nothing related to certificates.
Running kubectl get cert or kubectl get clusterissuer should say something along the lines of "this resource type does not exist" (I don't have the exact error, but you get the point).
Great. Now install the 0.11.1 CRDs:
Now install cert-manager 0.11.1. Make sure you install 0.11.1, not 0.11.0… That version doesn't seem to work either.
Great. Now make sure your ClusterIssuer and Certificates are using apiVersion: cert-manager.io/v1alpha2.

My Suspicions:
When installing 1.15.11, looking at the kube-apiserver logs, it appears that it's trying to communicate with the webhook service by using its DNS name (https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s). As I said above, this short version of the DNS name does not work for some reason. Maybe it's a kubeadm thing.
When using 0.11.1, it tries to communicate via IP address instead, and I suppose this is what is making it work.
Something important that I found during my research: the kube-apiserver can't actually resolve cluster DNS. Its /etc/resolv.conf is inherited from the master node. This is designed intentionally, because apparently the kube-apiserver is the source of truth for DNS.
Something I don't understand: from a node, why can't you ping a Service ClusterIP? You can do it for any pod on any node, but not Services. So I don't get how the kube-apiserver is making calls to the webhook.
Sorry for rambling. Please let me know if you're still struggling. I can try to help.
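The cleanup steps described above can be sketched as a short script. This is only an illustration: the CRD names shown are the ones the 0.10.x releases used, and your cluster may carry a different set, so review each command's output before deleting anything.

```shell
# List anything lingering from the old cert-manager deploy.
kubectl get crd | grep -i certmanager

# Delete the old CRDs (illustrative subset of the 0.10.x names;
# delete whatever the previous command actually showed).
kubectl delete crd \
  certificates.certmanager.k8s.io \
  clusterissuers.certmanager.k8s.io \
  issuers.certmanager.k8s.io

# Make sure no APIService related to certificates remains.
kubectl get apiservice | grep -i cert

# These should now report that the resource type does not exist.
kubectl get cert
kubectl get clusterissuer
```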
I am having the same issues as @andrewkaczynski
The haiku about DNS is true:
So, for those in this boat who are confused as heck by it: check that you can run DNS queries from inside your pods.
Context deadline exceeded in my case indicated the pod couldn't look up the ACME API endpoint.
Also in my case, the cause was that all outbound traffic to the internet from pods was being blocked.
Furthermore, for those who land here where cert-manager is the first thing they set up on a cluster and are bitten by this: k3s on Debian 10 requires you to use the legacy iptables command (this may apply to other k8s distros, but definitely does to k3s): https://github.com/coredns/coredns/issues/2693
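A quick way to run that check from inside the cluster (the pod names and the busybox image are arbitrary choices, not from the thread):

```shell
# Spin up a throwaway pod and try to resolve an external name,
# e.g. the ACME staging endpoint mentioned earlier in this issue.
kubectl run dns-test --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup acme-staging-v02.api.letsencrypt.org

# Also verify that in-cluster DNS itself is healthy.
kubectl run dns-test2 --rm -it --image=busybox:1.28 --restart=Never -- \
  nslookup kubernetes.default.svc.cluster.local
```

If the first lookup fails but the second works, outbound DNS or egress traffic is being blocked, which matches the symptom described above.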
I still submit that "context deadline exceeded" is a poor error message; something more helpful here would be good.

I resolved my issue. May not apply to everyone, but still.
During cert creation, the API server accesses the webhook. But in my case, the API server cannot access pods in the overlay network. So I have the webhook running in hostNetwork mode, and now the error is gone.
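A minimal sketch of that workaround, assuming the default Deployment and Service names from the upstream manifests; the alternate port 10666 follows the earlier suggestion in this thread, and you should adjust everything to your actual install:

```yaml
# Sketch of the relevant parts of the cert-manager-webhook Deployment.
spec:
  template:
    spec:
      hostNetwork: true          # webhook binds on the node's network
      containers:
      - name: cert-manager
        args:
        - --secure-port=10666    # avoid 10250, which kubelet already uses
        ports:
        - containerPort: 10666
          name: https
---
# The Service must target the new port as well.
apiVersion: v1
kind: Service
metadata:
  name: cert-manager-webhook
  namespace: cert-manager
spec:
  ports:
  - name: https
    port: 443
    targetPort: 10666
```

With hostNetwork enabled, the API server reaches the webhook via the node's address rather than the overlay network, which is why this helps when the overlay is unreachable from the control plane.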
My solution was: download the cert-manager manifest (i.e. https://github.com/jetstack/cert-manager/releases/download/v1.1.0/cert-manager.yaml), insert the following block after each "containers:" declaration in the manifest, and apply it:
How did you install? I am trying helm install, with Weave, on EKS, and I am getting the same errors: failed calling webhook "webhook.cert-manager.io". The chart has hostNetwork set to false, and it seems most of the instructions on how to get it to work are using older versions. I tried forking the charts and making the change, but then there was some type of image dependency. What was your method?
For me it was an issue with debian 10 and iptables, see here: https://discuss.kubernetes.io/t/kubernetes-compatible-with-debian-10-buster/7853
I have the same issue with:
- Kubernetes v1.17.0
- baremetal (OpenStack Queens)
- cert-manager 0.13.1
- Helm 2.16.3
I think it is a timing problem, because it was working when I was "manually" installing it, but not anymore in my Terraform configuration script. Quite rough, but I resolved it by adding a delay after the install and just before applying the ingress:
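A rough sketch of that delay in shell form, reusing the Helm 2 style install command and the manifest filename from earlier in this thread; the kubectl wait alternative is my suggestion, not the commenter's:

```shell
# Crude fixed delay: give the webhook time to become ready
# before applying resources that go through it.
helm install --name cert-manager --namespace cert-manager jetstack/cert-manager
sleep 60
kubectl apply -f cert-issuer-letsencrypt-dev.yml

# Less crude: wait for the webhook Deployment to report ready instead.
kubectl -n cert-manager rollout status deployment/cert-manager-webhook --timeout=120s
```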
Unfortunately restarting the pod didnât fix the issue for me.
Edit: Well, never mind, I restarted all the cert-manager related pods and now it worked. Strange.

I am also seeing the same issue. I have a Charmed Kubernetes cluster running with flannel and calico. I changed the deployment config to use hostNetwork and changed the port, and nothing. Events:
Type Reason Age From Message
Warning ErrVerifyACMEAccount 33s (x4 over 73s) cert-manager Failed to verify ACME account: context deadline exceeded
Warning ErrInitIssuer 33s (x4 over 73s) cert-manager Error initializing issuer: context deadline exceeded
My DNS is working, because I can access letsencrypt from any port, and the challenge seems to have worked.
I'm seeing this on fresh AKS clusters, right after installing cert-manager, when I create an Issuer. The strange thing is that 2 of 3 clusters have this issue, but one of them doesn't, although they are provisioned in the same way.
The Issuer I'm adding that triggers the error is:
We are running a fresh install of 1.17.7 via kubeadm, using the flannel VXLAN CNI. We're also seeing the following error:
Doing some googling, I believe this is due to DNS. If I exec onto one of my NGINX pods (using this pod arbitrarily, nothing special about it) and try to resolve the above address, I get this:
However, when you append .cluster.local (the full domain name), it'll resolve just fine:
And as you can see, this is the IP address of my webhook pod:
So this is where I get confused… I read that .svc is the equivalent (or rather, the short version) of .svc.cluster.local. Why is it not working? Is this configurable? Reading a different issue, some person had to re-create their cluster and supply some DNS options into KubeSpray. However, I'm not using KubeSpray. Appreciate any help, thanks.
Edit: Here is the other issue I was referring to: https://github.com/jetstack/cert-manager/issues/2640
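The short-name behavior comes from the search domains in the pod's /etc/resolv.conf; a quick way to compare the two lookups from inside any pod (the service name is the one from this thread):

```shell
# Short names only resolve if resolv.conf carries the right search
# domains (e.g. cert-manager.svc.cluster.local, svc.cluster.local).
cat /etc/resolv.conf

# Short form: relies on the search path.
nslookup cert-manager-webhook.cert-manager.svc

# Fully qualified form: should always work if cluster DNS is healthy.
nslookup cert-manager-webhook.cert-manager.svc.cluster.local
```

If the cluster was initialized with a non-default DNS domain (as the next comment found), the .cluster.local suffix won't match and short lookups built on it will fail.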
Folks, I tracked it down to using --service-dns-domain="k8.example.com" in my kubeadm init.
Can you please explain what you did with coredns to correct the problem?
And the pod seems to be working:
I somehow managed to work around this issue by downgrading to v0.11. Everything seems to be working properly. https://docs.cert-manager.io/en/release-0.11/
@javachen so I guess my guess was wrong then 😦
@papanito CentOS 7, Helm 3, k8s 1.17.2