cert-manager: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout
Bugs should be filed for issues encountered whilst operating cert-manager. You should first attempt to resolve your issues through the community support channels, e.g. Slack, in order to rule out individual configuration errors. Please provide as much detail as possible.
Describe the bug:
Cluster Issuer installation fails with TLS handshake timeout
kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager
Error from server (InternalError): error when creating "cert-issuer-letsencrypt-prd.yml": Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: net/http: TLS handshake timeout
Expected behaviour:
kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager
works successfully and does not generate an error
Steps to reproduce the bug:
- Create cert-manager namespace:
  kubectl create ns cert-manager
- Install cert-manager using Helm 3:
  helm install cert-manager jetstack/cert-manager --namespace cert-manager
  NAME: cert-manager
  LAST DEPLOYED: Sat Feb 15 11:40:28 2020
  NAMESPACE: cert-manager
  STATUS: deployed
  REVISION: 1
  TEST SUITE: None
  NOTES: cert-manager has been deployed successfully!
- Add secret letsencrypt-prd:
  kubectl -n cert-manager apply -f cert-cloudflare-api-key-secret.yml
- Create cluster-issuer:
  kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager

cert-issuer-letsencrypt-prd.yml:
apiVersion: cert-manager.io/v1alpha2
kind: ClusterIssuer
metadata:
  name: letsencrypt-prd
  namespace: cert-manager
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: xxxx
    privateKeySecretRef:
      name: letsencrypt-prd
    solvers:
    - dns01:
        cloudflare:
          email: xxxx
          apiKeySecretRef:
            name: cloudflare-api-key-secret
            key: api-key
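The cert-cloudflare-api-key-secret.yml referenced in the steps above is not shown; a Secret matching the apiKeySecretRef in this ClusterIssuer would typically look like the sketch below (the name and key come from the manifest above, the value is a placeholder):

apiVersion: v1
kind: Secret
metadata:
  name: cloudflare-api-key-secret
  namespace: cert-manager
type: Opaque
stringData:
  api-key: "<cloudflare-global-api-key>"   # placeholder, not a real key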
Anything else we need to know?:
Environment details:
- Kubernetes version (e.g. v1.10.2):
v1.17.2
- Cloud-provider/provisioner (e.g. GKE, kops AWS, etc):
bare-metal
- cert-manager version (e.g. v0.4.0):
0.13.0
- Install method (e.g. helm or static manifests):
helm 3
/kind bug
Pods are running fine, no restarts
kubectl -n cert-manager get pods
NAME READY STATUS RESTARTS AGE
cert-manager-c6cb4cbdf-djqt4 1/1 Running 0 37m
cert-manager-cainjector-76f7596c4-wsb4z 1/1 Running 0 37m
cert-manager-webhook-8575f88c85-xf7w2 1/1 Running 0 31m
CRDs are there:
kubectl get crd | grep cert-manager
certificaterequests.cert-manager.io 2020-02-15T10:39:37Z
certificates.cert-manager.io 2020-02-15T10:39:38Z
challenges.acme.cert-manager.io 2020-02-15T10:39:38Z
clusterissuers.cert-manager.io 2020-02-15T10:39:39Z
issuers.cert-manager.io 2020-02-15T10:39:40Z
orders.acme.cert-manager.io 2020-02-15T10:39:40Z
The logs of the cert-manager-webhook pod repeatedly show http: TLS handshake error from 10.42.152.128:5067: EOF
kubectl -n cert-manager logs cert-manager-webhook-8575f88c85-xf7w2
I0215 10:47:05.409158 1 main.go:64] "msg"="enabling TLS as certificate file flags specified"
I0215 10:47:05.409423 1 server.go:126] "msg"="listening for insecure healthz connections" "address"=":6080"
I0215 10:47:05.409471 1 server.go:138] "msg"="listening for secure connections" "address"=":10250"
I0215 10:47:05.409495 1 server.go:155] "msg"="registered pprof handlers"
I0215 10:47:05.409672 1 tls_file_source.go:144] "msg"="detected private key or certificate data on disk has changed. reloading certificate"
2020/02/15 10:48:46 http: TLS handshake error from 10.42.152.128:25427: EOF
2020/02/15 10:53:56 http: TLS handshake error from 10.42.152.128:48126: EOF
2020/02/15 10:59:06 http: TLS handshake error from 10.42.152.128:21683: EOF
2020/02/15 11:04:16 http: TLS handshake error from 10.42.152.128:9457: EOF
2020/02/15 11:09:26 http: TLS handshake error from 10.42.152.128:41640: EOF
2020/02/15 11:14:36 http: TLS handshake error from 10.42.152.128:56638: EOF
Here are the logs from the cert-manager pod:
kubectl -n cert-manager logs cert-manager-c6cb4cbdf-fzdmj
I0215 12:23:31.410690 1 start.go:76] cert-manager "msg"="starting controller" "git-commit"="6d9200f9d" "version"="v0.13.0"
W0215 12:23:31.410750 1 client_config.go:543] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0215 12:23:31.411825 1 controller.go:167] cert-manager/controller/build-context "msg"="configured acme dns01 nameservers" "nameservers"=["10.43.0.10:53"]
I0215 12:23:31.412082 1 controller.go:130] cert-manager/controller "msg"="starting leader election"
I0215 12:23:31.412188 1 metrics.go:202] cert-manager/metrics "msg"="listening for connections on" "address"="0.0.0.0:9402"
I0215 12:23:31.412836 1 leaderelection.go:242] attempting to acquire leader lease kube-system/cert-manager-controller...
I0215 12:24:50.921660 1 leaderelection.go:252] successfully acquired lease kube-system/cert-manager-controller
I0215 12:24:50.922007 1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered" "type"="selfsigned"
I0215 12:24:50.922139 1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered" "type"="venafi"
I0215 12:24:50.922149 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-selfsigned"
I0215 12:24:50.922214 1 controller.go:74] cert-manager/controller/certificaterequests-issuer-selfsigned "msg"="starting control loop"
I0215 12:24:50.922245 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-venafi"
I0215 12:24:50.922291 1 controller.go:74] cert-manager/controller/certificaterequests-issuer-venafi "msg"="starting control loop"
I0215 12:24:50.922307 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="clusterissuers"
I0215 12:24:50.922338 1 controller.go:74] cert-manager/controller/clusterissuers "msg"="starting control loop"
I0215 12:24:50.922376 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="webhook-bootstrap"
I0215 12:24:50.922407 1 controller.go:74] cert-manager/controller/webhook-bootstrap "msg"="starting control loop"
I0215 12:24:50.922412 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="issuers"
I0215 12:24:50.922474 1 controller.go:74] cert-manager/controller/issuers "msg"="starting control loop"
I0215 12:24:50.922537 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="orders"
I0215 12:24:50.922578 1 controller.go:74] cert-manager/controller/orders "msg"="starting control loop"
I0215 12:24:50.922602 1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered" "type"="acme"
I0215 12:24:50.922711 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-acme"
I0215 12:24:50.922736 1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered" "type"="vault"
I0215 12:24:50.922740 1 controller.go:74] cert-manager/controller/certificaterequests-issuer-acme "msg"="starting control loop"
I0215 12:24:50.922855 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-vault"
I0215 12:24:50.922904 1 controller.go:74] cert-manager/controller/certificaterequests-issuer-vault "msg"="starting control loop"
I0215 12:24:50.922982 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificates"
I0215 12:24:50.923031 1 controller.go:74] cert-manager/controller/certificates "msg"="starting control loop"
I0215 12:24:50.923042 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="ingress-shim"
I0215 12:24:50.923073 1 controller.go:74] cert-manager/controller/ingress-shim "msg"="starting control loop"
I0215 12:24:51.025320 1 controller.go:172] cert-manager/controller/certificaterequests "msg"="new certificate request controller registered" "type"="ca"
I0215 12:24:51.025331 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="challenges"
I0215 12:24:51.025385 1 controller.go:74] cert-manager/controller/challenges "msg"="starting control loop"
I0215 12:24:51.025430 1 controller.go:101] cert-manager/controller "msg"="starting controller" "controller"="certificaterequests-issuer-ca"
I0215 12:24:51.025473 1 controller.go:74] cert-manager/controller/certificaterequests-issuer-ca "msg"="starting control loop"
I0215 12:24:51.122618 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-ca"
I0215 12:24:51.122638 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cloudflare-api-key-secret"
I0215 12:24:51.122650 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-tls"
I0215 12:24:51.122669 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/sh.helm.release.v1.cert-manager.v1"
I0215 12:24:51.122674 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cloudflare-api-key-secret"
I0215 12:24:51.122705 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-token-5tdm7"
I0215 12:24:51.122729 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-token-5tdm7"
I0215 12:24:51.122780 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-webhook-token-6hpwz"
I0215 12:24:51.122729 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/sh.helm.release.v1.cert-manager.v1"
I0215 12:24:51.122805 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-token-6hpwz"
I0215 12:24:51.122840 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/default-token-vlftn"
I0215 12:24:51.122867 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/default-token-vlftn"
I0215 12:24:51.122618 1 controller.go:129] cert-manager/controller/webhook-bootstrap "msg"="syncing item" "key"="cert-manager/cert-manager-cainjector-token-lzwbt"
I0215 12:24:51.122903 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-cainjector-token-lzwbt"
I0215 12:24:51.123241 1 controller.go:129] cert-manager/controller/ingress-shim "msg"="syncing item" "key"="kube-system/dashboard-kubernetes-dashboard"
I0215 12:24:51.123256 1 controller.go:197] cert-manager/controller/webhook-bootstrap/webhook-bootstrap/ca-secret "msg"="ca certificate already up to date" "resource_kind"="Secret" "resource_name"="cert-manager-webhook-ca" "resource_namespace"="cert-manager"
I0215 12:24:51.123281 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-ca"
I0215 12:24:51.123420 1 controller.go:255] cert-manager/controller/webhook-bootstrap/webhook-bootstrap/ca-secret "msg"="serving certificate already up to date" "resource_kind"="Secret" "resource_name"="cert-manager-webhook-tls" "resource_namespace"="cert-manager"
I0215 12:24:51.123450 1 controller.go:135] cert-manager/controller/webhook-bootstrap "msg"="finished processing work item" "key"="cert-manager/cert-manager-webhook-tls"
I had an issue deploying a ClusterIssuer, the error was:
Solved as:
@Antiarchitect Only your solution worked for me!
Steps taken:
@turkenh I am seeing the same issue but no errors in my events. I am following the same approach as you, i.e. deploy cert-manager first and then the issuer with a separate Helm chart. Just as you had observed, I do not see the error if I deploy my Issuer after a few seconds (~60). Back-to-back installations of cert-manager and the Issuer certainly throw the following error: Internal error occurred: failed calling webhook "webhook.cert-manager.io": Post https://cert-manager-webhook.cert-manager.svc:443/mutate?timeout=30s: x509: certificate signed by unknown authority

@munnerz, I am certainly seeing the same error as the original issue with v0.15.0. Any thoughts on why I am seeing the error when I deploy the Issuer immediately after the cert-manager deployment and NOT when I deploy the Issuer after a bit of a wait? This still appears to be a bug. Do you want me to open another issue to track this?
Using cert-manager v0.15.0, which was released yesterday. With installCRDs set to true, I am still getting the same error as above: our scripts deploy another Helm chart which contains cert-manager resources just after the cert-manager Helm release reports ready, and helm fails with the above error. However, if I try to create the resources after some time, I don't get any errors. So it looks like a timing issue, but I was not getting it with v0.15.0-alpha.0.

Waiting a while as mentioned earlier seems to do the trick, so there is probably a timing issue somewhere. Tested on 0.15.2.
The following works for me (you might wanna tinker with the timer):
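The snippet from this comment is not preserved above; purely as an illustration, a minimal wait-and-retry along the lines commenters describe (deployment name, retry count and timings are assumptions) could look like:

# Wait for the webhook Deployment to become available, then retry the apply a few times.
kubectl -n cert-manager rollout status deployment/cert-manager-webhook --timeout=120s
for i in 1 2 3 4 5; do
  kubectl apply -f cert-issuer-letsencrypt-prd.yml -n cert-manager && break
  echo "webhook not ready yet, retrying in 15s..."
  sleep 15
done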
We've made significant improvements to the way TLS is managed in the upcoming v0.15 release, as well as adding an installCRDs option to the Helm chart which will correctly handle updating service names and the conversion webhook configuration when deploying into namespaces other than cert-manager, or when using a Helm release name other than cert-manager. I think this issue can now be closed after this, and if anyone is still running into issues I'd advise you to try the new v0.15.0-alpha.1 release and report back! (To be safe, it may be best to start 'fresh' in case you have a currently broken configuration.)

Still not sure why it does not work with the webhook. Also not sure whether this is really the best approach. Also interestingly, the webhook was working on the initial setup of my cluster back in January. I did add an additional node and updated the underlying OS. Not sure yet why it stopped working…
@munnerz Thank you for your help. I deployed cert-manager using the kubectl command below:
kubectl apply --validate=false -f https://github.com/jetstack/cert-manager/releases/download/v0.15.0-alpha.1/cert-manager.yaml
Everything is working fine, as you can see:
kubectl get pod,service,endpoints -n cert-manager
NAME                                            READY   STATUS    RESTARTS   AGE
pod/cert-manager-5bb5b9dcf8-sb52s               1/1     Running   0          28m
pod/cert-manager-cainjector-869f7868b7-rrrw2    1/1     Running   0          28m
pod/cert-manager-webhook-79d78c45cd-7fxfs       1/1     Running   0          28m

NAME                           TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/cert-manager           ClusterIP   10.99.239.39   <none>        9402/TCP   28m
service/cert-manager-webhook   ClusterIP   10.99.49.145   <none>        443/TCP    28m

NAME                             ENDPOINTS           AGE
endpoints/cert-manager           10.244.4.60:9402    28m
endpoints/cert-manager-webhook   10.244.5.75:10250   28m

But when I try to create the issuer and certificate, I get the timeout and context deadline exceeded errors.
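When the Service and endpoints look healthy like this but calls still time out, one basic check is whether the webhook is reachable from inside the cluster at all. This is my own diagnostic suggestion, not from the thread; the curlimages/curl image and flags are assumptions:

# A TLS-level response (even an HTTP 404) means the network path to the webhook works;
# a hang or timeout instead points at CNI, MTU, or routing problems.
kubectl run webhook-test --rm -it --restart=Never --image=curlimages/curl -- \
  curl -vk --max-time 10 https://cert-manager-webhook.cert-manager.svc:443/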
I've solved my problems with sed 😃
But you should remove not only the CRDs but also the webhook configurations if they were improperly configured before.
We ran into this, and the specific resource that was conflicting was the cert-manager-webhook-ca secret, which had been left over from a previous installation that was removed manually. When I looked at the details, that secret had been created two years before the new version of cert-manager was installed. I was able to simply run kubectl delete -f https://github.com/jetstack/cert-manager/releases/download/v[X.X]/cert-manager.yaml, which removed everything in that namespace (including old stuff), and then re-ran kubectl apply .... After doing that, I confirmed that the secret was new, and everything started working. HTH

I fixed this problem on my hard upgrade from v0.10 to v0.15 by deleting the cert-manager-webhook-ca secret, because it is not updated automatically if it already exists.
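A sketch of the clean-up these two comments describe, i.e. removing the stale webhook CA/TLS secrets so they get regenerated (the secret names are the ones visible in the controller logs above; the pod labels are assumptions based on a default Helm chart install):

kubectl -n cert-manager delete secret cert-manager-webhook-ca cert-manager-webhook-tls
# Restart the webhook and cainjector so fresh certificates are issued and injected.
kubectl -n cert-manager delete pod -l app.kubernetes.io/name=webhook
kubectl -n cert-manager delete pod -l app.kubernetes.io/name=cainjector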
I am having the same symptom, and I am sure it is something with my Weave CNI, because it worked with the AWS VPC CNI.
I even tried tcpdump on the cert-manager and cert-manager-webhook pods; surprisingly, there is no traffic on the webhook port.
Hi @papanito, my configuration is the same, but with Kubernetes version 1.16, and I tried to install cert-manager today using the static manifests instead of Helm.
I had exactly the same issue and solved it by following this page: https://cert-manager.io/docs/installation/compatibility/ . In particular, I have used cert-manager-no-webhook.yaml instead of cert-manager.yaml. You can consider whether this option is suitable for you.

So now I have finished my configuration and HTTPS works fine. I followed https://www.digitalocean.com/community/tutorials/how-to-set-up-an-nginx-ingress-with-cert-manager-on-digitalocean-kubernetes . A note that I'm using bare metal.
This problem may be caused by the CNI. After I modified Calico's MTU, the problem was solved:
"mtu": 1440 -> "mtu": 1420
Using Hetzner cloud servers here, and the problem was indeed fixed by changing the MTU, not cert-manager.
Changing the Calico MTU from 1440 to 1400 or 1420 fixed the error when running test-resource.yaml.
MTU change:
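The actual change from this comment is not preserved; one common way to apply it, assuming Calico was installed from the upstream calico.yaml manifest (which renders the CNI MTU from the calico-config ConfigMap), is roughly:

# Lower the veth MTU (value taken from the comments above) and restart calico-node
# so the CNI configuration is re-rendered with the new MTU.
kubectl -n kube-system patch configmap calico-config --type merge -p '{"data":{"veth_mtu":"1400"}}'
kubectl -n kube-system rollout restart daemonset/calico-node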
I had a similar issue and found out that my kube-controller-manager and kube-apiserver pods had a wrongly configured NO_PROXY that did not exclude .svc from proxied traffic. I had to change /etc/kubernetes/manifests/*.yaml on the master node.
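For illustration only, the relevant part of such a static pod manifest might look like the excerpt below; the proxy address and the exact NO_PROXY entries are assumptions (the 10.42.x/10.43.x ranges are simply the Pod/Service CIDRs that appear in the logs above, use your own):

# /etc/kubernetes/manifests/kube-apiserver.yaml (excerpt of the container spec)
    env:
    - name: HTTPS_PROXY
      value: "http://proxy.example.com:3128"   # hypothetical corporate proxy
    - name: NO_PROXY
      value: ".svc,.svc.cluster.local,10.42.0.0/16,10.43.0.0/16,localhost,127.0.0.1"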
Hi, I ran into the same issue as @zzaareer on a Rancher Kubernetes cluster. I have successfully deployed cert-manager via Helm v3:
but when I try to install the test resources, I get the following error:
I attached a sidecar to the cert-manager pod for debugging, and it shows me that I can resolve cert-manager-webhook.cert-manager.svc, but the IP is not answering a ping. I've resolved the IP to 10.43.179.12 and this matches my svc/cert-manager-webhook service. When I do k port-forward service/cert-manager-webhook 9090:443 and call localhost:9090 in my browser, I see that the API is up. But why is my cert-manager not reaching the webhook pod?

Changing to a newer version (v1.8.0 in the curl command) also helped for me! Thanks
Potential resolution:
In our case, our cert-manager-webhooks pod had been running for nearly a year. We suspect it was using some sort of out-of-date internal cluster cert. After deleting the webhook pod, the Deployment spun up a new one without the issue.
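For reference, the equivalent one-liner (the label is an assumption based on a default Helm chart installation):

kubectl -n cert-manager delete pod -l app.kubernetes.io/name=webhook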
Can you file a separate issue for that?
I'm not sure if this is helpful, but FYI: attempting to apply this via kubectl -k (kustomize) failed, but kubectl -f succeeded. I don't know how to research this further.
EDIT: potentially very relevant, I was working with a possibly very bad mix of kubectl versions:

I followed the same action plan and it is working, but after that I can't describe or delete the issuer; it gives me the following error:
conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.not-cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found.
Any idea?
Hi @TylerIlunga and @Antiarchitect,
I have the same issue, and with that fix I've already created an issuer. But when I try to describe the created issuer, it returns this error:
conversion webhook for cert-manager.io/v1alpha2, Kind=Issuer failed: Post https://cert-manager-webhook.not-cert-manager.svc:443/convert?timeout=30s: service "cert-manager-webhook" not found.
Here https://github.com/jetstack/cert-manager/issues/2752#issuecomment-605966908 you can find the answer from @munnerz that explains the issue, the reason behind it, and a possible workaround very well.
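A quick way to check which Service and namespace the conversion webhook on a CRD actually points at (the jsonpath below is for apiextensions.k8s.io/v1 CRDs; on older v1beta1 CRDs the field lives under .spec.conversion.webhookClientConfig instead):

kubectl get crd issuers.cert-manager.io \
  -o jsonpath='{.spec.conversion.webhook.clientConfig.service}{"\n"}'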
Got this error too.

Reason: the node MTU is smaller than the cert-manager-webhook pod MTU, so the TLS response packet is not able to reach the node. Solution: adjust the cert-manager-webhook pod MTU to (node MTU - 20).

Since I also had this error, which bothered me for quite some time, I want to share my story 😉 Maybe it helps someone. To get rid of this error I also did the workaround mentioned in https://github.com/jetstack/cert-manager/issues/2602#issuecomment-669091541 . I'm installing cert-manager via the official Helm chart. While I tried to upgrade from cert-manager v1.4.x to v1.5.0, the startupapicheck failed; it also tries to call the webhook, and it also exited with context deadline exceeded. While you can disable this check, I really wanted to find out the real cause.

While chatting with a team mate about that issue, he asked the right question: are you aware that the K8s control plane tries to connect to that webhook? 😉 I have no idea why I ignored that fact for quite a while… In my case there was just no connection from the control plane to the "Pod network", and I actually never needed one. So the controller nodes tried to connect to https://cert-manager-webhook.cert-manager.svc:443/validate?timeout=30s and of course had no idea where to route that request, because there was simply no network route from the controller nodes' network to the "Pod network" (the K8s Pod and Service IP range).

My solution for now is to make the webhook listen on port 30001 on the host network too, so the controller nodes can communicate with the webhook via the host network (see the values sketch below). Of course, worker needs to be replaced with a real hostname. And to avoid a certificate error, --dynamic-serving-dns-names is also needed: a list of valid DNS names can be included there so that the webhook TLS certificate matches the hostname in the URL.

I had a 60 second wait built into my script and it still failed; I came back 10 minutes later, tried this, and it worked.
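The values snippet from the host-network comment above is not preserved; as a rough, hedged sketch, the description maps to something like the following Helm values (webhook.hostNetwork, webhook.securePort and webhook.extraArgs exist in recent charts, but verify the keys, the port and the hostname against your chart version before use):

# values.yaml for the cert-manager chart (sketch, see caveats above)
webhook:
  hostNetwork: true
  securePort: 30001          # webhook listens here; reachable via the node's host network
  extraArgs:
  # 'worker' is a placeholder for a real node hostname
  - --dynamic-serving-dns-names=cert-manager-webhook.cert-manager.svc,cert-manager-webhook,worker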
@munnerz, can you please consider re-opening this? It keeps happening in 0.15.2.
I ended up finding another unique solution to this problem, and all of cert-manager is working at full capacity for me now. My setup was:
To fix, for some reason I had to make an adjustment to the calico network IP pool configuration away from the default. I downloaded the calico setup YAML (https://docs.projectcalico.org/manifests/calico.yaml), and then I edited this snippet
to
After deleting the default created IP pool and restarting calico, I reinstalled cert-manager and it began working as intended.
I am not sure exactly why this change fixed all my problems.
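The before/after snippet from this comment is not preserved. For orientation only, the part of calico.yaml that is usually edited for the IP pool is the (commented-out by default) pod CIDR env var on the calico-node container; the value below is a placeholder, not what the commenter used:

# calico.yaml excerpt, calico-node container environment
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"   # replace with the pod CIDR your cluster actually uses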
I have been dealing with this issue for a couple of days now. After the 0.15.0 alpha came out today I thought this issue would be resolved, but I continue to suffer the same issue.
Also, I don't think @Antiarchitect's solution is actually a real solution, since it necessitates deleting the webhook configurations, effectively disabling the webhook service. I think the issue is related to TLS connection establishment, but I am not sure why none of the ciphers work.
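For reference, the webhook configurations being referred to can be listed (and, with the caveat above, deleted) like this:

kubectl get validatingwebhookconfigurations,mutatingwebhookconfigurations | grep cert-manager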
Same here, having upgraded from v0.11 to 0.14.1. The mandatory webhook component seems to have borked. Our new webhook pod is accessible on cert-manager-webhook.our-namespace.svc:443, and I've tried the hostNetwork suggestion and waiting for the pod to come up before creating the ClusterIssuer resource. No dice. Rolling back to < v0.14 until all the open issues about this are closed. May I suggest a patch to make the webhook optional again in the meantime?