ingress-nginx: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io"
Hi all,
When I apply the ingress configuration file named ingress-myapp.yaml with the command kubectl apply -f ingress-myapp.yaml, I get an error. The complete error is as follows:
Error from server (InternalError): error when creating "ingress-myapp.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded
This is my ingress:
apiVersion: extensions/v1beta1
kind: Ingress
metadata:
  name: ingress-myapp
  namespace: default
  annotations:
    kubernetes.io/ingress.class: "nginx"
spec:
  rules:
  - host: myapp.magedu.com
    http:
      paths:
      - path:
        backend:
          serviceName: myapp
          servicePort: 80
Has anyone encountered this problem?
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 290
- Comments: 175 (37 by maintainers)
Links to this issue
Commits related to this issue
- rollback nginx ingress https://github.com/kubernetes/ingress-nginx/issues/5401 — committed to onedr0p/home-ops by onedr0p 4 years ago
- fix(eks-public): add ingress so the cluster can communicate with the ingress controller Fix following error when deploying an exposed service in eks-public: > Error: release artifact-caching-proxy... — committed to jenkins-infra/aws by lemeurherve 2 years ago
@aduncmj I found this solution https://stackoverflow.com/questions/61365202/nginx-ingress-service-ingress-nginx-controller-admission-not-found
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
I might have solved it…
I followed this guide for the helm installation: https://kubernetes.github.io/ingress-nginx/deploy/
But when I followed this guide instead: https://docs.nginx.com/nginx-ingress-controller/installation/installation-with-helm/
the error doesn't occur.
If you have this issue, try deleting your current helm installation and reinstalling.
Get the name:
Delete and apply stable release:
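The exact commands were not preserved, but a minimal sketch of that flow might look like this; the release name my-ingress and namespace ingress-nginx are placeholders, not values from the thread:

```shell
# List releases to find the name of the existing ingress controller install
helm list --all-namespaces

# Remove the old release (name and namespace are placeholders)
helm uninstall my-ingress --namespace ingress-nginx

# Add the chart repo and install the stable release again
helm repo add ingress-nginx https://kubernetes.github.io/ingress-nginx
helm repo update
helm install my-ingress ingress-nginx/ingress-nginx \
  --namespace ingress-nginx --create-namespace
```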
@johan-lejdung not really, that is a different ingress controller.
Hi,
I have.
The validatingwebhook service is not reachable in my private GKE cluster. I needed to open the 8443 port from the master to the pods. On top of that, I then received a certificate error on the endpoint: "x509: certificate signed by unknown authority". To fix this, I needed to include the caBundle from the generated secret in the validatingwebhookconfiguration.
A quick fix, if you don't want to do the above and have the webhook fully operational, is to remove the validatingwebhookconfiguration or to set the failurePolicy to Ignore.
I believe some fixes are needed in the deploy/static/provider/cloud/deploy.yaml as the webhooks will not always work out of the box.
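A hedged sketch of that quick fix with kubectl patch; the webhook name ingress-nginx-admission is the default from the static manifests and may differ in your install:

```shell
# Option 1: stop failing closed by setting failurePolicy to Ignore
kubectl patch validatingwebhookconfiguration ingress-nginx-admission \
  --type=json \
  -p='[{"op":"replace","path":"/webhooks/0/failurePolicy","value":"Ignore"}]'

# Option 2: remove the webhook entirely (disables ingress validation)
kubectl delete validatingwebhookconfiguration ingress-nginx-admission
```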
Why close this issue? What is the solution?
I fixed this by using:
kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission
@eltonbfw update to 0.32.0 and make sure the API server can reach the POD running the ingress controller
On EKS, a security group rule needs to be added on the Node Security Group tcp/8443 from the Cluster Security Group.
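A sketch of that rule with the AWS CLI; both security group IDs below are placeholders you would look up for your own cluster:

```shell
# Allow the EKS cluster (control plane) security group to reach the
# node security group on TCP 8443, where the admission webhook pod listens.
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp \
  --port 8443 \
  --source-group sg-0fedcba9876543210
```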
Solution: delete your ValidatingWebhookConfiguration
kubectl get -A ValidatingWebhookConfiguration

NAME
nginx-ingress-ingress-nginx-admission

kubectl delete -A ValidatingWebhookConfiguration nginx-ingress-ingress-nginx-admission
@aledbf can you please reopen this issue? A huge number of people are having the same problem, so this issue definitely isn't resolved. The instructed solution isn't clear in either the documentation or the issue comments.
I’m seeing the most common reply here is “turn off webhook validation”, but turning off validation doesn’t mean the error has gone away, just that it’s no longer being reported.
I have the same problem, and I use 0.32.0. What's the solution? Please, thanks!
In case you are using Terraform:
Hi, I am a beginner at setting up k8s and ingress. I am facing a similar issue, but in a bare-metal scenario. I would be very grateful if you could share more details on what you mean by "opening a port between master and pods"?
Update: sorry, as I said, I am new to this. I checked that there is a service (ingress-nginx-controller-admission) exposing port 443, running in the ingress-nginx namespace. For some reason, my ingress resource created in the default namespace is not able to communicate with it. Please suggest how I can resolve this.
The error is:
Error from server (InternalError): error when creating "test-nginx-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded
I added a note about the webhook port in https://kubernetes.github.io/ingress-nginx/deploy/ and the links for the additional steps in GKE
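For private GKE clusters, the additional step is a firewall rule allowing the master CIDR to reach the nodes on the webhook port. A sketch with hypothetical network name, node tag, and master CIDR (substitute your cluster's values):

```shell
gcloud compute firewall-rules create allow-master-to-ingress-webhook \
  --network my-vpc \
  --direction INGRESS \
  --source-ranges 172.16.0.0/28 \
  --target-tags gke-my-cluster-1234abcd-node \
  --allow tcp:8443
```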
I was able to find the cause of my issue. My fresh EKS cluster was built using the latest community-supported terraform-eks module, v18.2. In v18 the maintainers made the security group rules much stricter, allowing only specific k8s port communications.
Allowing all traffic from master to worker nodes made things work as intended (for this ingress you at least need to allow port 8443).
A quick update on the above, the certificate error should be managed by the patch job that exists in the deployment so that part should be a non-issue. Only the port 8443 needed to be opened from master to pods for me.
I still have the problem.

Update: I disabled the webhook, and the error went away.

Workaround:

helm install my-release ingress-nginx/ingress-nginx \
  --set controller.service.type=NodePort \
  --set controller.admissionWebhooks.enabled=false

Caution: this may not resolve the issue properly.

Current status (with --set controller.service.type=NodePort), output of kubectl get svc,pods:
NAME                                                    TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)                      AGE
service/a-service                                       ClusterIP   10.105.159.98   <none>        80/TCP                       28h
service/b-service                                       ClusterIP   10.106.17.65    <none>        80/TCP                       28h
service/kubernetes                                      ClusterIP   10.96.0.1       <none>        443/TCP                      3d4h
service/my-release-ingress-nginx-controller             NodePort    10.97.224.8     <none>        80:30684/TCP,443:32294/TCP   111m
service/my-release-ingress-nginx-controller-admission   ClusterIP   10.101.44.242   <none>        443/TCP                      111m

NAME                                                       READY   STATUS    RESTARTS   AGE
pod/a-deployment-84dcd8bbcc-tgp6d                          1/1     Running   0          28h
pod/b-deployment-f649cd86d-7ss9f                           1/1     Running   0          28h
pod/configmap-pod                                          1/1     Running   0          54m
pod/configmap-pod-1                                        1/1     Running   0          3h33m
pod/my-release-ingress-nginx-controller-7859896977-bfrxp   1/1     Running   0          111m
pod/redis                                                  1/1     Running   1          6h11m
pod/test                                                   1/1     Running   1          5h9m
my ingress.yaml
apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: example
  namespace: foo
spec:
  rules:
  - host: b.abbetwang.top
    http:
      paths:
      - path: /b
        backend:
          serviceName: b-service
          servicePort: 80
      - path: /a
        backend:
          serviceName: a-service
          servicePort: 80
  tls:
  - hosts:
    - b.abbetwang.top
what I Do
When I run kubectl apply -f new-ingress.yaml I get "Failed calling webhook, failing closed validate.nginx.ingress.kubernetes.io".
My apiserver log is below:
I0504 06:22:13.286582 1 trace.go:116] Trace[1725513257]: "Create" url:/apis/networking.k8s.io/v1beta1/namespaces/default/ingresses,user-agent:kubectl/v1.18.2 (linux/amd64) kubernetes/52c56ce,client:192.168.0.133 (started: 2020-05-04 06:21:43.285686113 +0000 UTC m=+59612.475819043) (total time: 30.000880829s):
Trace[1725513257]: [30.000880829s] [30.000785964s] END
W0504 09:21:19.861015 1 watcher.go:199] watch chan error: etcdserver: mvcc: required revision has been compacted
W0504 09:31:49.897548 1 watcher.go:199] watch chan error: etcdserver: mvcc: required revision has been compacted
I0504 09:36:17.637753 1 trace.go:116] Trace[615862040]: "Call validating webhook" configuration:my-release-ingress-nginx-admission,webhook:validate.nginx.ingress.kubernetes.io,resource:networking.k8s.io/v1beta1, Resource=ingresses,subresource:,operation:CREATE,UID:41f47c75-9ce1-49c0-a898-4022dbc0d7a1 (started: 2020-05-04 09:35:47.637591858 +0000 UTC m=+71256.827724854) (total time: 30.000128816s):
Trace[615862040]: [30.000128816s] [30.000128816s] END
W0504 09:36:17.637774 1 dispatcher.go:133] Failed calling webhook, failing closed validate.nginx.ingress.kubernetes.io: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://my-release-ingress-nginx-controller-admission.default.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded
Would you please reopen this issue @aledbf ?
@andrei-matei Kelsey's cluster works perfectly even without additional CNI plugins and kubelet SystemD services installed on master nodes. All you need is to add a route to the Services' CIDR (10.32.0.0/24) using worker node IPs as "next-hop", on master nodes only. In this way I've got ingress-nginx (deployed from the "bare-metal" manifest) and cert-manager webhooks working, but unfortunately not together 😦 and I still don't know why…
Updated: got both of them working.
I have the same issue, baremetal install with CentOS 7 worker nodes.
This worked for me after hours of searching, and it resolved the issue. Thanks!
Update on 2020-10-07

In my scenario, the problem was caused by the custom CNI plugin weave-net, which makes the API server unable to reach the overlay network. The solution is either using the EKS default CNI plugin, or adding hostNetwork: true to the ingress-nginx-controller-admission Deployment spec. The latter has some other issues that one needs to care about.

----------------Original comment----------------

Removing the ValidatingWebhookConfiguration only disables the validation. Your ingress may get persisted, but once your ingress has some configuration error, your nginx ingress controller will be doomed.

I don't think the PathType fix 5445 has anything to do with this error. It says

Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: context deadline exceeded

which IMHO tells us that the ingress admission service cannot be reached from the control plane (8443 is the default port exposed by the pod, and 443 is the port the service exposes for the pod/deployment).

I'm encountering this error in AWS EKS, K8S version 1.17. It occurred to me this might have something to do with security group settings, but I tried every possible way to make sure the control plane can reach the worker node on any port, and still the problem cannot be resolved. 😞
This is still an issue, using version v0.35.0.

I don't think you need both an ingress and an egress rule, just the ingress one. The first of these two rules should be enough.

For anyone using the terraform-aws-modules/eks/aws module, you can add this to your configuration:

This way is just concealing the issue; who can provide a better way? I still face this issue after closing the firewall:
Got this in GCP on our testing environment (chart 3.23.0 / image 0.44.0 / k8s 1.17.14-gke.1600):
Our production runs on chart 3.13.0 / image 0.41.2 where this cannot be reproduced.
As a workaround:
In my GKE cluster I've manually increased timeoutSeconds to 30. You can do it via Helm:
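As a sketch of that change: the value name controller.admissionWebhooks.timeoutSeconds is assumed from the ingress-nginx chart (check your chart's values), and the webhook object name may differ per install.

```shell
# Via the Helm chart (value name assumed; verify against your chart)
helm upgrade ingress-nginx ingress-nginx/ingress-nginx \
  --namespace ingress-nginx \
  --reuse-values \
  --set controller.admissionWebhooks.timeoutSeconds=30

# Or patch the live object directly; 30s is the maximum the API server allows
kubectl patch validatingwebhookconfiguration ingress-nginx-admission \
  --type=json \
  -p='[{"op":"replace","path":"/webhooks/0/timeoutSeconds","value":30}]'
```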
Hey guys, I've also experienced this issue. After doing some debugging, it seems like the admission controller just takes too long to respond. Since the webhook timeout is 10s, it means that (in my case) the ingress validation check (which internally constructs the whole to-be nginx config) takes longer than 10s, hence the timeout, or in this case "deadline exceeded". Again, I don't have concrete evidence to back this statement up; I need to do some timings to really find out. My suspicion is that the pod, and thus the container, has very little resources to carry out the required config generation in a timely manner. Again, assumptions.

Workaround: increase the timeout of the validatingwebhookconfiguration for the ingress controller.

For me, "kubectl delete validatingwebhookconfigurations public-nginx-ingress-admission" also works around it, but I think this is just a bad workaround. It doesn't identify the root cause or solve it durably.
There is another related (already closed) issue: https://github.com/kubernetes/ingress-nginx/issues/6655
The solution from vosuyak worked for me, using kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission while in the namespace where I'm applying the ingress rules. See https://stackoverflow.com/a/62044090/1549918
I updated from nginx-ingress to ingress-nginx in GKE, so if this helps anyone I needed to add a FW rule to allow 8443 from the API server to my nodes.
As per deploy instructions: https://kubernetes.github.io/ingress-nginx/deploy/#gce-gke
I’m not sure why it was NOT needed in nginx-ingress.
For me, it helped to solve the problem. My cluster had two ValidatingWebhookConfiguration (due to one wrong installation) and deleting the outdated one solved the issue. Thank you.
We have just 3 pods and have the same issue. I don't think it is at all related to the number of pods.
I think we’re encountering a similar issue
@rkevin-arch please make sure you are using the latest version, v0.41.2. There was a regression that denied validation of networking.k8s.io/v1 ingresses.

I've tried to upgrade from the deprecated helm chart stable/nginx-ingress to ingress-nginx/ingress-nginx (app version 0.35.0) and my ingress deployment crashes with:

I had a similar problem (but with "connection refused" rather than "context deadline exceeded" as the reason).
The solution of @lbs-rodrigo, deleting the validating webhook configuration with kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission so that it can be recreated according to the config, fixed my problem. If your configuration is correct, then give it a try.

@aledbf Same error. Bare-metal installation.
Just a heads up here: when the protocol is set to "-1", it means "All Traffic". This opens up all ports, making the from_port/to_port values moot. This may be too permissive in some cases. Setting it to "tcp" will allow you to limit the port range to 8443.
Having had the same issues noted above and finding this solution, I found the rule wasn't what I was expecting. I had trouble finding the rule because I was searching by port.
@Clasyc and everyone also:
On EKS created from terraform-aws-modules/eks/aws module (version 17.x though) a security group is automatically created by the module itself, for the Worker Nodes that has a rule which allows traffic from the Control Plane security group on ports 1025-65535 for TCP.
This rule also includes the pre-defined description “Allow worker pods to receive communication from the cluster control plane”.
Does this not cover the case of the security group mentioned above?
If it does, I am still facing this issue, but intermittently, especially when I am deploying massive workloads through Helm (the Ingresses have been checked and are OK as far as their correctness is concerned). It almost seems like a flood-protection mechanism, because if I let it cool down, I don't get it anymore.
Am I missing something here?
@mihaigalos it is the global configmap. You can apply it when you install ingress via helm, like this:
helm install ingress ingress-nginx/ingress-nginx -f values.yaml
values.yaml:
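A hypothetical values.yaml fragment along those lines; entries under controller.config are rendered into the controller's global ConfigMap, and the specific option shown here is only an illustration:

```yaml
controller:
  config:
    # entries here become keys in the ingress-nginx global ConfigMap;
    # proxy-body-size is just an example option
    proxy-body-size: "10m"
```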
Can you post information, commands, and outputs showing that the security groups or host OS packet filtering are not blocking the required ports?
I've got the very same issue on a clean and fresh EKS 1.21 install without any addons, CNI, NetworkPolicies, firewalls, etc. The same nginx-ingress works on my test k3d setup. I've tested a couple of recent and older versions of the ingress controller; none worked on EKS. Increasing the request timeout does not help. Removing the ValidatingWebhookConfiguration helps.
But IMHO it's not normal to just delete something to get it working. I can't find the exact root cause of the issue in any of the threads on this problem either.
Why are pods in the same namespace not able to communicate with the ingress-controller admission webhook?
It seems there are multiple variants of the error "failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=10s": <here>"
Like below:
The one I'm facing as soon as I apply the ingress resource file (rules file) is: Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post "https://ingress-nginx-controller-admission.ingress-nginx.svc:443/networking/v1/ingresses?timeout=30s": EOF
After which the ingress controller gets restarted, as below:

NAME                                        READY   STATUS             RESTARTS   AGE
ingress-nginx-controller-5cf97b7d74-zvrr6   1/1     Running            6          30m
ingress-nginx-controller-5cf97b7d74-zvrr6   0/1     OOMKilled          6          30m
ingress-nginx-controller-5cf97b7d74-zvrr6   0/1     CrashLoopBackOff   6          30m
ingress-nginx-controller-5cf97b7d74-zvrr6   0/1     Running            7          31m
ingress-nginx-controller-5cf97b7d74-zvrr6   1/1     Running            7          32m

One possible solution (not sure though) is mentioned here: https://stackoverflow.com/a/69289313/12241977
But I'm not sure it works in the case of managed Kubernetes services like AWS EKS, as we don't have access to the kube-api server.
Also, the "kind: ValidatingWebhookConfiguration" section has the below field in the yaml: https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.1.0/deploy/static/provider/baremetal/deploy.yaml
So what does "path: /networking/v1/ingresses" do, and where does that path reside? Simply put, where can we find this path?
@shakaib-arif a friendly reminder to answer the question of @strongjz 😃
Environment Detail:
v1.21.2
Error Log:
My Resolution
In my AKS cluster, I have increased the timeout to timeoutSeconds: 30.
Thanks @tehKapa, your comment saved my day: #5401 (comment)
kubectl log:
ingress.networking.k8s.io/ingressName configured
Ingress log:
Stumbled upon this issue with controller: v0.48.1. Solved by rolling back kube-webhook-certgen to v1.2.2 (the problem was on version 1.5.1)
Agree with that. By setting the ca from the nginx-ingress-controller-ingress-nginx-admission secret in the caBundle field of the ValidatingWebhookConfiguration, it works.
Why is this field not set by default during the nginx-ingress-controller-ingress-nginx-admission-create Job? @aledbf
Job ? @aledbfHow i resolve this issue by 1.kubectl delete -A ValidatingWebhookConfiguration ingress-nginx-admission 2. kubectl get job -n ingress-nginx 3. kubectl delete job ingress-nginx-admission-create ingress-nginx-admission-patch -n ingress-nginx 4. re-deploy webhook stack (ValidatingWebhookConfiguration , ClusterRole , ClusterRoleBinding , Job ,role , RoleBinding) 5. wait for pod job complated.
ingress-nginx-admission-create-tkfch 0/1 Completed 0 3m56s
ingress-nginx-admission-patch-fwc86 0/1 Completed 0 3m56s
6. deploy the ingress: kubectl apply -f echo-sever.txt
ingress.networking.k8s.io/echo created
Note: these steps worked on ingress controller v0.34.1.
@vagdevik
Thanks, mate!!!
That worked for me. I put a more detailed answer here on StackOverflow
I did more digging and it seems the problem is due to the number of ingresses we have. We have 219, so I think that when it validates, it checks existing ones as well, causing it to fail intermittently when it cannot check all objects, and it has no built-in retries on failure.
Dang, that makes sense; I hope some AWS expert notices this issue. By default, EKS pods run on the same subnet as nodes, which makes them routable within the VPC. But I'm using the Cilium CNI plugin, and pods now have the 10.0.0.0/8 IP range. Maybe this could be causing some mess, maybe not.
Is the admission webhook process running on port 8443 of the node or of the pod?
@renanrider As others already pointed out, you should resolve the network issues rather than disable webhooks. Disabling the admission webhook is a bad idea.
It turned out that my helm chart values were incorrect. I had set hostNetwork: true, which effectively disables access to the admission webhook.
To be able to use admission webhooks with hostNetwork: true, you need to open port 8443 of the node as well, I guess, but I don't think that's a good idea.
If what you need is just exposing ports 80 and 443 (but not 8443), you can use port mapping instead of hostNetwork. This way the admission webhook remains accessible only inside the cluster, which is better than exposing port 8443 of the node.
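A hedged values.yaml sketch of that port-mapping alternative, assuming the ingress-nginx chart's hostPort options (verify the key names against your chart version):

```yaml
controller:
  # keep the pod on the cluster network so the admission webhook (8443)
  # stays reachable only inside the cluster
  hostNetwork: false
  hostPort:
    # map only 80/443 onto the node, instead of sharing the whole host network
    enabled: true
    ports:
      http: 80
      https: 443
```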
Killing the webhook doesn't solve the problem; you need the webhook for a working cluster.
Maybe that is the way. How do you propose to do that?
That's not a solution, that's destroying the functionality of the software that caused the root issue 😃
As mentioned previously, the solution is to allow the admission webhook port 8443 from master to worker nodes. On private GKE clusters the firewall rule should be gke-<cluster_name>-<id>-master with target tags gke-<cluster_name>-<id>-node, the source range set to your master CIDR block, and TCP ports 10250 and 443 by default.

This error means that the Kubernetes API Server can't connect to the admission webhook (a workload running inside the Kubernetes cluster).
The solution for GKE is actually perfectly documented: https://cloud.google.com/kubernetes-engine/docs/how-to/private-clusters#console_9. Just create a firewall rule to allow API Server -> workload traffic.
For other Kubernetes deployments, try to log in to the API Server host and connect to the provided URL yourself. If it doesn't work, figure out routing, firewalls, and name resolution.
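A quick reachability check along those lines, run from an API-server host or a debug pod; the service name and namespace are taken from the error message in this thread, and if cluster DNS is not available from the host, substitute the service's ClusterIP:

```shell
# -k: the webhook uses a self-signed certificate
# -m 5: fail fast instead of waiting out the 10-30s admission timeout
curl -k -m 5 https://ingress-nginx-controller-admission.ingress-nginx.svc:443/
```

Any HTTP response indicates the webhook is reachable; a timeout or connection refused points at routing, firewall, or name-resolution problems.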
kubectl apply -f ingress-single.yaml --kubeconfig=/home/mansaka/softwares/k8sClusteryaml/kubectl.yaml worked for me.
Hello, I used version 0.30 to solve this problem, hah.
So, what's the solution?
If you are using the bare-metal install from Kelsey Hightower, my suggestion is to install kubelet on your master nodes, start calico/flannel or whatever you use for CNI, and label your nodes as masters so no other pods are started there. Then your control plane will be able to communicate with your nginx deployment and the issue should be fixed. At least this is how it worked for me.