aws-load-balancer-controller: aws-load-balancer-webhook-service error for non-ALB Ingresses

We are trying to migrate from ingress-nginx to aws-load-balancer-controller. We are starting by installing just the controller chart; the plan is to template our applications to use the new ingress class alb and then migrate them over.
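For context, a migrated application Ingress would look roughly like this (a minimal sketch; the name, host, and scheme annotation are placeholders, not our real manifests):

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: app1-ingress
  annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/scheme: internet-facing   # placeholder annotation
spec:
  rules:
    - host: app1.example.com
      http:
        paths:
          - path: /*
            backend:
              serviceName: app1
              servicePort: 80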

But after installing aws-load-balancer-controller, we are seeing errors on our existing applications like:

cannot patch "app1-ingress" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

app1-ingress still uses kubernetes.io/ingress.class: nginx. Can we exempt those Ingresses from the webhook?

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 35
  • Comments: 51

Most upvoted comments

Reasonably confident this is the same as #2239. https://github.com/kubernetes-sigs/aws-load-balancer-controller/blob/2e181b2cf31d41ce812deb18da629a5b0631144e/helm/aws-load-balancer-controller/values.yaml#L137

We have set keepTLSSecret: true. Our GitOps pipeline, which uses helmfile sync, has run a dozen times since the change and we have yet to see the issue recur.
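For anyone applying the same workaround, this is roughly what we set in the chart values (a minimal sketch; clusterName is just an example value):

# values.yaml for the aws-load-balancer-controller chart
clusterName: my-cluster   # example value
keepTLSSecret: true       # reuse the existing webhook TLS secret instead of regenerating it on every sync

With this in place, repeated helmfile sync (or helm upgrade) runs no longer rotate the webhook certificate out from under the ValidatingWebhookConfiguration.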

Interestingly, I hit this issue today while deploying a Helm chart containing an Ingress with

annotations:
    kubernetes.io/ingress.class: alb
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:xxxxxxx:xxxxxxxx:certificate/xxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxx

and saw in the logs: Error: UPGRADE FAILED: cannot patch "helm-chart-name" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

After a retry it succeeded, as it has many times before.

So possibly the issue is not only related to having both the nginx and ALB ingress controllers in the cluster, but also to the AWS ACM certificate check?

I have also had this issue when using Polaris (https://artifacthub.io/packages/helm/fairwinds-stable/polaris), version 4.0.4:

Error: UPGRADE FAILED: cannot patch "polaris" with kind Ingress: Internal error occurred: failed calling webhook "vingress.elbv2.k8s.aws": Post "https://aws-load-balancer-webhook-service.kube-system.svc:443/validate-networking-v1beta1-ingress?timeout=10s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "aws-load-balancer-controller-ca")

We use cert-manager as described in the Polaris documentation and are currently running EKS 1.20.

I also ran into this today across multiple clusters. The TLS secret has not changed since deployment, as far as I can tell, and was not regenerated. Does anyone have any idea what's wrong?

As far as I can tell the TLS secret in the cluster from the helm chart should be valid for 10 years.

We are installing it through Helm via ArgoCD. We set keepTLSSecret: true, yet we still see the issue periodically.

I face this issue regularly when creating or updating an nginx Ingress configured to use a load balancer via the AWS Load Balancer Controller. The only solution I have found is to delete all of the controller's resources and reinstall it. (I usually do this through ArgoCD, so reinstallation is pretty quick.)

Update: I found that when I remove the annotation alb.ingress.kubernetes.io/target-type: ip, this error stops occurring.

I do not have this annotation at all but am still facing the problem.

So I think I found my issue, and possibly what others here are seeing.

@DmitriyStoyanov was close in saying

Looks like webhook validation does not check annotations such as kubernetes.io/ingress.class or ingressClass in the Ingress spec

But maybe the key here is that the Ingress spec uses ingressClassName, not ingressClass.

The documentation https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.2/guide/ingress/ingress_class/#deprecated-kubernetesioingressclass-annotation appears to be specific to the annotation still being supported for alb itself, but if you're using a secondary ingress controller (nginx or Traefik), the annotation for those controllers will trigger the error reported above.

So the “fix” is to update any Ingress objects that use the annotation style, like kubernetes.io/ingress.class: traefik, to the new ingressClassName spec field, like:

apiVersion: networking.k8s.io/v1beta1
kind: Ingress
metadata:
  name: default-backend-traefik
  namespace: ingress
spec:
  ingressClassName: traefik

and also define your IngressClass object:

apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: traefik
spec:
  controller: traefik.io/ingress-controller

If I had to guess, the AWS LB controller is not checking the ingress class annotations and is therefore treating your Traefik Ingress objects as belonging to the ALB class, which causes the error.
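For reference, Kubernetes assigns a default class through an annotation on an IngressClass object: any Ingress with neither a class annotation nor ingressClassName picks up the default. A minimal sketch of what that looks like (the controller string matches the ALB controller's documented value; this is illustrative, not necessarily something the chart created in your cluster):

apiVersion: networking.k8s.io/v1beta1
kind: IngressClass
metadata:
  name: alb
  annotations:
    # Ingresses with no class annotation and no ingressClassName are assigned this class
    ingressclass.kubernetes.io/is-default-class: "true"
spec:
  controller: ingress.k8s.aws/alb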

@omidraha I had the same issue but solved it by using a k8s.helm.v3.Release instead of a k8s.helm.v3.Chart

Still happening, EKS, installed via helm version 1.5.3

I’ve fixed the problem by generating a cert manually and setting it via webhookTLS. Using the certManager integration also works; both fix the problem for me. I’ve seen a few other charts that generate TLS certs for their webhooks, and all of them have similar problems.
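For anyone who wants to try the same thing, these are roughly the chart values involved (a sketch; the PEM values are placeholders, and enableCertManager assumes cert-manager is already installed in the cluster):

# Option 1: let cert-manager issue and renew the webhook certificate
enableCertManager: true

# Option 2: provide a manually generated certificate for the webhook
webhookTLS:
  caCert: <PEM-encoded CA certificate>      # placeholder
  cert: <PEM-encoded server certificate>    # placeholder
  key: <PEM-encoded private key>            # placeholder

If you generate the certificate manually, it needs to be valid for the webhook service DNS name, e.g. aws-load-balancer-webhook-service.kube-system.svc.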

Still happening. aws-lb-controller version is latest 2.4.1 and keepTLSSecret is true.

@kishorj I’ve created a ticket with AWS support. CaseID 9282832861

Ran into this today. Fixed by pruning all related resources and re-installing.

@DmitriyStoyanov, @gazal-k, @darrenwhighamfd if you are able to reproduce the issue, could you please create a support ticket with AWS support and include your cluster ARN?