ingress-nginx: ingress-nginx 4.4.3 adds duplicated location block

What happened:

The generated nginx config contains duplicated "location /" blocks in some cases, so the config does not pass validation and the ingress controller gets stuck in a crash loop. I found this for only 2 domains (ingresses): one of them has basic auth and Let's Encrypt TLS, the other just Let's Encrypt (the most basic setup). Other similar ingresses were not affected.

Reverting to 4.4.2 resolves the issue.
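Until a fixed chart ships, the practical mitigation is to pin the chart version explicitly, e.g. with `helm upgrade --install ... --version 4.4.2`. For Flux users (who were hit by automatic patch upgrades), a sketch of an exact version pin in a HelmRelease; the release name, namespace, and HelmRepository name below are illustrative assumptions, not taken from this issue:

```yaml
# Illustrative Flux v2 HelmRelease pinning the ingress-nginx chart to 4.4.2.
# metadata names and the sourceRef are assumptions for the sketch.
apiVersion: helm.toolkit.fluxcd.io/v2beta1
kind: HelmRelease
metadata:
  name: ingress-nginx
  namespace: ingress-nginx
spec:
  interval: 10m
  chart:
    spec:
      chart: ingress-nginx
      version: "4.4.2"   # exact pin, not a patch-range wildcard like 4.4.x
      sourceRef:
        kind: HelmRepository
        name: ingress-nginx
```

An exact `version` string stops Flux from auto-rolling to the next patch release, which is what broke clusters here.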

        Error: exit status 1
        2023/02/01 19:49:45 [warn] 34#34: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg587039469:144
        nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg587039469:144
        2023/02/01 19:49:45 [warn] 34#34: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg587039469:145
        nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg587039469:145
        2023/02/01 19:49:45 [warn] 34#34: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg587039469:146
        nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg587039469:146
        2023/02/01 19:49:45 [warn] 34#34: could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
        nginx: [warn] could not build optimal proxy_headers_hash, you should increase either proxy_headers_hash_max_size: 512 or proxy_headers_hash_bucket_size: 64; ignoring proxy_headers_hash_bucket_size
        2023/02/01 19:49:45 [emerg] 34#34: duplicate location "/" in /tmp/nginx/nginx-cfg587039469:851
        nginx: [emerg] duplicate location "/" in /tmp/nginx/nginx-cfg587039469:851
        nginx: configuration file /tmp/nginx/nginx-cfg587039469 test failed

What you expected to happen: It should not crash.

NGINX Ingress controller version (exec into the pod and run nginx-ingress-controller --version.):

NGINX Ingress controller
  Release:       v1.6.1
  Build:         1bf5317969fd0c91e11added92aa649ba68fd64d
  Repository:    https://github.com/kubernetes/ingress-nginx
  nginx version: nginx/1.21.

Kubernetes version (use kubectl version):

Server Version: version.Info{Major:"1", Minor:"26", GitVersion:"v1.26.0", GitCommit:"b46a3f887ca979b1a5d14fd39cb1af43e7e5d12d", GitTreeState:"clean", BuildDate:"2022-12-08T19:51:45Z", GoVersion:"go1.19.4", Compiler:"gc", Platform:"linux/amd64"}

Environment:

  • How was the ingress-nginx-controller installed:
    • If helm was used then please show output of helm ls -A | grep -i ingress
• If helm was used then please show output of helm -n <ingresscontrollernamespace> get values <helmreleasename>
    • If helm was not used, then copy/paste the complete precise command used to install the controller, along with the flags and options used
    • if you have more than one instance of the ingress-nginx-controller installed in the same cluster, please provide details for all the instances
ingress-nginx           ingress-nginx   9               2023-02-01 19:45:33.6887662 +0000 UTC   deployed        ingress-nginx-4.4.3             1.6.1       
values:
    controller:
      metrics:
        enabled: false
      hostNetwork: true
      hostPort:
        enabled: true
      watchIngressWithoutClass: true
      kind: DaemonSet
      service:
        external:
          enabled: false
      admissionWebhooks:
        timeoutSeconds: 30
    config:
      bind-address: <...>
    tcp: <...>
    udp: <...>

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 53
  • Comments: 63 (30 by maintainers)

Most upvoted comments

The chart was released by accident and was not ready for consumption; please pin your charts to 4.4.2.

@strongjz could you please delete the faulty chart version 4.4.3 from the Helm repo index? Most Flux users have patch upgrades enabled and this broke lots of clusters.

The chart was released by accident and was not ready for consumption; please pin your charts to 4.4.2. I apologize for this confusion and the issues it caused. The CI was updated to add linting and testing before a release/push to the main branch. And I agree the ingress-controller version was bumped to 1.6.1, but the chart only got a minor bump.

The implementation-specific changes should also have the path validation boolean on them.

Please don’t release this as a minor chart update

Looks like it's related to #9543

It looks like you are correct. Both my broken ingresses had pathType: ImplementationSpecific, and changing it to Prefix resolved the issue.
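The workaround commenters describe (flipping ImplementationSpecific to Prefix) can be done in bulk on exported manifests before re-applying them. A minimal sketch; the file path and the inlined manifest are illustrative stand-ins for a real `kubectl get ingress ... -o yaml` dump, and the final apply step needs cluster access:

```shell
# Normally you would dump the real object first, e.g.:
#   kubectl get ingress example -o yaml > /tmp/ingress.yaml
# Illustrative manifest standing in for that dump:
cat <<'EOF' > /tmp/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              name: http
        path: /
        pathType: ImplementationSpecific
EOF

# Flip the pathType that triggers the duplicate "location /" block.
sed -i 's/pathType: ImplementationSpecific/pathType: Prefix/' /tmp/ingress.yaml

# Verify, then re-apply (the apply requires a cluster):
grep 'pathType:' /tmp/ingress.yaml
#   kubectl apply -f /tmp/ingress.yaml
```

This only edits a local file; it is the manual per-ingress change described above, just scriptable across many dumped manifests.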

Same happened to me. My helm values are here.

Just in case somebody else needs it, here's how to override the path type for the kubernetes-dashboard chart:

ingress:
  enabled: true
  annotations:
    dns.alpha.kubernetes.io/external: "true"
    cert-manager.io/cluster-issuer: self-signed
  className: nginx-private
  hosts:
    - dashboard.example.com
  tls:
    - secretName: kubernetes-dashboard-tls
      hosts:
        - dashboard.example.com
  customPaths:
    - path: /
      pathType: Prefix
      backend:
        service:
          name: kubernetes-dashboard
          port:
            number: 443
tao@moelove:~$ helm repo update ingress-nginx 
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "ingress-nginx" chart repository
Update Complete. ⎈Happy Helming!⎈
tao@moelove:~$ helm search repo ingress-nginx
NAME                            CHART VERSION   APP VERSION     DESCRIPTION                                       
ingress-nginx/ingress-nginx     4.4.2           1.5.1           Ingress controller for Kubernetes using NGINX a...

• Removed the 4.4.3 Helm chart.

1.6.2 is building; once it's completed, we'll move it to the production k8s registry, then we can release a new controller and chart.

I have three ingress classes (all nginx ingresses) on my home lab cluster, only one was impacted by this.

Would anyone be able to test with v1.6.2 in the environment where the problem happens? This can also avoid subsequent issues, thank you!

I simply updated the referenced image in my existing Deployment from 1.5.1 to 1.6.2. Same issue as before. Still ends in

controller 2023/02/04 07:41:43 [emerg] 29#29: duplicate location "/" in /tmp/nginx/nginx-cfg2801467529:878

We’re going to remove path validation. I’ve tested both:

https://github.com/kubernetes/ingress-nginx/pull/9543

https://github.com/kubernetes/ingress-nginx/pull/9511

They both cause a duplicate location "/" error. We will have to investigate why this happens before releasing it.

After talking with @tao12345666333 and @rikatz, 1.6.3 will include CVE fixes and other changes.

Thank you for testing and your patience while we release and secure ingress-nginx.

In my case it was:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt
  name: example
spec:
  rules:
  - host: example.com
    http:
      paths:
      - backend:
          service:
            name: nginx
            port:
              name: http
        path: /
        pathType: ImplementationSpecific
  tls:
  - hosts:
    - example.com
    secretName: example.com-tls

Longhorn at least seems to be using ImplementationSpecific in the UI ingress.

TLDR: only ingresses with ImplementationSpecific type were affected.

I have two controllers and classes: public and private (eks and alb controller).

So only a few of the private ingresses were set as follows:


        path: /
        pathType: ImplementationSpecific

And all public ones had type Prefix. As a result of the chart auto-update (which I guess I really do need to disable), the public controller updated just fine, but the private one failed to update because the new pod went into a crash loop with the error described in this ticket. The fix was to manually change all ingresses to Prefix, restart the pod, and update again.

I haven't tested what would happen if the path was set to something like /*, though. I believe that's the intended usage and a valid case for ImplementationSpecific?
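The audit several commenters ran to find affected ingresses can be sketched offline. This assumes the cluster's ingress manifests have first been dumped to a file (e.g. `kubectl get ing -A -o yaml > /tmp/all-ingresses.yaml`); the sample data inlined here is illustrative, not from this issue:

```shell
# Illustrative dump standing in for `kubectl get ing -A -o yaml`:
cat <<'EOF' > /tmp/all-ingresses.yaml
items:
- spec:
    rules:
    - http:
        paths:
        - path: /
          pathType: Prefix
- spec:
    rules:
    - http:
        paths:
        - path: /
          pathType: ImplementationSpecific
- spec:
    rules:
    - http:
        paths:
        - path: /
          pathType: Prefix
EOF

# Count pathType usage; any ImplementationSpecific entries are the
# candidates for the 4.4.3 / v1.6.1 crash loop.
grep 'pathType:' /tmp/all-ingresses.yaml | awk '{print $2}' | sort | uniq -c
# prints one count per pathType value, here:
#   1 ImplementationSpecific
#   2 Prefix
```

This is the same `grep | sort | uniq -c` pipeline shown later in the thread, with `awk` added so only the pathType values themselves are counted.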

Thank you for confirming and working on remediations. #HugOps

Breaking changes aside, can someone explain why it’s 1.6.1 instead of 1.6.0? What happened to it? Do we just jump like this now? 1.4.0 => 1.5.1 => 1.6.1? 🤔

Also, shouldn't the Helm chart have been bumped from 4.4.2 => 4.5.0 or something, not 4.4.3, at least to respect the new minor version of the controller…

  • 1.5.0 would have had path validation enabled by default.
  • 1.6.0 disabled it by default.
  • 1.6.1 was missing the implementation-specific checks.
  • 1.6.2 fixed that.

We can’t delete images once they are promoted in the kubernetes registry, so our only option is to roll forward.

The 4.4.2-to-4.5.0 jump was to indicate the change and avoid automatic rollouts for GitOps folks.

If we do a major bump of the controller we will also do a major bump of the chart. Same for minors.

I hope this clears it up.

4.4.3 is now gone? Good, the values yaml had #9579

Would anyone be able to test with v1.6.2 in the environment where the problem happens? This can also avoid subsequent issues, thank you!

https://github.com/kubernetes/ingress-nginx/pull/9575

For the convenience of testing, I have packaged the Helm chart in this PR so it can be downloaded directly. Or check out the code used in this PR.

https://drive.google.com/drive/folders/1I9m63h1B6FivCkcqQesJD0ChY5RcQpZ_?usp=share_link

Already checked it; it was the kubernetes-dashboard, as @tbondarchuk said. It's the only one with ImplementationSpecific. My apologies for the confusion.

@strongjz https://gist.github.com/tbondarchuk/cc5ff7111871c0a1e2a273fb36b069b2

@ricosega your ingresses are set to a port name, mine to a port number. Perhaps the issue is triggered both by port.name + Prefix and by ImplementationSpecific?

P.S. Edit: tagging correct user

This is indeed a quick fix; it can prevent many people from being affected.

I’ll start working on it

@tbondarchuk not only ingresses with the ImplementationSpecific type were affected; all of mine are Prefix and it also crashes, as I said before.

@strongjz this is my values.yaml config

    controller:
      kind: DaemonSet
    metrics:
      enabled: true
      serviceMonitor:
        enabled: false
        additionalLabels:
          release: kube-prometheus-stack
    podAnnotations:
      prometheus.io.scrape: 'true'
      prometheus.io.port: '10254'
    service:
      type: NodePort
    resources:
      limits:
        cpu: 200m
        memory: 200Mi
      requests:
        cpu: 200m
        memory: 200Mi
    config:
      client-body-buffer-size: "100M"
      client-header-timeout: "3600"
      client_body_timeout: "3600"
      enable-underscores-in-headers: "true"
      keep-alive: "65"
      proxy-body-size: "100M"
      proxy-buffering: "off"
      proxy-read-timeout: "3600"
      proxy-redirect: "off"
      proxy-redirect-from: "off"
      proxy-send-timeout: "3600"
      server-name-hash-bucket-size: "256"
      server-name-hash-max-size: "512"
      server-snippet: |
        add_header X-Frame-Options DENY;
        add_header X-Content-Type-Options 'nosniff';
        add_header X-XSS-Protection '1; mode=block' ;
        add_header Strict-Transport-Security 'max-age=31536000; includeSubDomains; preload';
        add_header Content-Security-Policy 'frame-ancestors \'self\' https://*.s3.eu-west-1.amazonaws.com;';
      server-tokens: "false"
      use-forwarded-headers: "true"
      use-proxy-protocol: "false"

So the rest is left at the default values, and the admission controller is enabled in my case:

$ k -n kube-system get ValidatingWebhookConfiguration ingress-nginx-admission
NAME                      WEBHOOKS   AGE
ingress-nginx-admission   1          17d

and here is the error log:

I0202 12:55:18.122134       7 event.go:285] Event(v1.ObjectReference{Kind:"Pod", Namespace:"ingress-nginx", Name:"ingress-nginx-controller-r9sb5", UID:"d54a11c3-14ea-474c-843e-c40891f04952", APIVersion:"v1", ResourceVersion:"34418200", FieldPath:""}): type: 'Warning' reason: 'RELOAD' Error reloading NGINX: 
-------------------------------------------------------------------------------
Error: exit status 1
2023/02/02 12:55:16 [warn] 126#126: the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg352643530:150
nginx: [warn] the "http2_max_field_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg352643530:150
2023/02/02 12:55:16 [warn] 126#126: the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg352643530:151
nginx: [warn] the "http2_max_header_size" directive is obsolete, use the "large_client_header_buffers" directive instead in /tmp/nginx/nginx-cfg352643530:151
2023/02/02 12:55:16 [warn] 126#126: the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg352643530:152
nginx: [warn] the "http2_max_requests" directive is obsolete, use the "keepalive_requests" directive instead in /tmp/nginx/nginx-cfg352643530:152
2023/02/02 12:55:16 [emerg] 126#126: duplicate location "/" in /tmp/nginx/nginx-cfg352643530:6080
nginx: [emerg] duplicate location "/" in /tmp/nginx/nginx-cfg352643530:6080
nginx: configuration file /tmp/nginx/nginx-cfg352643530 test failed

-------------------------------------------------------------------------------
I0202 12:55:19.092376       7 controller.go:188] "Configuration changes detected, backend reload required"

The E2E check for this runs in the admission controller, so if folks have that disabled it would cause issues. We may need to add checks in the controller so it doesn't actually crash.

I faced a race condition between the admission controller and the pods that were crashing (it wasn't able to process the validations due to the crashing pods). The error log on the crashed pods didn't tell me what the issue was, so it would be good if it also included the name of the ingress that's causing the crash.

TLDR: only ingresses with ImplementationSpecific type were affected.

I would have sworn up and down that I only had Prefix types. Alas, my affected staging cluster:

[17:27:27][cayla@wopr]% k get ing -A -o yaml | grep pathType: | sort | uniq -c
   1           pathType: ImplementationSpecific
 627           pathType: Prefix

Must be some odd 3rd party tool that snuck it in there 😠

EDIT: it was logstash in my case.

I do have a question for those who are crashing, do you have the admission controller turned on? It should validate the path for those using implementation specific.

In my case, it was on (at least it should be as I didn’t explicitly have it disabled in the helm values). It looks like it has been out there for a while, too:

❯ k -n kube-system get ValidatingWebhookConfiguration ingress-nginx-admission
NAME                      WEBHOOKS   AGE
ingress-nginx-admission   1          2y78d

all my ingresses are like the following one:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - backend:
          service:
            name: api
            port:
              name: http-main
        path: /
        pathType: Prefix

Well, I think this chart version shouldn't have been released yet because of the many changes inside https://github.com/kubernetes/ingress-nginx/commit/d80d4d4eca42ff9ec6ad231f187df454eec85321: the version changed from 1.5.1 to 1.6.1, and more…

Affected by this as well… However, all my pathTypes are set to Prefix. Going back to 4.4.2 for now.