ingress-nginx: Breaking change in helm chart 4.4.1 / controller 1.5.2
As soon as we automatically upgraded to 4.4.1, many of our services were inaccessible. Looked for common denominator, and discovered that Ingress definitions that use multiple paths/prefixes broke in strange and bizarre ways.
They would serve fake cert instead of cert-manager brokered cert, and if we bypassed the fake cert with e.g. curl --insecure, they would nginx 404 on all paths/prefixes.
Here is a sample broken Ingress (identifying information redacted):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: test-api
namespace: test-production
annotations:
kubernetes.io/ingress.class: nginx
cert-manager.io/cluster-issuer: letsencrypt-prod
nginx.ingress.kubernetes.io/rewrite-target: /$1
spec:
tls:
- hosts:
- test.com
secretName: test-cert
rules:
- host: test.com
http:
paths:
- path: /?(.*)
pathType: Prefix
backend:
service:
name: mirror
port:
number: 80
- path: /v1/?(.*)
pathType: Prefix
backend:
service:
name: test-api
port:
number: 80
This is on DOKS v1.24.4.
This caused major outages for us across the board, and was extremely insidious to debug.
There were no errors in the ingress controller pods. No errors in the ingress descriptions (k describe ingress). No issue modifying the ingress definitions (k apply -f). No issues in the compiled nginx configs in the ingress controller pods. Because it was so insidious to debug this issue, we had a long downtime and huge impact.
Reverting to 4.4.0 immediately solved the issues.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (12 by maintainers)
yes, this is actually a bigger problem we have on ingress, the feedback we provide for users is “poor” 😃
we had some plans to add this feedback on events (describe wont show anything as we don’t have status fields on Ingress).
The validation we could also do is on webhooks, but in your case the ingress object was already on apiserver, so it is “after” anything we could do.
It is still our plan to provide some better feedbacks to users on why some Ingress couldn’t be reconciled, add some pre validation before entering the sync loop.
Thanks for raising this bug 😄 We are rolling back and re-planning how to release this, and also we will have some minor release soon allowing regex to be part only of “ImplementationSpecific” types
Don’t know if it’s a good idea but maybe you should also remove the container image for v1.5.2? Automation systems like Renovate will still pick up this new version and suggest the update to the user or worse, automatically roll it out due to being a patch version.