ingress-nginx: Metrics does not count requests on ingresses without host
Is this a request for help? Yes, a bug with a possible solution.
What keywords did you search in NGINX Ingress controller issues before filing this one? missing metrics (#3053 looked promising, but it concerns a different version; according to the git history, this bug was introduced in 0.20.0)
Is this a BUG REPORT or FEATURE REQUEST? (choose one): BUG REPORT
NGINX Ingress controller version: 0.22.0 (bug introduced in 0.20.0)
Kubernetes version (use kubectl version): 1.7.14
Environment:
- Cloud provider or hardware configuration: bare metal
- OS (e.g. from /etc/os-release): coreos 1967.4.0
- Kernel (e.g.
uname -a
): 4.14.96-coreos - Install tools:
- Others:
What happened: Ingresses without a specific host do not report metrics.
What you expected to happen: All ingresses should have metrics.
How to reproduce it (as minimally and precisely as possible): Create an ingress without a host set.
Anything else we need to know:
Hi, we just upgraded from 0.15.0 to 0.22.0 and now have this issue with disappearing metrics.
I looked at the code and found the commit that introduces the problem: https://github.com/kubernetes/ingress-nginx/commit/9766ad8f4be7432354b30e6be6ade730751d1207
If you have an ingress that does not specify a host, the collector will never find a match and will not increase the metric counters.
So all ingresses without a host are now left without metrics.
I'm not 100% familiar with the codebase and not quite sure how to solve it, since at that point we cannot tell whether the ingress is missing a host. Perhaps the hosts field on SocketCollector could be a
map[ingressName]struct{
    Wildcard bool
    Hosts    sets.String
}
In that case we would know not to skip the sample, because the ingress is a wildcard?
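Something along these lines, as a rough Go sketch (ingressHosts and shouldReport are illustrative names, not the actual socket.go identifiers):

```go
// Rough sketch only: ingressHosts and shouldReport are illustrative names,
// not the real socket.go identifiers.
package collectors

import "k8s.io/apimachinery/pkg/util/sets"

// ingressHosts records, per ingress, which hosts it declares and whether it
// accepts any host (no host rule at all, or a wildcard host).
type ingressHosts struct {
	Wildcard bool        // true when the ingress has no host or a wildcard host
	Hosts    sets.String // explicit hosts declared on the ingress rules
}

// SocketCollector keeps a per-ingress view instead of a single flat host set.
type SocketCollector struct {
	hosts map[string]ingressHosts // keyed by ingress name
}

// shouldReport decides whether a request sample should increment the metrics.
// With one flat sets.String of hosts, samples from hostless ingresses never
// match and are silently dropped, which is the bug described above.
func (sc *SocketCollector) shouldReport(ingress, host string) bool {
	info, ok := sc.hosts[ingress]
	if !ok {
		return false
	}
	if info.Wildcard {
		// Hostless/wildcard ingresses accept any host, so always count them.
		return true
	}
	return info.Hosts.Has(host)
}
```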
This is what msg in handleMessage in SocketCollector looks like for a request on an ingress that does not have a host. It looks correct:
[{
"requestLength": 1190,
"ingress": "kafka-http-ingress",
"status": "201",
"service": "kafka-http",
"requestTime": 0.024,
"namespace": "default",
"host": "route-utv.fnox.se",
"method": "POST",
"upstreamResponseTime": 0.024,
"upstreamResponseLength": 4,
"upstreamLatency": 0.014,
"path": "\/internalapi\/kafka",
"responseLength": 213
}]
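For reference, a minimal Go type that this payload would unmarshal into; the field names and JSON tags below are derived from the sample above, not copied from the actual socket.go definition:

```go
// Illustration only: socketMessage is derived from the JSON sample above,
// not copied from the actual socket.go type.
package collectors

import "encoding/json"

// socketMessage mirrors one entry of the JSON array sent over the metrics
// socket; sizes and timings are kept as float64 for simplicity.
type socketMessage struct {
	Host                   string  `json:"host"`
	Status                 string  `json:"status"`
	Method                 string  `json:"method"`
	Path                   string  `json:"path"`
	Namespace              string  `json:"namespace"`
	Ingress                string  `json:"ingress"`
	Service                string  `json:"service"`
	RequestLength          float64 `json:"requestLength"`
	RequestTime            float64 `json:"requestTime"`
	ResponseLength         float64 `json:"responseLength"`
	UpstreamLatency        float64 `json:"upstreamLatency"`
	UpstreamResponseTime   float64 `json:"upstreamResponseTime"`
	UpstreamResponseLength float64 `json:"upstreamResponseLength"`
}

// parseMessages decodes the batch of entries received on the socket.
func parseMessages(raw []byte) ([]socketMessage, error) {
	var msgs []socketMessage
	err := json.Unmarshal(raw, &msgs)
	return msgs, err
}
```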
About this issue
- State: closed
- Created 5 years ago
- Reactions: 22
- Comments: 29 (6 by maintainers)
Hi @jonaz, any news on this issue? It's very important for our use case…
@jonaz @aledbf any progress on this? We're essentially flying blind with our ingress right now because of this bug…
Are there any concrete plans or a timeline for this? We also have this issue and are basically blind in terms of nginx metrics, as we give every customer a custom subdomain (so we use a wildcard domain and hence have no metrics). IMHO a good solution would be to just record the wildcard host. For example:
We have an ingress for host *.domain.com. If a customer accesses our system via customer1.domain.com, then nginx could just record this request for *.domain.com. The same goes for every other xxx.domain.com. All requests handled by the one wildcard ingress go into one bucket, so there would be no problem with unbounded cardinality in the metrics and hence no DDoS problem, but we could still distinguish requests for the different ingresses.
I would try to help improve this, but I have no experience with Go.
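For illustration, a rough Go sketch of this idea (metricsHost is a made-up helper, not existing controller code): concrete subdomains that only match a wildcard ingress host are collapsed onto the wildcard itself, so label cardinality stays bounded by the hosts actually configured.

```go
// Rough sketch of the proposal; metricsHost is a made-up helper, not existing
// controller code.
package main

import (
	"fmt"
	"strings"
)

// metricsHost maps the Host of an incoming request onto the label value that
// would be recorded. A host that only matches a wildcard ingress host is
// collapsed onto the wildcard itself, so cardinality stays bounded by the
// hosts actually configured on ingresses.
func metricsHost(requestHost string, configuredHosts []string) (string, bool) {
	for _, h := range configuredHosts {
		if h == requestHost {
			return h, true // exact match: record the concrete host
		}
		if strings.HasPrefix(h, "*.") &&
			strings.HasSuffix(requestHost, strings.TrimPrefix(h, "*")) {
			return h, true // wildcard match: record e.g. "*.domain.com"
		}
	}
	return "", false // no matching host: the sample would not be recorded
}

func main() {
	label, _ := metricsHost("customer1.domain.com", []string{"*.domain.com"})
	fmt.Println(label) // prints *.domain.com
}
```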
Just wanted to thank @jonaz for opening this issue, as this was the reason Grafana wasn't showing anything for me.
It might be worth mentioning this in the monitoring section of the docs. I'd imagine others might assume a fan-out without a host would generate metrics.
I also cannot see metrics for hosts with a wildcard; is there a workaround?
I am also experiencing the same issue: I do not get nginx_ingress_controller_requests metrics if I don't define the host, e.g.
This doesn't generate metrics for nginx_ingress_controller_requests.
This generates metrics for nginx_ingress_controller_requests.
It would be great if we could get a fix for this.
My solution above would protect against DDoS and also support metrics on all paths and hosts, but no word from the maintainers here yet.
Would it make sense to change the code here to export all ingress metrics when per-host metrics are disabled, though? We're losing a lot of metrics that are essential to monitoring this in production because the metrics are just not passed on.
Here's the flag I'm talking about: https://github.com/kubernetes/ingress-nginx/pull/3594
Here's the code in question I'm suggesting we change: https://github.com/kubernetes/ingress-nginx/blob/ddedd165b2a457607e70e37d3d7ce613d1aa5307/internal/ingress/metric/collectors/socket.go#L224-L227
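To make that concrete, here is a hedged sketch of the change being suggested; statsEntry, handleStats and metricsPerHost are stand-ins, assuming the flag from #3594 is plumbed into the collector as a boolean:

```go
// Hedged sketch of the suggested change; statsEntry, handleStats and
// metricsPerHost are stand-ins, assuming the flag from #3594 is plumbed into
// the collector as a boolean.
package collectors

import "k8s.io/apimachinery/pkg/util/sets"

// statsEntry stands in for the parsed socket message; only the field used by
// the filter is shown.
type statsEntry struct {
	Host string
}

// handleStats shows the idea behind changing the filter at socket.go#L224-L227:
// only drop samples whose host is not being served when per-host metrics are
// enabled. With the host label disabled there is no cardinality concern, so
// every sample can be counted.
func handleStats(statsBatch []statsEntry, hosts sets.String, metricsPerHost bool) {
	for _, stats := range statsBatch {
		if metricsPerHost && !hosts.Has(stats.Host) {
			continue // unknown host and per-host metrics enabled: skip as today
		}
		// ...update the Prometheus counters and histograms as before...
	}
}
```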
I even lose my metrics when I make my host a wildcard. So the host exists; it just changed from e.g. "prod.something.com" to "*.something.com".
Very unlikely to ever happen. I'd recommend you name all of your ingresses (this is the approach I took to get around this issue). It isn't that hard, especially when using some form of automation to deploy Kubernetes apps (Helm, or in my case Terraform). Sticky sessions also start working again when doing this.
Long term, traefik might be the way to go, depending on whether or not ingress-nginx picks up a new maintainer with aledbf stepping down.
Disclaimer: this isn't a dig; ingress-nginx is a great piece of work and I highly appreciate the work aledbf has done on it. Just trying to point out a workaround and the fact that things are bleak right now, but hopefully someone or some others will step up.
@aledbf I also just ran into this issue (today, as a matter of fact). IMO the linked PR (https://github.com/kubernetes/ingress-nginx/pull/4139) is a reasonable way to go. In the case that someone has a wildcard ingress, IMO it makes sense to just exclude the host label, since then I can still get metrics based on the ingress name (which also has fixed cardinality). Is there anything remaining to get the PR merged in?
@hairyhenderson since there has been no reply from @aledbf regarding my implementation suggestion, we are migrating to traefik 2.0 instead.