ingress-gce: ingress-gce-404-server-with-metrics causes OOM
We encountered a scenario where 404-server-with-metrics can cause an OOM. This is probably caused by logs being partially retained in memory: when something probes the cluster heavily (e.g. a botnet looking for vulnerabilities), this causes a surge in the number of log messages being written. Example:
...
I0505 11:27:49.607462 1 server-with-metrics.go:243] response 404 (backend NotFound), service rules for [ /header.html ] non-existent
I0505 11:27:49.707176 1 server-with-metrics.go:243] response 404 (backend NotFound), service rules for [ /q79w_38jg__.shtml ] non-existent
I0505 11:27:49.707220 1 server-with-metrics.go:243] response 404 (backend NotFound), service rules for [ /gk/public_html/ ] non-existent
...
This in turn may cause the container to hit its memory limit.
/cc @mborsz
In fact it's not logs being kept in memory. Those look fine.
I have done the following experiment:
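(As a rough, hypothetical illustration only: a heap profile collected with Go's built-in pprof tooling is a common way to surface an allocation site like this. The debug listener and port below are placeholders, not something the 404 server necessarily exposes.)

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers on the default mux
)

func main() {
	// With this endpoint exposed and the server under load, a heap profile
	// can be pulled and inspected with:
	//   go tool pprof http://localhost:6060/debug/pprof/heap
	// The `top` and `list` commands then point at the source lines holding
	// the live memory.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```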
It looks like the vast majority of memory is being allocated in lines https://github.com/kubernetes/ingress-gce/blob/b1a745203c5465c6a59056acc2233da37b36402e/cmd/404-server-with-metrics/server-with-metrics.go#L99-L107
It looks like on each server.idleChannel update (which happens on every request) we allocate a new time.Timer, which then lives for the next *idleLogTimer (1h by default).
This matches the documentation of time.After (src: https://golang.org/pkg/time/#After):
> The underlying Timer is not recovered by the garbage collector until the timer fires. If efficiency is a concern, use NewTimer instead and call Timer.Stop if the timer is no longer needed.
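For illustration, a minimal sketch of the two patterns, assuming an idle-logging loop roughly like the linked code (idleLogTimeout, idleCh, and the log calls are placeholders, not the actual identifiers): the first function reproduces the per-request time.After allocation, the second reuses a single Timer via time.NewTimer and Reset.

```go
package main

import (
	"log"
	"time"
)

const idleLogTimeout = time.Hour // stands in for *idleLogTimer

// leakyIdleLogger mirrors the problematic pattern: every receive on idleCh
// (i.e. every incoming request) re-enters the loop and calls time.After,
// which allocates a fresh Timer that the runtime keeps alive until it fires
// an hour later. Under a burst of requests these Timers pile up.
func leakyIdleLogger(idleCh <-chan struct{}) {
	for {
		select {
		case <-time.After(idleLogTimeout): // new Timer allocated on every iteration
			log.Printf("no requests seen for %v", idleLogTimeout)
		case <-idleCh:
			// A request arrived; loop around and allocate yet another Timer.
		}
	}
}

// fixedIdleLogger reuses a single Timer and resets it on every request,
// so memory stays flat regardless of the request rate.
func fixedIdleLogger(idleCh <-chan struct{}) {
	timer := time.NewTimer(idleLogTimeout)
	defer timer.Stop()
	for {
		select {
		case <-timer.C:
			log.Printf("no requests seen for %v", idleLogTimeout)
			timer.Reset(idleLogTimeout)
		case <-idleCh:
			// Stop the timer before resetting; if it already fired, drain
			// the channel so Reset starts from a clean state.
			if !timer.Stop() {
				select {
				case <-timer.C:
				default:
				}
			}
			timer.Reset(idleLogTimeout)
		}
	}
}

func main() {
	idleCh := make(chan struct{})
	go fixedIdleLogger(idleCh)
	// Simulate a burst of requests hitting the 404 server.
	for i := 0; i < 1000; i++ {
		idleCh <- struct{}{}
	}
	time.Sleep(100 * time.Millisecond)
}
```

With the second variant, heap usage stays bounded because there is only ever one live Timer per logger goroutine, no matter how many requests arrive.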