prometheus: prometheus:v2.0.0-alpha.2 - OOMKilled when limit is set (when only a request is set, memory usage keeps growing)
What did you do?
I am running Prometheus 2.x on Kubernetes.
What did you expect to see?
Prometheus should operate within the allocated memory (like storage.local.target-heap-size in 1.6+).
I had a discussion on https://github.com/coreos/prometheus-operator/issues/480, where it was mentioned that Prometheus 2.x uses mmap and its memory will be evicted by the kernel automatically.
@vnandha I talked to @lucab and he mentioned that mmap'd memory is one of the first things that gets evicted by the kernel when hitting the requested amount of memory (requested as in the fields in Kubernetes).
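For reference, these fields live under .spec.resources on the operator's Prometheus custom resource. A minimal sketch of the requests-only form used in the first case below (the name, namespace, and apiVersion are illustrative and may differ for the operator release in use; the request values match the describe output further down):

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: k8s              # illustrative name
  namespace: monitoring  # illustrative namespace
spec:
  resources:
    requests:
      cpu: "8"
      memory: 16Gi
    # limits intentionally omitted -- this is the "requests only" case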
What did you see instead? Under which circumstances?
I observed two issues:
- When I set only .spec.resources.requests.memory, Prometheus keeps using all available memory on the system; the output below shows memory usage of 57G RSS and 81G virtual:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
18356 root 20 0 81.436g 0.057t 37268 S 115.1 46.3 26780:41 prometheus
But the requested memory is just 16Gi:
Containers:
prometheus:
Container ID: docker://e00d9d307bf2fd396e73c914f28ef71c79b340623eff23336c512fa531126d79
Image: quay.io/prometheus/prometheus:v2.0.0-alpha.2
Image ID: docker-pullable://quay.io/prometheus/prometheus@sha256:bfaea6c2e210d739978ec001ccaa992ed476c4a50c65391d229c0a957bde574c
Port: 9090/TCP
Args:
-config.file=/etc/prometheus/config/prometheus.yaml
-storage.local.path=/var/prometheus/data
-storage.tsdb.no-lockfile
-storage.tsdb.retention=72h
-web.route-prefix=/
State: Running
Started: Sun, 16 Jul 2017 05:43:10 +0000
Ready: True
Restart Count: 0
Requests:
cpu: 8
memory: 16Gi
Liveness: http-get http://:web/status delay=300s timeout=3s period=5s #success=1 #failure=10
- When .spec.resources.limits is set, it gets OOMKilled (the corresponding resources block is sketched after the output below):
Containers:
prometheus:
Container ID: docker://2323744304b72d7b657f737937f400408cf41ba1658d101ee643d0ea44057648
Image: quay.io/prometheus/prometheus:v2.0.0-alpha.2
Image ID: docker-pullable://quay.io/prometheus/prometheus@sha256:bfaea6c2e210d739978ec001ccaa992ed476c4a50c65391d229c0a957bde574c
Port: 9090/TCP
Args:
-config.file=/etc/prometheus/config/prometheus.yaml
-storage.local.path=/var/prometheus/data
-storage.tsdb.no-lockfile
-storage.tsdb.retention=72h
-web.route-prefix=/
State: Running
Started: Sun, 30 Jul 2017 02:08:01 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Mon, 01 Jan 0001 00:00:00 +0000
Finished: Sun, 30 Jul 2017 02:07:40 +0000
Ready: False
Restart Count: 19
Limits:
cpu: 16
memory: 32Gi
Requests:
cpu: 16
memory: 32Gi
Liveness: http-get http://:web/status delay=300s timeout=3s period=5s #success=1 #failure=10
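For comparison, the resources block that produces the output above would look roughly like this in the Prometheus custom resource spec (a sketch; the request and limit values are taken from the describe output, everything else follows the same illustrative layout as the earlier example):

spec:
  resources:
    requests:
      cpu: "16"
      memory: 32Gi
    limits:
      cpu: "16"
      memory: 32Gi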
Environment
- Kubernetes version: 1.6.1
- System information: Linux 3.10.0-514.16.1.el7.x86_64 x86_64
- Prometheus version: quay.io/prometheus/prometheus:v2.0.0-alpha.2
What is the recommended way to manage memory in Prometheus 2.x?
About this issue
- State: closed
- Comments: 23 (15 by maintainers)
@vnandha any updates on this? I'm also seeing Prometheus OOMKilled constantly after running for several hours. Have you found any solution?
This is the memory limit, sorry. The request is just 6Gi. I did not change the request from before, so it hit 17Gi of usage with the 6Gi request. Thanks, Goutham.
According to this doc, unless I’m reading it wrong, mmap’d pages do count as long as they are mapped in, so you do need to account for the active chunks when setting memory limits.
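In other words, the container limit has to be sized for the peak working set (the Go heap plus whatever chunk files are still mapped in), not just for the request. A rough sketch with purely illustrative numbers, based on the usage reported earlier in this issue:

spec:
  resources:
    requests:
      memory: 16Gi   # what the scheduler reserves for the pod
    limits:
      memory: 64Gi   # illustrative only: the limit must stay above the peak RSS
                     # (heap + mapped chunks); the no-limit run above peaked around
                     # 57G RSS, well beyond the 32Gi limit that got OOMKilled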