prometheus v2.0.0-alpha.2: OOMKilled when limit is set (when only request is set, memory usage keeps growing)

What did you do? I am running Prometheus 2.x on Kubernetes.

What did you expect to see? Prometheus should operate within the allocated memory (like storage.local.target-heap-size in 1.6+). I had a discussion on https://github.com/coreos/prometheus-operator/issues/480, where it was mentioned that Prometheus 2.x uses mmap and its memory will be evicted by the kernel automatically.

@vnandha I talked to @lucab and he mentioned that mmapped memory is one of the first things that gets evicted by the kernel when hitting the requested amount of memory (requested as in the Kubernetes resource fields).

What did you see instead? Under which circumstances?

I observed two issues:

  1. When I set only .spec.resources.requests.memory, Prometheus keeps using whatever memory is available on the node. The top output below shows 57G RSS and 81G virtual (a sketch of the corresponding resource stanza follows the describe output):
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                                                                                      
18356 root      20   0 81.436g 0.057t  37268 S 115.1 46.3  26780:41 prometheus

But the requested memory is just 16Gi:

Containers:
  prometheus:
    Container ID:       docker://e00d9d307bf2fd396e73c914f28ef71c79b340623eff23336c512fa531126d79
    Image:              quay.io/prometheus/prometheus:v2.0.0-alpha.2
    Image ID:           docker-pullable://quay.io/prometheus/prometheus@sha256:bfaea6c2e210d739978ec001ccaa992ed476c4a50c65391d229c0a957bde574c
    Port:               9090/TCP
    Args:
      -config.file=/etc/prometheus/config/prometheus.yaml
      -storage.local.path=/var/prometheus/data
      -storage.tsdb.no-lockfile
      -storage.tsdb.retention=72h
      -web.route-prefix=/
    State:              Running
      Started:          Sun, 16 Jul 2017 05:43:10 +0000
    Ready:              True
    Restart Count:      0
    Requests:
      cpu:              8
      memory:           16Gi
    Liveness:           http-get http://:web/status delay=300s timeout=3s period=5s #success=1 #failure=10
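For clarity, the resource stanza behind the describe output above would look roughly like this in the pod spec (a minimal sketch reconstructed from the values above, not the exact manifest):

  # Scenario 1: requests only, no limits. The kubelet schedules against the
  # 16Gi request, but nothing caps the container, so RSS can grow to whatever
  # the node has free (~57G observed above).
  resources:
    requests:
      cpu: "8"
      memory: 16Gi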

  2. When .spec.resources.limits is set, the container gets OOMKilled (again, a sketch of the resource stanza follows the describe output):
Containers:
  prometheus:
    Container ID:       docker://2323744304b72d7b657f737937f400408cf41ba1658d101ee643d0ea44057648
    Image:              quay.io/prometheus/prometheus:v2.0.0-alpha.2
    Image ID:           docker-pullable://quay.io/prometheus/prometheus@sha256:bfaea6c2e210d739978ec001ccaa992ed476c4a50c65391d229c0a957bde574c
    Port:               9090/TCP
    Args:  
      -config.file=/etc/prometheus/config/prometheus.yaml
      -storage.local.path=/var/prometheus/data
      -storage.tsdb.no-lockfile
      -storage.tsdb.retention=72h
      -web.route-prefix=/
    State:              Running
      Started:          Sun, 30 Jul 2017 02:08:01 +0000
    Last State:         Terminated
      Reason:           OOMKilled
      Exit Code:        137
      Started:          Mon, 01 Jan 0001 00:00:00 +0000
      Finished:         Sun, 30 Jul 2017 02:07:40 +0000
    Ready:              False
    Restart Count:      19
    Limits:
      cpu:      16
      memory:   32Gi
    Requests:
      cpu:              16
      memory:           32Gi
    Liveness:           http-get http://:web/status delay=300s timeout=3s period=5s #success=1 #failure=10
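And the corresponding stanza for the OOMKilled case, again a sketch reconstructed from the describe output:

  # Scenario 2: requests == limits (Guaranteed QoS). Once cgroup memory usage,
  # which includes mapped-in TSDB pages, crosses 32Gi, the kernel OOM-kills
  # the container (exit code 137, as seen above).
  resources:
    requests:
      cpu: "16"
      memory: 32Gi
    limits:
      cpu: "16"
      memory: 32Gi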

Environment

  • Kubernetes version: 1.6.1

  • System information:

Linux 3.10.0-514.16.1.el7.x86_64 x86_64

  • Prometheus version: Image: quay.io/prometheus/prometheus:v2.0.0-alpha.2

What is the recommended way to manage memory in Prometheus 2.x?

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 23 (15 by maintainers)

Most upvoted comments

@vnandha any updates on this? I’m also seeing Prometheus get OOM killed constantly after running for several hours. Have you found any solution?

This is the memory limit, sorry. The request is just 6Gi. I did not change the request from before, so it hit 17Gi of usage with the 6Gi request.

Thanks, Goutham.

On Tue, Aug 1, 2017 at 6:47 PM Frederic Branczyk notifications@github.com wrote:

Just for my understanding, did you set memory request or memory limit to achieve this behavior?


According to this doc, unless I’m reading it wrong, mmap’d pages do count toward the cgroup memory limit as long as they are mapped in, so you do need to account for the active chunks when setting memory limits.
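If that reading is right, the practical consequence is to leave headroom above the observed working set rather than sizing the limit at the expected heap. A hedged sketch, where the values and the headroom factor are illustrative assumptions, not documented guidance:

  # Illustrative sizing only: leave headroom above the steady-state working
  # set so mapped-in TSDB chunk pages don't push the cgroup over the limit.
  resources:
    requests:
      memory: 16Gi   # observed steady-state working set (assumed)
    limits:
      memory: 24Gi   # ~1.5x headroom for mapped-in chunks (assumed factor)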