prometheus: process OOM-killed (prometheus), is there a memory leak?
Hi, I have a single Prometheus server that scrapes about 50+ targets. It gets OOM-killed after running for several hours. I'm confused by this; details below:
- dmesg
[3907506.014018] [50093] 0 50093 8556396 8490383 16703 0 0 prometheus
[3907506.014031] Out of memory: Kill process 50093 (prometheus) score 947 or sacrifice child
[3907506.014035] Killed process 50093 (prometheus) total-vm:34225584kB, anon-rss:33961532kB, file-rss:0kB
[3920674.254981] [63061] 0 63061 8506886 8492690 16614 0 0 prometheus
[3920674.260250] Out of memory: Kill process 63061 (prometheus) score 947 or sacrifice child
[3920674.262081] Killed process 63061 (prometheus) total-vm:34027544kB, anon-rss:33970760kB, file-rss:0kB
[3958788.455016] [105674] 0 105674 8547989 8492511 16685 0 0 prometheus
[3958788.460257] Out of memory: Kill process 105674 (prometheus) score 947 or sacrifice child
[3958788.462060] Killed process 105674 (prometheus) total-vm:34191956kB, anon-rss:33970044kB, file-rss:0kB
[3970678.851899] [117374] 0 117374 8505681 8494867 16616 0 0 prometheus
[3970678.855538] Out of memory: Kill process 117374 (prometheus) score 947 or sacrifice child
[3970678.857368] Killed process 117374 (prometheus) total-vm:34022724kB, anon-rss:33979468kB, file-rss:0kB
- system info
[15:23 root@prometheus-poc:/var/mwc/jobs] # cat /etc/redhat-release
CentOS Linux release 7.1.1503 (Core)
[15:23 root@prometheus-poc:/var/mwc/jobs] # uname -a
Linux prometheus-poc 3.10.0-229.el7.x86_64 #1 SMP Fri Mar 6 11:36:42 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
[15:23 root@prometheus-poc:/var/mwc/jobs] # free -bt
total used free shared buff/cache available
Mem: 35682078720 24212639744 372396032 94019584 11097042944 11069128704
Swap: 0 0 0
Total: 35682078720 24212639744 372396032
- prometheus version
prometheus, version 0.17.0 (branch: release-0.17, revision: e11fab3)
build user: fabianreinartz@macpro
build date: 20160302-17:48:43
go version: 1.5.3
- prometheus startup flags
prometheus -config.file=/var/mwc/jobs/prometheus/conf/prometheus.yml -storage.local.path=/mnt/prom_data -storage.local.memory-chunks=1048576 -log.level=debug -storage.remote.opentsdb-url=http://10.63.121.35:4242 -alertmanager.url=http://10.63.121.65:9093
- prometheus scrape config
scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    scrape_timeout: 10s
    target_groups:
      - targets: ['localhost:9090']
  - job_name: 'node'
    scrape_interval: 5s
    scrape_timeout: 10s
    target_groups:
      - targets: ['localhost:9100']
  - job_name: 'overwritten-default'
    scrape_interval: 5s
    scrape_timeout: 10s
    consul_sd_configs:
      - server: <consul_server>
        datacenter: 'consul_dc'
    relabel_configs:
      - source_labels: ['__meta_consul_service_id']
        regex: '(.*)'
        target_label: 'job'
        replacement: '$1'
        action: 'replace'
      - source_labels: ['__meta_consul_service_address','__meta_consul_service_port']
        separator: ';'
        regex: '(.*);(.*)'
        target_label: '__address__'
        replacement: '$1:$2'
        action: 'replace'
      - source_labels: ['__meta_consul_service_id']
        regex: '^prometheus_.*'
        action: 'keep'
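For reference, the second relabel rule above joins the two Consul meta labels with the `;` separator and rewrites `__address__` from the regex capture groups. A minimal sketch of that behavior (the label values here are hypothetical examples, not taken from the issue):

```python
import re

# Hypothetical Consul meta labels as discovery would set them.
labels = {
    "__meta_consul_service_address": "10.63.121.70",  # example value
    "__meta_consul_service_port": "9100",             # example value
}

# Relabeling concatenates the source_labels with the configured separator...
source = ";".join(labels[k] for k in
                  ("__meta_consul_service_address", "__meta_consul_service_port"))

# ...matches the regex against the result, and expands the replacement
# ('$1:$2' in the config, backreferences \1:\2 in Python's re syntax).
m = re.fullmatch(r"(.*);(.*)", source)
print(m.expand(r"\1:\2"))  # 10.63.121.70:9100
```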
- prometheus process status
Name: prometheus
State: S (sleeping)
Tgid: 130923
Ngid: 0
Pid: 130923
PPid: 1
TracerPid: 0
Uid: 0 0 0 0
Gid: 0 0 0 0
FDSize: 512
Groups:
VmPeak: 19548872 kB
VmSize: 19548872 kB
VmLck: 0 kB
VmPin: 0 kB
VmHWM: 19486532 kB
VmRSS: 19486532 kB
VmData: 19532964 kB
VmStk: 136 kB
VmExe: 6776 kB
VmLib: 0 kB
VmPTE: 38184 kB
VmSwap: 0 kB
Threads: 19
SigQ: 2/136048
SigPnd: 0000000000000000
ShdPnd: 0000000000000000
SigBlk: 0000000000000000
SigIgn: 0000000000000000
SigCgt: fffffffe7fc1feff
CapInh: 0000000000000000
CapPrm: 0000001fffffffff
CapEff: 0000001fffffffff
CapBnd: 0000001fffffffff
Seccomp: 0
Cpus_allowed: ff
Cpus_allowed_list: 0-7
Mems_allowed: 00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000000,00000001
Mems_allowed_list: 0
voluntary_ctxt_switches: 1165275
nonvoluntary_ctxt_switches: 234755
- graph of process_resident_memory_bytes
- graph of prometheus_local_storage_memory_chunks
Thanks.
About this issue
- State: closed
- Created 8 years ago
- Comments: 21 (11 by maintainers)
That Prometheus should only be using ~3GB of RAM, but it looks like it’ll top out at ~70GB.
Do you happen to have over 20M timeseries? If so, you need a bigger box and to increase -storage.local.memory-chunks.
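The ~3 GB figure can be sanity-checked from the startup flags. A rough sketch of the arithmetic, assuming each chunk occupies 1024 bytes and the commonly cited rule of thumb of roughly 3x headroom over the chunk memory alone:

```python
# Value of -storage.local.memory-chunks from the startup flags above.
memory_chunks = 1048576
chunk_bytes = 1024        # assumption: one chunk is 1024 bytes
headroom = 3              # assumption: ~3x rule of thumb for total RAM

recommended_bytes = memory_chunks * chunk_bytes * headroom
print(recommended_bytes / 2**30)  # 3.0 (GiB), matching the ~3 GB estimate
```

An actual RSS of ~34 GB, as in the dmesg output above, is therefore far beyond what this chunk limit alone should account for.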