VictoriaMetrics: vmselect is killed by oom-killer when querying the export API

Describe the bug

vmselect is killed by the oom-killer when querying the export API.

To Reproduce

Set up a cluster. Start the vmselect container with an 8GB memory limit. Perform a data export:

curl -H 'Accept-Encoding: gzip' http://<vmselect>:8481/select/1/prometheus/api/v1/export -d 'match[]={host="srv1"}' --output data.jsonl.gz
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 1092M    0 1092M    0    24  10.3M      0 --:--:--  0:01:45 --:--:-- 15.0M
curl: (18) transfer closed with outstanding read data remaining
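
For reference, here is a minimal Go sketch of the same export request (the endpoint and match[] filter are taken from the curl command above; vmselectAddr and the output file name are placeholders, not anything from the original report):

// export.go: a sketch of the export request above using Go's net/http.
package main

import (
    "io"
    "net/http"
    "net/url"
    "os"
    "strings"
)

const vmselectAddr = "vmselect:8481" // replace with the real vmselect host:port

func main() {
    form := url.Values{"match[]": []string{`{host="srv1"}`}}
    req, err := http.NewRequest("POST",
        "http://"+vmselectAddr+"/select/1/prometheus/api/v1/export",
        strings.NewReader(form.Encode()))
    if err != nil {
        panic(err)
    }
    req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
    // Ask for a gzipped response, same as `curl -H 'Accept-Encoding: gzip'`.
    // Setting the header explicitly disables Go's transparent decompression,
    // so the body can be written to disk as-is.
    req.Header.Set("Accept-Encoding", "gzip")

    resp, err := http.DefaultClient.Do(req)
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    out, err := os.Create("data.jsonl.gz")
    if err != nil {
        panic(err)
    }
    defer out.Close()

    // Stream the response straight to disk; this is the point where the
    // transfer is cut short once vmselect gets OOM-killed.
    if _, err := io.Copy(out, resp.Body); err != nil {
        panic(err)
    }
}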

Kernel messages:

conmon: conmon c023a2e02cd2f4109ddb <ninfo>: OOM received
kernel: [28891450.687558] vmselect-prod invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), nodemask=(null), order=0, oom_score_adj=0
kernel: [28891450.687562] vmselect-prod cpuset=libpod-c023a2e02cd2f4109ddbfe88a708f29ffa1f571af3e095a979010a6752953b71 mems_allowed=0
kernel: [28891450.687570] CPU: 40 PID: 2375 Comm: vmselect-prod Tainted: G           O      4.19.65-1.el7.x86_64 #1
kernel: [28891450.687571] Hardware name:
kernel: [28891450.687572] Call Trace:
kernel: [28891450.687583]  dump_stack+0x63/0x88
kernel: [28891450.687589]  dump_header+0x78/0x2a4
kernel: [28891450.687596]  ? mem_cgroup_scan_tasks+0x9c/0xf0
kernel: [28891450.687600]  oom_kill_process+0x26b/0x290
kernel: [28891450.687603]  out_of_memory+0x140/0x4a0
kernel: [28891450.687607]  mem_cgroup_out_of_memory+0xb9/0xd0
kernel: [28891450.687610]  try_charge+0x6d6/0x750
kernel: [28891450.687614]  ? __alloc_pages_nodemask+0x119/0x2a0
kernel: [28891450.687617]  mem_cgroup_try_charge+0xbe/0x1d0
kernel: [28891450.687619]  mem_cgroup_try_charge_delay+0x22/0x50
kernel: [28891450.687624]  do_anonymous_page+0x11a/0x650
kernel: [28891450.687627]  __handle_mm_fault+0xc24/0xe80
kernel: [28891450.687631]  handle_mm_fault+0x102/0x240
kernel: [28891450.687636]  __do_page_fault+0x212/0x4e0
kernel: [28891450.687640]  do_page_fault+0x37/0x140
kernel: [28891450.687645]  ? page_fault+0x8/0x30
kernel: [28891450.687648]  page_fault+0x1e/0x30
kernel: [28891450.687651] RIP: 0033:0x469e28
kernel: [28891450.687655] Code: 4c 01 de 48 29 c3 c5 fe 6f 06 c5 fe 6f 4e 20 c5 fe 6f 56 40 c5 fe 6f 5e 60 48 01 c6 c5 fd 7f 07 c5 fd 7f 4f 20 c5 fd 7f 57 40 <c5> fd 7f 5f 60 48 01 c7 48 29 c3 77 cf 48 01 c3 48 01 fb c4 c1 7e
kernel: [28891450.687656] RSP: 002b:000000c000c58c98 EFLAGS: 00010202
kernel: [28891450.687659] RAX: 0000000000000080 RBX: 0000000000093650 RCX: 000000c1e6d7c670
kernel: [28891450.687660] RDX: 000000000002d990 RSI: 000000c1e6ce9020 RDI: 000000c1eb062fa0
kernel: [28891450.687662] RBP: 000000c000c58cf8 R08: 0000000000000001 R09: 0000000000128000
kernel: [28891450.687663] R10: 000000c1eb00c000 R11: 0000000000000020 R12: 0000000000000002
kernel: [28891450.687665] R13: 0000000000df5660 R14: 0000000000000000 R15: 0000000000468840
kernel: [28891450.687667] Task in /cl/vmselect/pids-batch/libpod-c023a2e02cd2f4109ddbfe88a708f29ffa1f571af3e095a979010a6752953b71 killed as a result of limit of /cl/vmselect
kernel: [28891450.687678] memory: usage 8388608kB, limit 8388608kB, failcnt 712433
kernel: [28891450.687680] memory+swap: usage 8388612kB, limit 9007199254740988kB, failcnt 0
kernel: [28891450.687681] kmem: usage 63936kB, limit 8388608kB, failcnt 0
kernel: [28891450.687682] Memory cgroup stats for /cl/vmselect: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
kernel: [28891450.687699] Memory cgroup stats for /cl/vmselect/pids-batch: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
kernel: [28891450.687714] Memory cgroup stats for /cl/vmselect/pids-batch/libpod-c023a2e02cd2f4109ddbfe88a708f29ffa1f571af3e095a979010a6752953b71: cache:2460KB rss:8321004KB rss_huge:0KB shmem:0KB mapped_file:3300KB dirty:0KB writeback:0KB swap:0KB inactive_anon:20KB active_anon:8324280KB inactive_file:4KB active_file:0KB unevictable:0KB
kernel: [28891450.687731] Memory cgroup stats for /cl/vmselect/pids-idle: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB swap:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:0KB
kernel: [28891450.687747] Tasks state (memory values in pages):
kernel: [28891450.687748] [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
kernel: [28891450.687910] [   8923]     0  8923    10744     1210   131072        0             0 xxxxxxx
kernel: [28891450.687914] [   8952]     0  8952     7725      723   102400        0             0 xxxxxxx
kernel: [28891450.687917] [   9018]     0  9018    63581      787   151552        0             0 xxxxxxx
kernel: [28891450.687962] [ 167136]   999 167136  4331147  2079099 18558976        0             0 vmselect-prod
kernel: [28891450.687974] [ 176997]     0 176997     3987     1063    81920        0             0 xxxxxxx
kernel: [28891450.688025] Memory cgroup out of memory: Kill process 167136 (vmselect-prod) score 993 or sacrifice child
kernel: [28891450.701657] Killed process 167136 (vmselect-prod) total-vm:17324588kB, anon-rss:8311840kB, file-rss:6052kB, shmem-rss:0kB
kernel: [28891451.138861] oom_reaper: reaped process 167136 (vmselect-prod), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

Version

vmselect-20200727-210903-tags-v1.39.1-cluster-0-g96bc476e5

Used command-line flags

flag{name="cacheDataPath", value="/var/lib/victoriametrics/cache"} 1
flag{name="dedup.minScrapeInterval", value="1ms"} 1
flag{name="enableTCP6", value="false"} 1
flag{name="envflag.enable", value="true"} 1
flag{name="envflag.prefix", value=""} 1
flag{name="fs.disableMmap", value="false"} 1
flag{name="http.disableResponseCompression", value="false"} 1
flag{name="http.maxGracefulShutdownDuration", value="7s"} 1
flag{name="http.pathPrefix", value=""} 1
flag{name="http.shutdownDelay", value="0s"} 1
flag{name="httpListenAddr", value=":8481"} 1
flag{name="loggerErrorsPerSecondLimit", value="10"} 1
flag{name="loggerFormat", value="default"} 1
flag{name="loggerLevel", value="INFO"} 1
flag{name="loggerOutput", value="stderr"} 1
flag{name="memory.allowedPercent", value="60"} 1
flag{name="search.cacheTimestampOffset", value="5m0s"} 1
flag{name="search.denyPartialResponse", value="false"} 1
flag{name="search.disableCache", value="false"} 1
flag{name="search.latencyOffset", value="30s"} 1
flag{name="search.logSlowQueryDuration", value="5s"} 1
flag{name="search.maxConcurrentRequests", value="16"} 1
flag{name="search.maxExportDuration", value="720h0m0s"} 1
flag{name="search.maxLookback", value="0s"} 1
flag{name="search.maxPointsPerTimeseries", value="30000"} 1
flag{name="search.maxQueryDuration", value="30s"} 1
flag{name="search.maxQueryLen", value="16384"} 1
flag{name="search.maxQueueDuration", value="10s"} 1
flag{name="search.maxStalenessInterval", value="0s"} 1
flag{name="search.minStalenessInterval", value="0s"} 1
flag{name="search.resetCacheAuthKey", value="secret"} 1
flag{name="selectNode", value=""} 1
flag{name="storageNode", value="1-vmstorage:8401,2-vmstorage:8401,3-vmstorage:8401"} 1
flag{name="version", value="false"} 1

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 21

Most upvoted comments

@YuriGrigorov, could you verify whether the cluster components of VictoriaMetrics properly determine the number of available CPU cores starting from commit 89d652b?

With CPU quota limit

# cat /sys/devices/system/cpu/online
0-2

VictoriaMetrics/lib/cgroup/cpu.go:37    updating GOMAXPROCS to 4 according to cgroup CPU quota

Without CPU quota limit

# cat /sys/devices/system/cpu/online
0-79

VictoriaMetrics/lib/cgroup/cpu.go:31    cgroup CPU quota=81 exceeds NumCPU=80; using GOMAXPROCS=NumCPU

Looks a little bit misleading.

As I said, in my case the quota is set at the second-to-last level of the cgroup hierarchy, so it is not visible from inside the container. The value of /sys/fs/cgroup/cpu/cpu.cfs_quota_us is -1 and cpu.cfs_period_us is 100000.

It would be great if somebody could suggest a reliable method for determining the CPU quota in this case.
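
For context, here is a minimal sketch of the kind of cgroup v1 quota calculation behind the "updating GOMAXPROCS ... according to cgroup CPU quota" log line above, and of why it comes up empty in this setup. The paths and rounding are assumptions for illustration only, not the actual lib/cgroup/cpu.go code:

// gomaxprocs_quota.go: an illustrative sketch of deriving GOMAXPROCS from a
// cgroup v1 CPU quota. Paths and rounding are assumptions; see
// VictoriaMetrics/lib/cgroup/cpu.go for the real implementation.
package main

import (
    "fmt"
    "os"
    "runtime"
    "strconv"
    "strings"
)

func readInt(path string) (int64, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return 0, err
    }
    return strconv.ParseInt(strings.TrimSpace(string(data)), 10, 64)
}

func main() {
    quota, errQ := readInt("/sys/fs/cgroup/cpu/cpu.cfs_quota_us")
    period, errP := readInt("/sys/fs/cgroup/cpu/cpu.cfs_period_us")
    if errQ != nil || errP != nil || quota <= 0 || period <= 0 {
        // quota == -1 means "no limit set at this level of the hierarchy".
        // That is exactly the case described above: the limit lives one
        // level up and is invisible from inside the container.
        fmt.Println("no usable CPU quota; keeping GOMAXPROCS =", runtime.GOMAXPROCS(0))
        return
    }
    // Round the (possibly fractional) quota up to a whole number of CPUs.
    n := int((quota + period - 1) / period)
    if n > runtime.NumCPU() {
        n = runtime.NumCPU() // the quota exceeds the physical CPU count
    }
    runtime.GOMAXPROCS(n)
    fmt.Println("GOMAXPROCS set to", n)
}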

You may try to get the number of available CPUs by reading /sys/devices/system/cpu/online.

online – CPUs that are online and being scheduled

https://www.kernel.org/doc/html/latest/admin-guide/cputopology.html
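
A small sketch of counting CPUs from that file; the range parsing follows the format documented there (a comma-separated list of indexes and ranges such as "0", "0-3" or "0,2-5"), and countOnlineCPUs is a hypothetical helper name:

// online_cpus.go: counts available CPUs from /sys/devices/system/cpu/online.
package main

import (
    "fmt"
    "os"
    "strconv"
    "strings"
)

func countOnlineCPUs(path string) (int, error) {
    data, err := os.ReadFile(path)
    if err != nil {
        return 0, err
    }
    n := 0
    for _, part := range strings.Split(strings.TrimSpace(string(data)), ",") {
        if lo, hi, found := strings.Cut(part, "-"); found {
            // A range such as "0-3" contributes end-start+1 CPUs.
            start, err1 := strconv.Atoi(lo)
            end, err2 := strconv.Atoi(hi)
            if err1 != nil || err2 != nil || end < start {
                return 0, fmt.Errorf("bad range %q", part)
            }
            n += end - start + 1
        } else if _, err := strconv.Atoi(part); err == nil {
            n++ // a single CPU index such as "0"
        } else {
            return 0, fmt.Errorf("bad entry %q", part)
        }
    }
    return n, nil
}

func main() {
    n, err := countOnlineCPUs("/sys/devices/system/cpu/online")
    if err != nil {
        fmt.Fprintln(os.Stderr, err)
        os.Exit(1)
    }
    fmt.Println("online CPUs:", n)
}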

The number is rounded down to a whole value.

I checked some of our production containers:

Actual cores   Value from /sys/devices/system/cpu/online
1.567432       0
2.1052632      0-1
4.649123       0-3
7.157895       0-6
vmselect-prod -version
vmselect-20200812-092759-heads-cluster-0-g6721e47a

The CPU quota is not respected (nor is the memory limit) due to the specifics of our cloud.

As I said, in my case the quota is set at the second-to-last level of the cgroup hierarchy, so it is not visible from inside the container. The value of /sys/fs/cgroup/cpu/cpu.cfs_quota_us is -1 and cpu.cfs_period_us is 100000.

So the only way to reduce CPU thrashing in my case is to set the GOMAXPROCS env var equal to the container's CPU quota.

But using cpu.cfs_quota_us and cpu.cfs_period_us should work fine in plain Docker and Kubernetes.
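
For what it's worth, the Go runtime already reads a GOMAXPROCS environment variable at startup, so this workaround needs no code change in vmselect; a tiny sketch for checking the effective value inside a container (the value 2 in the usage comment is just an example):

// check_gomaxprocs.go: prints the effective GOMAXPROCS value. The Go runtime
// initializes it from the GOMAXPROCS environment variable when that is set,
// e.g. run as: GOMAXPROCS=2 ./check_gomaxprocs
package main

import (
    "fmt"
    "os"
    "runtime"
)

func main() {
    fmt.Println("GOMAXPROCS env var:", os.Getenv("GOMAXPROCS"))
    // runtime.GOMAXPROCS(0) queries the current value without changing it.
    fmt.Println("effective GOMAXPROCS:", runtime.GOMAXPROCS(0))
}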