influxdb: influxdb out of memory
### 1. Description

When running InfluxDB, the VIRT memory consumption of `influxd` increases rapidly, and the process is eventually killed by the OOM killer.
### 2. Environment

[Docker version]: Docker version 1.13.1, build 07f3374/1.13.1

[Docker run command]: `docker run -it -d --network host -v /var/lib/influxdb/:/opt/host --memory 10g dfaa91697202 /bin/bash -c "sleep 10000000"`

[influxdb version]: InfluxDB shell version: 1.7.0~n201808230800

[conf]: influxdb_conf.TXT

[influx logs]: influx_log_1.zip (Note: the log timestamps are off by 8 hours because of the time zone difference.)
[disk infos]:
I monitored the size of the data as well as the memory usage. The `top` output shortly before the kill was:
---------------------------top bein 54004-----------------
top - 11:46:09 up 16 days, 3:09, 14 users, load average: 11.29, 10.57, 10.36
Tasks: 1 total, 0 running, 1 sleeping, 0 stopped, 0 zombie
%Cpu(s): 16.1 us, 1.1 sy, 0.0 ni, 82.7 id, 0.0 wa, 0.0 hi, 0.1 si, 0.0 st
KiB Mem : 26345912+total, 1652376 free, 54648416 used, 20715833+buff/cache
KiB Swap: 4194300 total, 3644 free, 4190656 used. 20497331+avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
54004 root 20 0 57.4g 10.0g 154332 S 93.8 4.0 1251:44 influxd
---------------------------top end 54004-----------------
[messages log]:
Apr 11 11:48:01 psinsight-112 kernel: influxd invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
Apr 11 11:48:01 psinsight-112 kernel: influxd cpuset=docker-f1207e65211b540e841d5d13a7b66c393ddde902d0c80a3bfa215a4c16754418.scope mems_allowed=0-1
Apr 11 11:48:01 psinsight-112 kernel: CPU: 21 PID: 54013 Comm: influxd Kdump: loaded Tainted: G ------------ T 3.10.0-957.1.3.el7.x86_64 #1
Apr 11 11:48:01 psinsight-112 kernel: Hardware name: Huawei 2288H V5/BC11SPSCB0, BIOS 0.68 05/03/2018
Apr 11 11:48:01 psinsight-112 kernel: Call Trace:
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e561e41>] dump_stack+0x19/0x1b
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e55c86a>] dump_header+0x90/0x229
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9dfba036>] ? find_lock_task_mm+0x56/0xc0
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e031388>] ? try_get_mem_cgroup_from_mm+0x28/0x60
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9dfba4e4>] oom_kill_process+0x254/0x3d0
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e100c9c>] ? selinux_capable+0x1c/0x40
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e035186>] mem_cgroup_oom_synchronize+0x546/0x570
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e034600>] ? mem_cgroup_charge_common+0xc0/0xc0
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9dfbad74>] pagefault_out_of_memory+0x14/0x90
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e55ad72>] mm_fault_error+0x6a/0x157
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e56f7a8>] __do_page_fault+0x3c8/0x500
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e56f915>] do_page_fault+0x35/0x90
Apr 11 11:48:01 psinsight-112 kernel: [<ffffffff9e56b758>] page_fault+0x28/0x30
Apr 11 11:48:01 psinsight-112 kernel: Task in /system.slice/docker-f1207e65211b540e841d5d13a7b66c393ddde902d0c80a3bfa215a4c16754418.scope killed as a result of limit of /system.slice/docker-f1207e65211b540e841d5d13a7b66c393ddde902d0c80a3bfa215a4c16754418.scope
Apr 11 11:48:01 psinsight-112 kernel: memory: usage 10485760kB, limit 10485760kB, failcnt 159873656
Apr 11 11:48:01 psinsight-112 kernel: memory+swap: usage 10504544kB, limit 20971520kB, failcnt 0
Apr 11 11:48:01 psinsight-112 kernel: kmem: usage 0kB, limit 9007199254740988kB, failcnt 0
Apr 11 11:48:01 psinsight-112 kernel: Memory cgroup stats for /system.slice/docker-f1207e65211b540e841d5d13a7b66c393ddde902d0c80a3bfa215a4c16754418.scope: cache:168KB rss:10485592KB rss_huge:0KB mapped_file:4KB swap:18784KB inactive_anon:1623952KB active_anon:8861888KB inactive_file:112KB active_file:0KB unevictable:0KB
Apr 11 11:48:01 psinsight-112 kernel: [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name
Apr 11 11:48:01 psinsight-112 kernel: [44993] 0 44993 1078 11 8 7 0 sleep
Apr 11 11:48:01 psinsight-112 kernel: [50116] 0 50116 2943 56 11 40 0 bash
Apr 11 11:48:01 psinsight-112 kernel: [54004] 0 54004 15101796 2622256 18846 4497 0 influxd
Apr 11 11:48:01 psinsight-112 kernel: [54423] 0 54423 685493 1284 121 234 0 influx
Apr 11 11:48:01 psinsight-112 kernel: Memory cgroup out of memory: Kill process 223248 (influxd) score 720 or sacrifice child
Apr 11 11:48:01 psinsight-112 kernel: Killed process 54004 (influxd) total-vm:60407184kB, anon-rss:10479648kB, file-rss:9376kB, shmem-rss:0kB
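A note for reading the messages log: the per-process table (`total_vm`, `rss`) counts memory pages, while the final "Killed process" line reports kB. The sketch below converts one to the other, assuming 4 kB pages (the usual size on x86_64; `getconf PAGESIZE` shows the real value on a given host). The helper name `pages_to_kb` is made up for this illustration.

```shell
# Convert a page count from the OOM-killer task table to kB.
# Assumption: 4 kB pages; verify with `getconf PAGESIZE` (prints bytes).
pages_to_kb() {
  echo $(( $1 * 4 ))
}

pages_to_kb 15101796   # total_vm of influxd (pid 54004)
pages_to_kb 2622256    # rss of influxd (pid 54004)
```

Under that assumption the first value equals the `total-vm:60407184kB` in the last log line, and the second equals `anon-rss` plus `file-rss` (10479648 kB + 9376 kB). In other words, the resident set of `influxd` alone had reached the container's 10 GB limit.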
About this issue
- State: open
- Created 5 years ago
- Reactions: 6
- Comments: 35 (1 by maintainers)
I think InfluxDB needs a mechanism to limit memory usage and avoid OOM, even if it degrades performance. After all, usability is more important than performance.
My 512 MB device running Debian Buster (Armbian) survived the night without an OOM, using tsi1 indexing on InfluxDB v1.8.0 ;-)
It now steadily uses 12.2% of CPU instead of the 38% it used earlier (resident memory went down from 180 MB to 60 MB), and I don't notice any difference in queries with series-id-set-cache-size=0.
The default configuration for compaction throughput should be much more conservative, especially on smaller devices. I've scaled compact-throughput down from "48m" to "1m" and compact-throughput-burst from "48m" to "10m". I think this reduces the steep increase in memory usage during compaction, which was triggered 6 times last night.
This is still an urgent problem. Because of it, we cannot risk upgrading from 1.7.3 to 1.7.10 or 1.8, which have the Flux language functionality we need.
Same problem on 1.7.10.
Hi bapBardas,
Glad I could help. For my small device I also decreased the cache-max-memory-size from 1g to 64m and cache-snapshot-memory-size from 25m to 1m. This may not matter on your monster-machine.
Kind regards, Dennis
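For readers who want to try the values from these two comments in one go, here is a minimal sketch that sets them as environment-variable overrides rather than editing `influxdb.conf`. It assumes the InfluxDB 1.x convention of `INFLUXDB_<SECTION>_<OPTION>` names for the `[data]` section; check the names against the configuration documentation for your version. The values are the ones quoted for a 512 MB device and are not general recommendations.

```shell
# Environment-variable overrides for the [data] section of influxdb.conf.
# Assumption: variable names follow the InfluxDB 1.x INFLUXDB_<SECTION>_<OPTION> pattern.
export INFLUXDB_DATA_INDEX_VERSION="tsi1"              # disk-based index instead of in-memory
export INFLUXDB_DATA_SERIES_ID_SET_CACHE_SIZE="0"      # disable the TSI series-id-set cache
export INFLUXDB_DATA_COMPACT_THROUGHPUT="1m"           # reported default: 48m
export INFLUXDB_DATA_COMPACT_THROUGHPUT_BURST="10m"    # reported default: 48m
export INFLUXDB_DATA_CACHE_MAX_MEMORY_SIZE="64m"       # reported default: 1g
export INFLUXDB_DATA_CACHE_SNAPSHOT_MEMORY_SIZE="1m"   # reported default: 25m

# Print what influxd would inherit, as a quick sanity check.
env | grep '^INFLUXDB_DATA_' | sort
```

Start `influxd` from the same shell, or pass the same names to a container with `docker run -e`. Note that switching `index-version` only affects new shards; existing shards have to be converted with `influx_inspect buildtsi`.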
Hi. Which version actually works? I just want my home automation not to crash every 6 hours because it runs out of memory. Which version do you recommend? v1.6.4?
It worked well for a year, then I had the bad idea to upgrade InfluxDB to >1.7. Since then things have been out of control. 😦 I do not need any fancy features, just a stable version.
Thanks for any feedback. (PS: yes, I tried to move from TSM to TSI, but I aborted after 4 hours; it did not even show a progress bar.) And does anyone know a good how-to for downgrading?
Same thing on 1.7.7.
No special config, but a lot of data. I purged all my data and still have the same problem: the server works for 15 minutes, then loses its internet connection, but it does not reboot. (I don't have direct access to it.)
Just before server crash: