earlyoom crash: Could not convert number: Numerical result out of range

I just noticed earlyoom did this; next time I’ll try to provide more debug info:

Out of memory! avail: 138 MiB < min: 786 MiB
Killing process 4084 kworker/u17:3
Out of memory! avail: 155 MiB < min: 786 MiB
Killing process 4084 kworker/u17:3
Out of memory! avail: 157 MiB < min: 786 MiB
Killing process 4084 kworker/u17:3
Could not convert number: Numerical result out of range

But I think earlyoom should never kill kernel threads 😉 and should not quit on that conversion error.
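
To illustrate the second request (do not quit on a conversion error), here is a minimal Python sketch of a reader that reports a bad value to its caller instead of exiting. It is not earlyoom's code: the names parse_long and read_oom_score, the proc_root parameter, and the 64-bit range are assumptions made for the example. The idea is that a malformed, out-of-range, or vanished oom_score just means "skip this process".

```python
import errno
import os
from typing import Optional

LONG_MIN, LONG_MAX = -(2**63), 2**63 - 1  # assumption: 64-bit C long


def parse_long(text: str) -> Optional[int]:
    """Return the parsed integer, or None if text is malformed or out of range."""
    try:
        value = int(text.strip())
    except ValueError:
        return None
    if not LONG_MIN <= value <= LONG_MAX:
        # A C strtol() call would set errno to ERANGE here; warn and carry on.
        print(f"warning: {text.strip()!r}: {os.strerror(errno.ERANGE)}")
        return None
    return value


def read_oom_score(pid: int, proc_root: str = "/proc") -> Optional[int]:
    """Read <proc_root>/<pid>/oom_score; None means 'skip this process'."""
    try:
        with open(os.path.join(proc_root, str(pid), "oom_score")) as f:
            return parse_long(f.read())
    except OSError:
        # The process may have exited between listing and reading.
        return None
```

A caller would treat None as "ignore this pid and continue the scan", so a single unreadable entry cannot take the whole daemon down.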

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 25 (18 by maintainers)

Most upvoted comments

I am afraid the cosmic ray hit us too… 😕

From our logs:

$ grep earlyoo /var/log/syslog
Jan  7 14:45:27 nora earlyoom[73850]: Out of memory! avail: 6271 MiB < min: 6436 MiB
Jan  7 14:45:28 nora earlyoom[73850]: Killing process 106631 kworker/27:0-events
Jan  7 14:46:43 nora earlyoom[73850]: Out of memory! avail: 6215 MiB < min: 6436 MiB
Jan  7 14:46:44 nora earlyoom[73850]: Killing process 106997 kworker/27:0-events
Jan  7 14:51:25 nora earlyoom[73850]: Out of memory! avail: 6420 MiB < min: 6436 MiB
Jan  7 14:51:26 nora earlyoom[73850]: Killing process 109345 kworker/27:0-events
Jan  7 14:54:11 nora earlyoom[73850]: Out of memory! avail: 6315 MiB < min: 6436 MiB
Jan  7 14:54:11 nora earlyoom[73850]: Killing process 110952 kworker/27:0-events
Jan  7 14:54:53 nora earlyoom[73850]: Out of memory! avail: 6249 MiB < min: 6436 MiB
Jan  7 14:54:53 nora earlyoom[73850]: Killing process 109737 kworker/27:0-events
Jan  7 14:58:23 nora earlyoom[73850]: Out of memory! avail: 6423 MiB < min: 6436 MiB
Jan  7 14:58:24 nora earlyoom[73850]: Killing process 111780 kworker/27:0-events
Jan  7 15:12:11 nora earlyoom[73850]: Out of memory! avail: 6429 MiB < min: 6436 MiB
Jan  7 15:12:11 nora earlyoom[73850]: Killing process 118805 kworker/27:0-events
Jan  7 15:12:34 nora earlyoom[73850]: Out of memory! avail: 6188 MiB < min: 6436 MiB
Jan  7 15:12:34 nora earlyoom[73850]: Killing process 119431 kworker/27:0-events
Jan  7 15:13:54 nora earlyoom[73850]: Out of memory! avail: 6216 MiB < min: 6436 MiB
Jan  7 15:13:54 nora earlyoom[73850]: Killing process 119537 kworker/27:0-events
Jan  7 15:18:37 nora earlyoom[73850]: Out of memory! avail: 6409 MiB < min: 6436 MiB
Jan  7 15:18:37 nora earlyoom[73850]: Killing process 121864 kworker/27:0-events
Jan  7 15:53:33 nora earlyoom[73850]: Out of memory! avail: 6430 MiB < min: 6436 MiB
Jan  7 15:53:34 nora earlyoom[73850]: Killing process 9366 kworker/u256:4-btrfs-worker
Jan  7 16:02:31 nora earlyoom[73850]: Out of memory! avail: 6342 MiB < min: 6436 MiB
Jan  7 16:02:31 nora earlyoom[73850]: Killing process 13170 kworker/u256:4-flush-btrfs-1
Jan  7 16:36:53 nora earlyoom[73850]: Out of memory! avail: 6316 MiB < min: 6436 MiB
Jan  7 16:36:53 nora earlyoom[73850]: Killing process 38679 kworker/u256:0-events_power_efficient

I was just able to reproduce it with debug output:

$ sudo earlyoom -d
[...]
pid 43032: badness   0 vm_rss   3824 perl
pid 43037: badness 459 vm_rss 7552869 python
    ^ new victim (higher badness)
pid 44252: badness   0 vm_rss      0 kworker/u256:3-btrfs-worker
pid 45268: badness   0 vm_rss      0 kworker/8:2
[...]
pid 116702: badness 139 vm_rss 2291826 rsession
pid 118773: badness   0 vm_rss      0 kworker/u256:7-btrfs-worker
pid 119361: badness   0 vm_rss      0 kworker/23:0-events
[...]
pid 125310: badness   0 vm_rss    942 sshd
pid 125311: badness   0 vm_rss   1130 bash
pid 126500: badness   0 vm_rss      0 kworker/9:1-events
pid 129209: badness   0 vm_rss      0 kworker/27:0-events
pid 130435: badness   0 vm_rss      0 kworker/u256:0-btrfs-worker
Killing process 43037 kworker/u256:0-btrfs-worker
mem avail: 16607 MiB (25 %), swap free: 0 MiB (0 %)
mem avail: 24333 MiB (37 %), swap free: 0 MiB (0 %)
mem avail: 31706 MiB (49 %), swap free: 0 MiB (0 %)
mem avail: 35796 MiB (55 %), swap free: 0 MiB (0 %)
mem avail: 35954 MiB (55 %), swap free: 0 MiB (0 %)
mem avail: 35913 MiB (55 %), swap free: 0 MiB (0 %)
mem avail: 36048 MiB (56 %), swap free: 0 MiB (0 %)
mem avail: 36040 MiB (55 %), swap free: 0 MiB (0 %)
mem avail: 36072 MiB (56 %), swap free: 0 MiB (0 %)

The debug output suggests that pid 43037 (python) should be the victim, but the kill message names pid 43037 as kworker/u256:0-btrfs-worker (same pid!). I think we had actually started a write from this Python process, so it does not surprise me too much that some btrfs thingy is triggered, but I don’t understand why it has the same pid.
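
One reading that fits the debug output above, though it is not confirmed anywhere in this thread: the name in the kill message is the same as that of the last process scanned (pid 130435, kworker/u256:0-btrfs-worker), and available memory jumped right after the kill, as it would if the python process had really been killed. That pattern can arise when the victim's pid is remembered but its name is taken from a buffer that later iterations overwrite. The Python sketch below shows a selection loop that avoids this by keeping pid and name in one record; Proc and select_victim are hypothetical names, and the sample values are copied from the log.

```python
from typing import Iterable, NamedTuple, Optional


class Proc(NamedTuple):
    pid: int
    badness: int
    name: str


def select_victim(procs: Iterable[Proc]) -> Optional[Proc]:
    """Return the process with the highest badness, or None if there is none."""
    victim: Optional[Proc] = None
    for p in procs:
        # Replace the whole record at once, so pid and name cannot drift apart.
        if victim is None or p.badness > victim.badness:
            victim = p
    return victim


# A few entries taken from the debug output above.
scan = [
    Proc(43032, 0, "perl"),
    Proc(43037, 459, "python"),
    Proc(116702, 139, "rsession"),
    Proc(130435, 0, "kworker/u256:0-btrfs-worker"),
]
victim = select_victim(scan)
print(f"Killing process {victim.pid} {victim.name}")
```

With these entries the message names python, not the kworker that happened to be scanned last.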

Meanwhile, I’m no longer sure whether we are seeing the same issue or a different one. I don’t see the error message "Could not convert number: Numerical result out of range", and we don’t have errors like "fopen 11495/oom_score failed: No such file or directory" either.

We have been using v1.0 until now. I will try to reproduce this with the latest version and provide more information, but I am not sure the problem is easily reproducible.

Anyway, earlyoom now explicitly skips kernel threads (https://github.com/rfjakob/earlyoom/commit/58b66a392755b43d256dcaa1c39e548832b0307d), provides a bit more info when things go wrong (https://github.com/rfjakob/earlyoom/commit/0f422f5c3db87dae840d1d36a34edafe9a862128), and has the gcc stack protector enabled (https://github.com/rfjakob/earlyoom/commit/2a9e3b9b66dac0bfe68b5da7b7eac646b20f7324) in case we are seeing memory corruption. Maybe you can recompile and see if you get anything like that again?
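
For readers wondering how a userspace tool can tell kernel threads apart, here is one possible check, sketched in Python. It is not necessarily what the commit linked above does: it tests the PF_KTHREAD bit in the flags field of /proc/<pid>/stat. The function name is_kernel_thread and both sample stat lines are made up for illustration.

```python
PF_KTHREAD = 0x00200000  # kernel-thread flag from the Linux kernel's sched.h


def is_kernel_thread(stat_line: str) -> bool:
    """Decide from one /proc/<pid>/stat line whether it is a kernel thread."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' rather than on whitespace.
    after_comm = stat_line.rsplit(")", 1)[1].split()
    # after_comm: state, ppid, pgrp, session, tty_nr, tpgid, flags, ...
    flags = int(after_comm[6])
    return bool(flags & PF_KTHREAD)


# Synthetic, truncated stat lines (hypothetical values, not taken from the logs).
kworker = "129209 (kworker/27:0-events) I 2 0 0 0 -1 69238880 0 0 0 0"
python = "43037 (python) S 1 43037 43037 0 -1 4194304 100 0 0 0"
print(is_kernel_thread(kworker), is_kernel_thread(python))
```

A simpler heuristic is also visible in the debug output above, where every kworker reports vm_rss 0, but other processes such as zombies can have no resident memory either, so the flag test is more specific.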