go: testing: apparent memory leak in fuzzer

If I leave a fuzzer running for sufficiently long, then it crashes with an OOM. Snippet from dmesg:

[1974087.246830] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/user.slice/user-319973.slice/session-c1.scope,task=json.test,pid=3659815,uid=319973
[1974087.264733] Out of memory: Killed process 3659815 (json.test) total-vm:18973836kB, anon-rss:13185376kB, file-rss:0kB, shmem-rss:0kB, UID:319973 pgtables:33988kB oom_score_adj:0
[1974087.971181] oom_reaper: reaped process 3659815 (json.test), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

I don’t believe this is because the code being tested is OOMing; rather, the fuzzer itself appears to be retaining too much memory.

Here’s a graph of the RSS memory usage over time: [graph not included]. The machine has 32 GiB of RAM.

There are large jumps in memory usage at various intervals. I don’t have much understanding of how the fuzzer works, but perhaps this is the mutator discovering that some input expands coverage and adding it to an internal data structure?
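
Conceptually, the kind of growth I’m imagining is a corpus that only ever grows as new coverage is found. A rough sketch of that idea (this is just an illustration; none of the names below come from the actual Go fuzzer) would be:

    package fuzzsketch

    // corpus illustrates the hypothesized behavior: every input that produces a
    // previously unseen coverage signature is retained in memory forever.
    type corpus struct {
        entries [][]byte        // retained inputs; grows but never shrinks
        seen    map[uint64]bool // coverage signatures observed so far
    }

    func newCorpus() *corpus {
        return &corpus{seen: make(map[uint64]bool)}
    }

    // maybeKeep retains an input the first time its coverage signature is seen.
    // If the target keeps exposing new coverage, memory use grows without bound.
    func (c *corpus) maybeKeep(input []byte, coverageSig uint64) {
        if !c.seen[coverageSig] {
            c.seen[coverageSig] = true
            c.entries = append(c.entries, append([]byte(nil), input...))
        }
    }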

I should further note that when the fuzzer crashes, it produces a testdata/FuzzXXX/YYY file as the “reproducer”. Running the test with that “reproducer” does not fail the fuzz test. If possible, the fuzzer should distinguish between OOMs caused by itself and OOMs caused by the code being tested: the former should not result in any “repro” corpus files being added, while the latter should.
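
(For reference, re-running the “reproducer” amounts to something like the following, where FuzzXXX/YYY stands in for the real target and corpus-file names:)

go test -run=FuzzXXX/YYY -v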

I’m using commit 5aacd47c002c39b481c4c7a0663e851758da372a. (I can provide the code I’m fuzzing, but please contact me privately.)
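
As a stand-in for discussion, a minimal JSON-decoding fuzz target (not my actual code, and the FuzzOOM name is only inferred from the json.test binary and the -fuzz=OOM invocation below) looks like:

    package json_test

    import (
        "encoding/json"
        "testing"
    )

    // FuzzOOM is a simplified stand-in for the real fuzz target. Any target
    // that keeps exposing new coverage should do for observing the fuzzer's
    // own memory growth.
    func FuzzOOM(f *testing.F) {
        f.Add([]byte(`{"key":"value"}`))
        f.Fuzz(func(t *testing.T, data []byte) {
            var v any
            // Decoding errors are expected for malformed inputs; the point is
            // only to give the mutator coverage to chase.
            _ = json.Unmarshal(data, &v)
        })
    }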

/cc @katiehockman @jayconrod

Most upvoted comments

@dsnet, can you still reproduce the problem?

As of go version devel go1.18-3bbc82371e, this is still an issue. I run it with:

go.tip test -fuzz=OOM -v -parallel=1

Since each individual fuzz worker has its own namespace, running with -parallel=1 seems to manifest the problem fastest. On my Ryzen 5900X, it was allocating roughly 1 GiB/minute.
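
To watch the growth while the fuzzer runs, sampling the test binary’s RSS should work (this assumes Linux procps; json.test is the process name from the dmesg snippet above, and it will list the coordinator and worker processes together):

while sleep 60; do ps -C json.test -o rss=,comm=; done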