reth: OOM during sync
Describe the bug
There is likely a memory leak in the reth syncing pipeline. With 128GB RAM available, the process has failed due to OOM errors.
I did not record the commit hash that produced the OOM, but after pulling main (9e72cbf6b44fd195738c6d85f461304578c8156c), recompiling, and restarting, the process has already ballooned to ~90GB.
Steps to reproduce
$ git checkout 9e72cbf6b44fd195738c6d85f461304578c8156c
$ RUSTFLAGS="-C target-cpu=native" cargo build --profile maxperf
$ reth node
Node logs
Here is the latest log, with a block referenced in case the sync point is useful.
Jun 22 17:47:38 reth[149860]: 2023-06-22T17:47:38.412442Z INFO try_insert_validated_block{block=(17536667, 0xf73ffd11d424742f9d0b303cc461db8df69c38772e1c43a82feff5b25f0ae691)}:try_append_canonical_chain:try_insert_validated_block{block=(17536688, 0xa67e40d64f49c893bff382d1a4d910c6a9a8c8c0ace13ebea4782e6c05e3efa5)}: blockchain_tree: return=Ok(Valid)
Platform(s)
Linux (x86)
What version/commit are you on?
9e72cbf6b44fd195738c6d85f461304578c8156c
What database version are you on?
Current database version: 1 Local database version: 1
If you’ve built Reth from source, provide the full command you used
RUSTFLAGS="-C target-cpu=native" cargo build --profile maxperf
Code of Conduct
- I agree to follow the Code of Conduct
About this issue
- State: closed
- Created a year ago
- Comments: 19 (5 by maintainers)
I see. I don’t have the OS depth to understand the mechanism you’re mentioning, but in my case the OOM error was surfaced in the journalctl for the systemd service unit which runs reth.
Using your command, I was also able to get this log:
Yep, not saying it wasn’t killed, just noting for ppl who follow along 😃 Res includes mmap’d pages though, so it is still not accurate.
See https://github.com/ledgerwatch/erigon#htop-shows-incorrect-memory-usage
The process was killed because of OOM though. I assume you are referencing the VIRTUAL column, but the RESIDENT column is still growing too high.
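For anyone following along, one way to see how much of the resident figure is file-backed (which is where mmap'd pages would be counted) versus anonymous memory is to read the breakdown that Linux exposes in /proc/<pid>/status. The following is a minimal sketch, not part of reth: the helper names `parse_rss_breakdown` and `read_rss_breakdown` are made up for illustration, and it assumes a Linux kernel recent enough (4.5+) to report the `RssAnon`, `RssFile`, and `RssShmem` fields.

```python
from __future__ import annotations

from pathlib import Path

# Fields of /proc/<pid>/status that describe resident memory.
# VmRSS is the total; the other three are its components.
RSS_FIELDS = ("VmRSS", "RssAnon", "RssFile", "RssShmem")


def parse_rss_breakdown(status_text: str) -> dict[str, int]:
    """Extract the resident-memory fields (in kB) from /proc/<pid>/status text."""
    breakdown: dict[str, int] = {}
    for line in status_text.splitlines():
        key, _, rest = line.partition(":")
        if key in RSS_FIELDS:
            # Lines look like "RssAnon:     1234 kB"; keep only the number.
            breakdown[key] = int(rest.split()[0])
    return breakdown


def read_rss_breakdown(pid: int | str = "self") -> dict[str, int]:
    """Read the breakdown for a given PID (hypothetical helper, Linux only)."""
    return parse_rss_breakdown(Path(f"/proc/{pid}/status").read_text())


if __name__ == "__main__" and Path("/proc/self/status").exists():
    # Prints the breakdown for this Python process; pass the reth PID instead.
    for name, kb in read_rss_breakdown().items():
        print(f"{name}: {kb / 1024:.1f} MiB")
```

If `RssAnon` is the field that keeps growing, that points at heap usage rather than page cache from the memory-mapped database. A large `RssFile` on its own is less conclusive, since the kernel can usually reclaim those pages under memory pressure.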
Okay, I’ve restarted it with peers.connection_info.max_outbound = 3. The commit = 54f4f39389f83437142e2dd236bed165c2bad837.
FYI, before restarting, the memory had grown to 56GB.
@mattsse Sure, it’s running. I’ll update you in a few hours.