reth: OOM during sync

Describe the bug

There is likely a memory leak in the reth syncing pipeline. With 128GB RAM available, the process has failed due to OOM errors.

I did not record the commit hash that produced the OOM, but after pulling main at 9e72cbf6b44fd195738c6d85f461304578c8156c, recompiling, and restarting, the process has already ballooned to ~90GB.

[Screenshot: Screen Shot 2023-06-22 at 12 45 53 PM]

Steps to reproduce

$ git checkout 9e72cbf6b44fd195738c6d85f461304578c8156c

$ RUSTFLAGS="-C target-cpu=native" cargo build --profile maxperf

$ reth node

Node logs

Here is the latest log line, with a block referenced in case the sync point is useful.

Jun 22 17:47:38 reth[149860]: 2023-06-22T17:47:38.412442Z  INFO try_insert_validated_block{block=(17536667, 0xf73ffd11d424742f9d0b303cc461db8df69c38772e1c43a82feff5b25f0ae691)}:try_append_canonical_chain:try_insert_validated_block{block=(17536688, 0xa67e40d64f49c893bff382d1a4d910c6a9a8c8c0ace13ebea4782e6c05e3efa5)}: blockchain_tree: return=Ok(Valid)

Platform(s)

Linux (x86)

What version/commit are you on?

9e72cbf6b44fd195738c6d85f461304578c8156c

What database version are you on?

Current database version: 1
Local database version: 1

If you’ve built Reth from source, provide the full command you used

RUSTFLAGS="-C target-cpu=native" cargo build --profile maxperf

Code of Conduct

  • I agree to follow the Code of Conduct

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (5 by maintainers)

Most upvoted comments

I see. I don't have the OS depth to understand the mechanism you're mentioning, but in my case the OOM error surfaced in the journalctl output for the systemd service unit that runs reth.

Using your command, I was also able to get this log:

[Screenshot: Screen Shot 2023-06-26 at 1 27 07 PM]
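For anyone following along, the kernel log is usually the clearest place to confirm an OOM kill. A minimal sketch (the reth.service unit name and the date are assumptions about this particular setup):

$ journalctl -k --since "2023-06-22" | grep -iE "out of memory|killed process"
$ journalctl -u reth.service | grep -iE "oom|killed"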

Yep, not saying it wasn't killed, just noting for ppl who follow along 😃 RES includes mmap'd pages though, so it is still not accurate.

See https://github.com/ledgerwatch/erigon#htop-shows-incorrect-memory-usage

Please note the memory in that column is not accurate. It is not actually how much memory it uses, since it also counts mmap’d files (which is just an extension of the address space).
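If it helps to separate the two, /proc breaks resident memory down into anonymous and file-backed pages. A sketch, assuming the process binary is named reth and the kernel is new enough (4.5+) to report the Rss* breakdown:

$ grep -E "VmRSS|RssAnon|RssFile|RssShmem" /proc/$(pidof reth)/status
# RssAnon = anonymous (heap) memory the process actually owns
# RssFile = resident file-backed mappings, e.g. mmap'd database pages counted into RES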

The process was killed because of OOM though. I assume you are referring to the VIRTUAL column, but the RESIDENT column is growing very high as well.

Okay, I've restarted it with peers.connection_info.max_outbound = 3. The commit is 54f4f39389f83437142e2dd236bed165c2bad837.
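For reference, that dotted path maps onto a TOML section in reth's config file; a sketch of what the edit presumably looks like (the exact file layout and its location under the data dir are assumptions on my part):

[peers.connection_info]
max_outbound = 3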

FYI, before restarting, the memory had grown to 56GB.

@mattsse Sure, it’s running. I’ll update you in a few hours.