reth: BlockchainTree sidechain/pending block in-memory structure

In reth we are moving to keep pending state and sidechains in memory. This decision, coupled with blockchains having finality after N blocks, seems reasonable, but it pulls in some additional changes that we need to implement inside the client.

A table for the latest/pending and sidechain blocks, where the whole block is saved, is needed for crash recovery: on startup the BlockchainTree can be regenerated from it.

```mermaid
flowchart BT
    subgraph canonical chain
        CanonState:::state
        block0canon:::canon -->block1canon:::canon -->block2canon:::canon -->block3canon:::canon --> block4canon:::canon --> block5canon:::canon
    end
    block5canon --> block6pending1:::pending --> block7pending1:::pending
    block5canon --> block6pending2:::pending
    subgraph sidechain2
        S2State:::state
        block3canon --> block4s2:::sidechain --> block5s2:::sidechain
    end
    subgraph sidechain1
        S1State:::state
        block2canon --> block3s1:::sidechain --> block4s1:::sidechain --> block5s1:::sidechain --> block6s1:::sidechain
    end
    classDef state fill:#1882C4
    classDef canon fill:#8AC926
    classDef pending fill:#FFCA3A
    classDef sidechain fill:#FF595E
```

The Mermaid flowchart represents all blocks that can appear in the blockchain. Green blocks belong to the canonical chain and are saved inside a database table; they are our main chain. Pending blocks and sidechains live in memory inside [BlockchainTree]. Both pending blocks and sidechains use the same mechanism; the only difference is how they get committed to the database. For pending blocks it is just an append operation, but for sidechains we need to move the current canonical blocks into the BlockchainTree and flush the sidechain to the database so it becomes the canonical chain.
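
A minimal sketch of that commit flow (all types and names here are hypothetical stand-ins, not reth's actual API): a pending chain is appended directly, while making a sidechain canonical swaps the blocks above the fork point into the in-memory tree.

```rust
use std::collections::HashMap;

#[derive(Clone, Debug, PartialEq)]
struct Block {
    number: u64,
    hash: u64,   // stand-in for a real H256
    parent: u64, // parent hash
}

/// In-memory tree of side/pending chains, keyed by their fork-point hash.
#[derive(Default)]
struct BlockchainTree {
    chains: HashMap<u64, Vec<Block>>,
}

/// Stand-in for the database-backed canonical chain.
#[derive(Default)]
struct Db {
    canonical: Vec<Block>,
}

impl Db {
    /// Pending blocks extend the tip, so committing them is a plain append.
    fn append(&mut self, blocks: Vec<Block>) {
        self.canonical.extend(blocks);
    }
}

impl BlockchainTree {
    /// Making a sidechain canonical: pop the canonical blocks above the
    /// fork point back into the tree, then flush the sidechain to the db.
    fn make_canonical(&mut self, db: &mut Db, fork_hash: u64) {
        let sidechain = self.chains.remove(&fork_hash).expect("unknown chain");
        // Split the canonical chain right after the fork block.
        let split = db
            .canonical
            .iter()
            .position(|b| b.hash == fork_hash)
            .expect("fork block must be canonical")
            + 1;
        let old_canonical = db.canonical.split_off(split);
        // The displaced blocks become a sidechain in memory...
        if !old_canonical.is_empty() {
            self.chains.insert(fork_hash, old_canonical);
        }
        // ...and the chosen sidechain is flushed to become canonical.
        db.append(sidechain);
    }
}
```

Note the symmetry: nothing is thrown away during the swap, so the displaced canonical blocks could themselves become canonical again in a later reorg.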

Since all pending blocks are already executed and we have their data, once a block is decided to be canonical its changes can be flushed directly to the database. We need to be aware of setting the pipeline stages' progress.

For unwinding the canonical chain from the database, the data saved in the database can be reused and stored in memory; the pipeline unwind would just remove it. We still need to be aware of the pipeline stages' progress.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 31 (6 by maintainers)

Most upvoted comments

started doing revert of substate with changesets (needed for chain joints)

Is this in case: A -> B, but you need to reorg to A -> C, meaning you need to revert B and then re-apply C?

Yeah, exactly like that. It is more impactful if you have G -> A -> B and you want the state of G -> A, so you can revert B and add C to get G -> A -> C.
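
The revert-then-reapply step can be sketched with a toy changeset model (an assumption for illustration, not reth's actual substate types): each block's changeset records the *pre-state* of every account it touched, so undoing the block is just restoring those values.

```rust
use std::collections::HashMap;

type State = HashMap<&'static str, u64>; // account -> balance (simplified)

/// Per-block changeset: for every touched account, the value it had
/// *before* the block executed (None if the account did not exist).
type ChangeSet = HashMap<&'static str, Option<u64>>;

/// Execute a set of writes, returning the changeset needed to undo them.
fn apply(state: &mut State, writes: &[(&'static str, u64)]) -> ChangeSet {
    let mut cs = ChangeSet::new();
    for &(account, value) in writes {
        // Record the pre-state only once per account per block.
        cs.entry(account).or_insert_with(|| state.get(account).copied());
        state.insert(account, value);
    }
    cs
}

/// Revert a block by restoring the pre-state recorded in its changeset.
fn revert(state: &mut State, cs: &ChangeSet) {
    for (&account, old) in cs {
        match old {
            Some(v) => { state.insert(account, *v); }
            None => { state.remove(account); }
        }
    }
}
```

With this in hand, the G -> A -> B to G -> A -> C reorg is `revert(B's changeset)` followed by `apply(C's writes)`.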

Worked on applying and reverting of changesets in the substate, and they seem okay. Next in line is splitting of the chain (as in, if just half of a chain gets committed for any reason), and saving the last 256 block hashes in memory (needed for the BLOCKHASH opcode).

After that, I am left with commit_canonical and revert_canonical functions that flush the data to tables and I can start writing tests.

Worked on the substate and a substate-with-provider structure that allows us to plug in the necessary provider while the substate data stays standalone: https://github.com/paradigmxyz/reth/blob/rakita/blockchain_tree/crates/executor/src/substate.rs#L9 https://github.com/paradigmxyz/reth/blob/rakita/blockchain_tree/crates/executor/src/substate.rs#L40

Worked on block hashes, as we need the last 256 block hashes for execution: https://github.com/paradigmxyz/reth/blob/e04e743f1fd5dea925bf3da5f8b66ef00636b12b/crates/executor/src/blockchain_tree/mod.rs#L122
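
A sketch of such a rolling window (illustrative names, u64 standing in for H256): the EVM's BLOCKHASH opcode can only see the latest 256 ancestors, so older entries can be dropped as new canonical blocks arrive.

```rust
use std::collections::BTreeMap;

/// Rolling window of the most recent block hashes, as needed by the
/// EVM BLOCKHASH opcode (only the latest 256 ancestors are visible).
struct BlockHashes {
    window: usize,
    hashes: BTreeMap<u64, u64>, // block number -> hash
}

impl BlockHashes {
    fn new(window: usize) -> Self {
        Self { window, hashes: BTreeMap::new() }
    }

    /// Record a new canonical block, dropping anything outside the window.
    fn push(&mut self, number: u64, hash: u64) {
        self.hashes.insert(number, hash);
        while self.hashes.len() > self.window {
            let oldest = *self.hashes.keys().next().unwrap();
            self.hashes.remove(&oldest);
        }
    }

    /// BLOCKHASH semantics: out-of-window numbers yield nothing.
    fn get(&self, number: u64) -> Option<u64> {
        self.hashes.get(&number).copied()
    }
}
```

A `BTreeMap` keeps the numbers ordered so eviction of the oldest entry is trivial; a fixed-size ring buffer would work just as well.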

Next step is to use all of that and start executing blocks in Chain and generating changesets.

Did finalization and discarding of blocks/sidechains that are not valid. Next up is block execution in Chain, substate integration, and changesets.
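
The finalization pruning rule can be illustrated like this (a hypothetical sketch, not reth's actual structures): once a block is finalized, any sidechain forking off strictly below it can never become canonical and is discarded from the tree.

```rust
/// Hypothetical in-memory sidechain: only the fork height matters here.
struct SideChain {
    fork_number: u64, // height of the canonical block it branches from
}

struct Tree {
    sidechains: Vec<SideChain>,
}

impl Tree {
    /// Drop every sidechain that forks below the finalized block; chains
    /// forking at or above it are still viable reorg candidates.
    fn finalize(&mut self, finalized: u64) {
        self.sidechains.retain(|c| c.fork_number >= finalized);
    }
}
```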

Will use executor interface trait to allow better testing: https://github.com/paradigmxyz/reth/blob/4cbd1990163a005e0bca0b3e1f94ccd28dbf2e04/crates/interfaces/src/executor.rs#L7

Let’s go with all in-memory initially and assume smol reorgs, let’s keep our focus on getting things done “right” for Ethereum and expand from there. I don’t think it’ll be a big change to use a disk-backed backend.

Wondering how much memory this would occupy for chains which can have deep re-orgs (and so multiple sidechains?)… or even with shallow ones. To me, having it all in memory seems wasteful. If in the future we wish to have as much (canonical) plain state in memory as possible, this seems to detract from that.

Another loose idea: could we just write/read this kind of data to redb? Would it be overly complex/slow? It could be a way to gain experience with it and evaluate pros/cons against libmdbx.

Had the same concern that I wrote to gakons: I have a good feeling about BlockchainTree; my only concern is that I need to make it slim so it does not take a lot of RAM.

Was thinking about reorgs for a while, and with the finality that every chain now has, the deep reorgs that PoW had are a thing of the past. We can expect reorgs of only up to N blocks: https://twitter.com/rakitadragan/status/1615666501440212993

Reasons and my thinking on why it is this way: it is easier to do it all in memory; at the moment Ethereum has a small reorg depth; there is a small benefit to pending/best blocks being in memory for researchers (which can be achieved with a good cache); and adding a db into the mix is always an option if we notice that the in-memory footprint is too big.

Using redb seems like a good idea.

Did some leftover TODOs and started writing tests/fixing bugs for the insert/take functions in the db Transaction.

Updated pipeline tables on the commit/revert paths. It will iterate over the SyncStatus table and update the block number to the last committed/reverted one, reverting state to the previous block. Tied those functions into BlockchainTree.
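
In spirit, keeping stage progress in sync looks something like this (a sketch under assumed names; reth's actual stage-checkpoint schema differs): a commit advances every stage to the new tip, a revert pulls any stage ahead of the unwind target back to it.

```rust
use std::collections::HashMap;

/// Illustrative stand-in for a SyncStatus-style table of stage progress.
#[derive(Default)]
struct SyncStage {
    progress: HashMap<&'static str, u64>, // stage name -> last processed block
}

impl SyncStage {
    /// After committing a chain, every stage is considered caught up
    /// to the new canonical tip.
    fn commit(&mut self, tip: u64) {
        for p in self.progress.values_mut() {
            *p = tip;
        }
    }

    /// After a revert, any stage ahead of the unwind target is pulled back.
    fn revert(&mut self, target: u64) {
        for p in self.progress.values_mut() {
            if *p > target {
                *p = target;
            }
        }
    }
}
```

The invariant this preserves: if the pipeline is later re-run, no stage believes it has processed blocks that the tree has since rewritten.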

Refactored Indexing/Hashing stages and moved functionality to Provider.

I think that is it on functionality, and I can start testing. I will try to refactor execution in some way so it is easier to mock.

Will start reviewing this closer tomorrow. Thanks for all the updates.

Mostly worked on the functions for getting and unwinding blocks from tables. Extracting execution results was a tricky one, but I am close to finishing that. And I need to call the Index, Hashing, and Merkle stages' unwind to finish unwinding from the Blockchain tree.

And after it, it is mostly testing.

Insert of past canonical hashes is added, along with an update mechanism. Some renaming. Was focused on the sync switch, defining the flow and covering all potential edge cases.

I want to add a function that would receive the last finalised block and update the tree and its chains. This is needed if the pipeline gets switched and some unknown hashes get updated. After this I would start the revert_canonical function that would read changes from the db and write them as a separate Tree chain.

Finished #1474; we now have one function to push all changes atomically to tables.

Did the split of a chain, some missing functions for executing blocks on top of canonical blocks, and fixes by using the proper history/latest provider depending on the chain's canonical joint.
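
The chain-split operation itself can be sketched as follows (hypothetical types for illustration): when only a prefix of an in-memory chain gets committed, the committed part goes to the database and the remainder stays in the tree as a shorter chain forking from the new tip.

```rust
/// Illustrative in-memory chain: consecutive blocks starting at first_number.
struct Chain {
    first_number: u64,
    blocks: Vec<u64>, // block hashes (u64 standing in for H256)
}

impl Chain {
    /// Split off everything above `number`; `self` keeps the part up to
    /// and including `number` (the part being committed).
    fn split_at(&mut self, number: u64) -> Chain {
        let idx = (number + 1 - self.first_number) as usize;
        let rest = self.blocks.split_off(idx);
        Chain { first_number: number + 1, blocks: rest }
    }
}
```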

Switched priority today and focused on commit_canonical and the needed refactoring of stages: https://github.com/paradigmxyz/reth/pull/1474 (have a few TODOs to go over). With one function we will have one atomic commit of the whole block.

For revert_canonical I am intending to read the necessary fields from the database and use the Pipeline to revert the needed things.

The chain-split and last-256-block-hashes tasks are still there.

Execution fits nicely with providers. Started doing revert of substate with changesets (needed for chain joints) and handling of edge cases such as multiple selfdestructs/creates of the same account in one chain (decided to handle it with a counter). Will continue working on that.
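
One way such a counter could work (an assumption for illustration, not necessarily reth's actual design): each selfdestruct bumps a per-account "incarnation" counter, and storage slots are keyed by it, so a re-created account starts from empty storage without eagerly wiping old entries, and reverts can tell incarnations apart.

```rust
use std::collections::HashMap;

/// Sketch of an account that can be selfdestructed and re-created
/// several times within one chain. The incarnation counter isolates
/// storage written by different "lives" of the account.
#[derive(Default)]
struct Account {
    incarnation: u64,
    storage: HashMap<(u64, u64), u64>, // (incarnation, slot) -> value
}

impl Account {
    fn sstore(&mut self, slot: u64, value: u64) {
        self.storage.insert((self.incarnation, slot), value);
    }

    fn sload(&self, slot: u64) -> u64 {
        *self.storage.get(&(self.incarnation, slot)).unwrap_or(&0)
    }

    /// Selfdestruct: bump the counter; the old incarnation's storage
    /// becomes unreachable without being deleted.
    fn selfdestruct(&mut self) {
        self.incarnation += 1;
    }
}
```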