reth: BlockchainTree sidechain/pending block inmemory structure
In reth we are moving to keep pending state and sidechains in memory. This decision, coupled with blockchains having finality after N blocks, seems reasonable but pulls in some additional changes that we need to implement inside the client.
A table for the latest/pending and sidechain blocks, where the whole block is saved, is needed as a crash-recovery mechanism: on startup the BlockchainTree can be regenerated from it.
```mermaid
flowchart BT
    subgraph canonical chain
        CanonState:::state
        block0canon:::canon -->block1canon:::canon -->block2canon:::canon -->block3canon:::canon --> block4canon:::canon --> block5canon:::canon
    end
    block5canon --> block6pending1:::pending --> block7pending1:::pending
    block5canon --> block6pending2:::pending
    subgraph sidechain2
        S2State:::state
        block3canon --> block4s2:::sidechain --> block5s2:::sidechain
    end
    subgraph sidechain1
        S1State:::state
        block2canon --> block3s1:::sidechain --> block4s1:::sidechain --> block5s1:::sidechain --> block6s1:::sidechain
    end
    classDef state fill:#1882C4
    classDef canon fill:#8AC926
    classDef pending fill:#FFCA3A
    classDef sidechain fill:#FF595E
```
The Mermaid flowchart represents all blocks that can appear in the blockchain.
Green blocks belong to the canonical chain and are saved inside database tables; they are our main
chain. Pending blocks and sidechains are found in memory inside [BlockchainTree].
Both pending blocks and sidechains use the same mechanisms; the only difference is when they get committed to
the database. For pending blocks it is just an append operation, but for sidechains the current
canonical blocks need to be moved to the BlockchainTree and the sidechain flushed to the database to become the canonical chain.
As all pending blocks are already executed and we have their data, when it gets decided that a block is canonical, its changes can be flushed directly to the database. We need to be aware of setting the pipeline stages' progress.
For unwinding the canonical chain from the database, data saved in the database can be reused and stored in memory; the pipeline unwind would just remove it. We still need to be aware of the pipeline stages' progress.
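The commit rules described above can be sketched with a toy model. This is only an illustration under stated assumptions: `Block`, `SideChain`, `Tree` and `make_canonical` are hypothetical names, a `Vec` stands in for the database tables, and blocks are identified by number and label rather than by hash as a real client would do.

```rust
use std::collections::HashMap;

/// Hypothetical block: a number plus a label. A real client would use hashes.
#[derive(Clone, Debug, PartialEq, Eq)]
pub struct Block {
    pub number: u64,
    pub name: &'static str,
}

/// In-memory chain (pending or sidechain) that forks off the canonical
/// chain right after the canonical block with number `fork_number`.
#[derive(Clone, Debug)]
pub struct SideChain {
    pub fork_number: u64,
    pub blocks: Vec<Block>,
}

#[derive(Default)]
pub struct Tree {
    /// Stands in for the database tables that hold the canonical chain.
    pub canonical: Vec<Block>,
    /// Pending chains and sidechains kept in memory, keyed by an arbitrary id.
    pub chains: HashMap<u32, SideChain>,
}

impl Tree {
    /// Makes chain `id` canonical. Returns false if the id is unknown.
    /// For a pending chain (fork at the canonical tip) this is a plain append.
    /// For a sidechain, canonical blocks above the fork point are moved back
    /// into the tree (reusing the same id) before the sidechain is appended.
    pub fn make_canonical(&mut self, id: u32) -> bool {
        let chain = match self.chains.remove(&id) {
            Some(chain) => chain,
            None => return false,
        };
        // Number of canonical blocks at or below the fork point.
        let keep = self
            .canonical
            .iter()
            .take_while(|b| b.number <= chain.fork_number)
            .count();
        // Everything above the fork point is unwound into memory.
        let unwound = self.canonical.split_off(keep);
        if !unwound.is_empty() {
            self.chains.insert(
                id,
                SideChain { fork_number: chain.fork_number, blocks: unwound },
            );
        }
        self.canonical.extend(chain.blocks);
        true
    }
}
```

With the diagram's layout, committing `block6pending1` only appends, while committing `sidechain2` moves the canonical blocks above `block3canon` into the tree first.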
Yeah exactly like that, it is more impactful if you have `G -> A -> B` and you want the state of `G->A` so you can revert `B` and add `C` to have `G->A->C`.

Worked on applying and reverting of changesets in substate, and they seem okay. Next in line is splitting of the chain (as in if just half of the chain gets committed for any reason), and saving the last 256 block hashes in memory (the last 256 hashes are needed for the BLOCKHASH opcode).
After that, I am left with the `commit_canonical` and `revert_canonical` functions that flush the data to `tables`, and I can start writing tests.

Worked on substate and a substate-with-provider structure that would allow us to plug in the necessary provider while the substate data is standalone: https://github.com/paradigmxyz/reth/blob/rakita/blockchain_tree/crates/executor/src/substate.rs#L9 https://github.com/paradigmxyz/reth/blob/rakita/blockchain_tree/crates/executor/src/substate.rs#L40
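To make the `G->A->B` to `G->A->C` case concrete, here is a minimal sketch of applying and reverting changesets on a standalone substate. It is not the code in the linked `substate.rs`: `Change`, `SubState`, `apply` and `revert` are hypothetical names, and an account is reduced to a single balance.

```rust
use std::collections::HashMap;

/// Hypothetical one-byte address, just for illustration.
pub type Address = u8;

/// One account change made by a block: the balance before and after.
/// `None` means the account does not exist.
#[derive(Clone, Debug, PartialEq)]
pub struct Change {
    pub address: Address,
    pub old: Option<u64>,
    pub new: Option<u64>,
}

/// Standalone in-memory state layered on top of the database state.
#[derive(Default, Debug, PartialEq)]
pub struct SubState {
    pub accounts: HashMap<Address, u64>,
}

impl SubState {
    fn set(&mut self, address: Address, value: Option<u64>) {
        match value {
            Some(v) => {
                self.accounts.insert(address, v);
            }
            None => {
                self.accounts.remove(&address);
            }
        }
    }

    /// Applies a block's changeset by writing the new values.
    pub fn apply(&mut self, changeset: &[Change]) {
        for change in changeset {
            self.set(change.address, change.new);
        }
    }

    /// Reverts a block's changeset by restoring the old values in reverse order.
    pub fn revert(&mut self, changeset: &[Change]) {
        for change in changeset.iter().rev() {
            self.set(change.address, change.old);
        }
    }
}
```

Reverting `B` and then applying `C` leaves the substate as if only `G->A->C` had ever been executed, which is what a chain joint needs.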
Worked on block hashes as we need last 256 block hashes for execution: https://github.com/paradigmxyz/reth/blob/e04e743f1fd5dea925bf3da5f8b66ef00636b12b/crates/executor/src/blockchain_tree/mod.rs#L122
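As a rough illustration of keeping only the hashes that BLOCKHASH can reach, the sketch below stores the most recent 256 entries keyed by block number. `RecentHashes` and `BLOCKHASH_WINDOW` are hypothetical names and this is not the structure used in the linked file; it also assumes blocks are inserted in increasing order.

```rust
use std::collections::BTreeMap;

/// BLOCKHASH can only see the 256 most recent blocks.
pub const BLOCKHASH_WINDOW: u64 = 256;

/// Hypothetical 32-byte hash type.
pub type Hash = [u8; 32];

/// Keeps the hashes of the most recent blocks, keyed by block number.
#[derive(Default)]
pub struct RecentHashes {
    hashes: BTreeMap<u64, Hash>,
}

impl RecentHashes {
    /// Records a block hash and drops entries older than the window.
    /// Assumes blocks arrive in increasing order of `number`.
    pub fn insert(&mut self, number: u64, hash: Hash) {
        self.hashes.insert(number, hash);
        // Oldest block number that is still inside the window.
        let cutoff = number.saturating_sub(BLOCKHASH_WINDOW - 1);
        // `split_off` returns the entries with key >= cutoff; keep only those.
        self.hashes = self.hashes.split_off(&cutoff);
    }

    pub fn get(&self, number: u64) -> Option<&Hash> {
        self.hashes.get(&number)
    }

    pub fn len(&self) -> usize {
        self.hashes.len()
    }
}
```

A `BTreeMap` is used here so that pruning by block number is a single range operation; a fixed-size ring buffer would be another reasonable choice.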
Next step is to use all of that and start executing blocks in `Chain` and generating changesets.

Did finalization and discarding of blocks/sidechains that are not valid. Will start next on block execution found in `Chain`, substate integration and changesets.

Will use the executor interface trait to allow better testing: https://github.com/paradigmxyz/reth/blob/4cbd1990163a005e0bca0b3e1f94ccd28dbf2e04/crates/interfaces/src/executor.rs#L7
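The testing benefit of putting execution behind a trait can be sketched as follows. This is not the trait from the linked `executor.rs`: `BlockExecutor`, `MockExecutor`, `ExecError` and `execute_chain` are hypothetical names, and a block's result is reduced to its gas used.

```rust
/// Hypothetical, heavily simplified block.
pub struct Block {
    pub number: u64,
    pub gas_used: u64,
}

#[derive(Debug, PartialEq)]
pub enum ExecError {
    /// The block with this number failed validation.
    Invalid(u64),
}

/// Anything that can execute a block. Code that drives execution depends on
/// this trait only, so tests can plug in a mock instead of a real EVM.
pub trait BlockExecutor {
    fn execute(&mut self, block: &Block) -> Result<u64, ExecError>;
}

/// Test double: records executed blocks and optionally fails at one number.
pub struct MockExecutor {
    pub fail_at: Option<u64>,
    pub executed: Vec<u64>,
}

impl BlockExecutor for MockExecutor {
    fn execute(&mut self, block: &Block) -> Result<u64, ExecError> {
        if self.fail_at == Some(block.number) {
            return Err(ExecError::Invalid(block.number));
        }
        self.executed.push(block.number);
        Ok(block.gas_used)
    }
}

/// Executes blocks in order, stops at the first invalid block, and returns
/// the total gas used on success.
pub fn execute_chain<E: BlockExecutor>(
    executor: &mut E,
    blocks: &[Block],
) -> Result<u64, ExecError> {
    let mut total = 0;
    for block in blocks {
        total += executor.execute(block)?;
    }
    Ok(total)
}
```

With this shape, the invalid-block path in the tree can be tested by configuring the mock to fail at a chosen block number.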
Let’s go with all in-memory initially and assume smol reorgs, let’s keep our focus on getting things done “right” for Ethereum and expand from there. I don’t think it’ll be a big change to use a disk-backed backend.
Had the same concern that I wrote to gakons:
> I have a good feeling about BlockchainTree; the only concern there is that I need to make it slim so it does not take a lot of RAM.

Was thinking about reorgs for a while, and with the finality that every chain has, deep reorgs that PoW had are a thing of the past. We can expect only N number of reorgs: https://twitter.com/rakitadragan/status/1615666501440212993
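Finality is what keeps the in-memory tree small: once a block is finalized, any chain that forks off below it can never become canonical and can be dropped. A minimal sketch of that pruning rule, with hypothetical names (`SideChain`, `finalize`) and chains reduced to their fork and tip numbers:

```rust
/// Hypothetical summary of an in-memory chain: the canonical block number it
/// forks from and the number of its tip.
#[derive(Debug, Clone, PartialEq)]
pub struct SideChain {
    pub fork_number: u64,
    pub tip_number: u64,
}

/// Drops every chain that forks off below the finalized block and returns how
/// many chains were removed. A chain whose fork point is the finalized block
/// itself (or a later canonical block) is kept.
pub fn finalize(chains: &mut Vec<SideChain>, finalized: u64) -> usize {
    let before = chains.len();
    chains.retain(|chain| chain.fork_number >= finalized);
    before - chains.len()
}
```

Using the numbers from the diagram, finalizing block 3 would drop sidechain1 (forks at block 2) and keep sidechain2 (forks at block 3) and both pending chains (fork at block 5).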
Reasons and my thinking on why it is this way: it is easier to do it all in memory; at the moment Ethereum has a small reorg depth; there is a small benefit of pending/best blocks being in memory for researchers (which can also be achieved with a good cache); and adding a db into the mix is always an option if we notice that the in-memory footprint is too big.
Using `redb` seems like a good idea.

Did some leftover TODOs and started writing tests/fixing bugs for the insert/take functions in db `Transaction`.

Updated pipeline tables on the commit/revert paths. It will iterate over the `SyncStatus` table and update the block number to the last committed/reverted block. Reverted state to the previous block. Tied those functions into BlockchainTree.

Refactored the `Indexing/Hashing` stages and moved functionality to Provider.

I think that is it on functionality and I can start testing. I will try to refactor execution in some way so it is easier to mock.
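The stage-progress update on the commit/revert paths can be pictured like this. The sketch assumes the `SyncStatus` table maps a stage name to the last block that stage has processed; the `SyncStatus` alias, the stage labels and `set_all_stage_progress` are illustrative, not reth's actual table API.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for the `SyncStatus` table: stage name mapped to the
/// last block number that stage has processed.
pub type SyncStatus = BTreeMap<&'static str, u64>;

/// After the tree commits or reverts blocks directly in the tables, every
/// stage's progress is moved to the new tip so that the pipeline does not
/// redo or skip work when it next runs.
pub fn set_all_stage_progress(table: &mut SyncStatus, block: u64) {
    for progress in table.values_mut() {
        *progress = block;
    }
}
```

The same function covers both directions: a commit raises every entry to the new tip, and a revert lowers every entry to the block that was reverted to.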
Will start reviewing this closer tomorrow. Thanks for all the updates.
Mostly worked on the function for getting and unwinding blocks from tables. Extracting execution results was a tricky one but I am close to finishing that. And I need to call the `Index`, `Hashing`, `Merkle` stages' unwind to finish unwinding from the Blockchain tree.

And after it, it is mostly testing.
Added insertion of past canonical hashes and an update mechanism. Some renaming. Was focused on the sync switch, defining the flow and covering all potential edge cases.
I want to add a function that would receive the last finalised block and update the tree and its chains. This is needed if the pipeline gets switched and some unknown hashes get updated. After this I would start the `revert_canonical` function that would read changes from the db and write them as a separate Tree chain.

Finished #1474; we now have one function to push all changes atomically to tables.
Did the split of chain, some missing functions for executing blocks on top of canonical blocks, and fixes by using the proper history/latest provider depending on the chain's canonical joint.
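Splitting a chain, for the case where only the first part of it gets committed, can be sketched as below. `Chain` and `split_at_number` are hypothetical names, and blocks are plain labels here rather than executed blocks with changesets.

```rust
/// Hypothetical chain of consecutive blocks starting at `first_number`.
#[derive(Debug, Clone, PartialEq)]
pub struct Chain {
    pub first_number: u64,
    pub blocks: Vec<&'static str>,
}

impl Chain {
    /// Splits the chain after block number `at`. `self` keeps the blocks up to
    /// and including `at` (the part that gets committed) and the remainder is
    /// returned as a new chain. Returns `None` and leaves `self` untouched if
    /// `at` is below the chain's first block or at/above its last block.
    pub fn split_at_number(&mut self, at: u64) -> Option<Chain> {
        if self.blocks.is_empty() || at < self.first_number {
            return None;
        }
        let keep = (at - self.first_number + 1) as usize;
        if keep >= self.blocks.len() {
            return None;
        }
        let rest = self.blocks.split_off(keep);
        Some(Chain { first_number: at + 1, blocks: rest })
    }
}
```

In a real tree the remainder would also need its own fork point and state, since it now builds on the last block of the committed half.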
Switched priority today and focused on `commit_canonical` and the needed refactoring of stages. https://github.com/paradigmxyz/reth/pull/1474 (have a few todo’s to go over.) With one function we will have one atomic commit of the whole block.

For `revert_canonical` I am intending to read the necessary fields from the database and use `Pipeline` to revert the needed things.

The chain split and the last 256 block hashes tasks are still there.
Execution fit nicely with providers. Started doing revert of substate with changesets (needed for chain joints) and handling of edge cases such as multiple selfdestructs/creates of the same account in one chain (decided to handle it with a counter). Will continue working on that.