snarkOS: [Bug] Updating validator committee every two rounds will cause leader manipulation and permanent chain state split
https://hackerone.com/reports/2289066
Summary:
AleoBFT updates the committee (stake, status, etc) with every block. At each even round, elect the leader from the committee with latest committed block. This will introduce two issues:
** Issue a. Leader Manipulation **:
The leader is selected from the committee by the seed which is composed of self.starting_round, current_round, total_stake
. However, the current leader can manipulate total_stake
by adding bond_public
transactions in its batch. The leader can easily find the target total_stake
which make itself the leader for the next round. Thus, a leader can keep itself as a leader forever.
** Issue b. Permanent Chain State Split **
In current AleoBFT, the leader election directly depends on the latest committed block. However, for asynchronous network, some validators may have committed a block but some others not, making validators elect different leaders. This will cause permanent chain state split.
I will provide Steps To Reproduce for Issue a and Proof-of-Concept for Issue b.
Steps To Reproduce:
For Issue a.
- Clone https://github.com/Gooong/snarkOS/tree/manipulate_consensus
- Run ./devnet.sh with default setting
- Wait for 100 rounds
- Check
validator-0.log
and find that the validator 0 has been leader for most even rounds. (Just count the line by searchingelected a leader - aleo1rhgdu77hgyqd3xjj8ucu3jj9r2krwz6mnzyd80gncr5fxcwlh5rsvzp9px
and compare that by count the lineelected a leader
)
Also see the example validator-0.log
at the attachment.
Note that, in our code, Validator 0 only send the transaction with best effort. In practice, the the leader can directly simulate and add the transaction to the batch. This will 100% make itself the leader of future rounds and other validator can not do anything to prevent this.
For issue b.
Take the picture in the bullshark blog as example.
Suppose we are in Round 3, while deciding the leader for current round, Validator 1 will find that Round 1 hasn’t been committed so it will use committee in the round before Round 1. However, in Validator 4’s view, Round 1 has been committed so it just decide the leader from Round 1’s committee. This will cause permanent chain state split.
Additional considerations
The ambiguous of latest committee will also introduce other tricky problems:
If last block haven’t been committed, we can’t tell if the current round BatchCertificate
(either from other peers or from myself) has reached quorum. Because the committe will change once the last block is committed. This function in LedgerService
makes the committee ambiguous.
/// Returns the committee for the given round.
/// If the given round is in the future, then the current committee is returned.
fn get_committee_for_round(&self, round: u64) -> Result<Committee<N>> {
match self.ledger.get_committee_for_round(round)? {
// Return the committee if it exists.
Some(committee) => Ok(committee),
// Return the current committee if the round is in the future.
None => {
// Retrieve the current committee.
let current_committee = self.current_committee()?;
// Return the current committee if the round is in the future.
match current_committee.starting_round() <= round {
true => Ok(current_committee),
false => bail!("No committee found for round {round} in the ledger"),
}
}
}
}
I think we can fix this by:
- In
current_round
, use the committe fromcurrent_round - COMMITTEE_LAG_ROUND
round. (COMMITTEE_LAG_ROUND could be large enough like 100 or 1000) The function will be like:
/// Returns the committee for the given round.
/// If the given round is in the future, then the current committee is returned.
fn get_committee_for_round(&self, round: u64) -> Result<Committee<N>> {
match self.ledger.get_committee_for_round(round.saturating_sub(COMMITTEE_LAG_ROUND) )? {
// Return the committee if it exists.
Some(committee) => Ok(committee),
// Return the current committee if the round is in the future.
None => {
bail!("No committee found for round {round} in the ledger"),
}
}
}
In this way, the concensue will stall if the latest COMMITTEE_LAG_ROUND
round hasn’t been committed.
- Or keep the committee the same for the whole Epoch. But we need to be cautious at the Epoch boundary.
Supporting Material/References:
See the attachment
Fix Suggestions:
The problem here is that AleoBFT’s committee directly depends on the latest committed state. However, the latest state
is easy to manipulate and can temporarily mismatch.
It’s suggested to fix the committee for some fixed rounds. Also, while choosing leader, only use random seed that is hard to manipulate.
In SUI, the validator committee is fixed for the whole epoh (~24h). In the codebase of bullshark paper, it directly use round-robin: https://github.com/asonnino/narwhal/blob/667595d9dc82a472bade5afd59993d8479047a48/consensus/src/lib.rs#L205
Impact
Current AleoBFT implementation will cause: (a) leader manipulation and (b) permanent chain state split
About this issue
- Original URL
- State: closed
- Created 6 months ago
- Comments: 21 (21 by maintainers)
@raychu86 I think this is sufficient to address the issue.
In theory, the malicious validators can quit and rejoin the committee to manipulate x-coordinate of the address. However, due to the
unbonding
period, it is impractical to do that frequently.I’ll share a message with the bug bounty team at the Foundation regarding the severity level. Just a heads up that the bug bounty process is unaffiliated to Provable (where Ray and I are).
@randomsleep Would sorting the member based on the x-coordinate of the validator address be sufficient to replace the
sorted_members
? We currently do not have an easy way to track the order that validators joined/left the committee.We have a proposed change here - https://github.com/AleoHQ/snarkVM/pull/2378
@randomsleep If you look at the code for
get_committee_for_round
in snarkVM, you’ll see that even if there is no committee/block being created at that round, there is still a committee selected, which is the committee of the previous committed block.If round 10 has a block and round 16 has a block. The committee at round 11-15 will be the committee created at round 10.
I believe this should be sufficient for addressing the issue. Please let me know if you still have concerns.
@randomsleep Great job! I found the leader is deterministic as well, but I thought it is safe at that time.
In the current implementation, the leader is deterministic, based on the previous committee:
I have created another example to demonstrate perfect leader manipulation: https://github.com/randomsleep/snarkOS/commit/2e8cbe4b7e9d009be2be4d6c270c7a4afac385c8
In the example, Validator 0 is malicious. Validator 1, 2, and 3 are honest and they randomly send
bond_public
transactions. While proposing batch, Validator 0 will speculate all the transactions and then insertbond_public
transactions with specific amount. This makes Validator 0 always leader in future rounds.Step To Reporduce:
git clone https://github.com/randomsleep/snarkOS.git
git checkout perfect_leader_manipulation
./devnet
with the default setting.elected a leader - aleo1rhgdu77hgyqd3xjj8ucu3jj9r2krwz6mnzyd80gncr5fxcwlh5rsvzp9px
)Here is the log sample: logs-20240129114403.zip