tendermint: Replace Commit Timeout by splitting into a Quorum and Remainder Commit

Protocol Change Proposal

Summary

TimeoutCommit is a local consensus parameter that sets how long a node waits for additional precommit votes after it has received the 2/3+ required to commit a block. The reason for this timeout is that more precommit votes decrease the likelihood that a fork was produced at this height, because more voting power would have been required to generate the fork (i.e. seeing 2/3+ votes means that 1/3+ could have double signed to generate a fork, whilst seeing 90% of votes means that it would take at least 57% (67% - (100% - 90%)) of voting power to double sign). I propose an alternative method which removes TimeoutCommit, speeding up consensus, whilst still allowing for greater confidence in network integrity. The cost of this proposal is greater technical complexity.
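To make the arithmetic in the summary concrete, here is a minimal sketch in Go. The function name `minDoubleSigners` and the use of integer voting power are assumptions for illustration, not part of Tendermint. It computes how much voting power must appear in both commits if a conflicting block also reached quorum.

```go
package main

import "fmt"

// minDoubleSigners is a hypothetical helper (not a Tendermint API).
// Given the total voting power and the power observed precommitting for one
// block, it returns the minimum power that must have signed both that block
// and a conflicting block that also gathered a quorum.
func minDoubleSigners(total, observed int64) int64 {
	// Smallest integer power strictly greater than 2/3 of the total.
	quorum := total*2/3 + 1
	// Two sets of size `observed` and `quorum` drawn from `total` must
	// overlap by at least observed + quorum - total.
	overlap := observed + quorum - total
	if overlap < 0 {
		return 0
	}
	return overlap
}

func main() {
	// With only a bare quorum observed, just over 1/3 must have double signed.
	fmt.Println(minDoubleSigners(100, 67)) // 34
	// With 90% observed, at least 57% must have double signed.
	fmt.Println(minDoubleSigners(100, 90)) // 57
}
```

The second call reproduces the 57% figure from the text: 67% - (100% - 90%).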

Proposal

Tendermint proposers currently append the precommit votes they saw in the previous height to the block they propose at the current height. This canonicalizes the commit once the proposed block is agreed upon in consensus. Rather than having a single LastCommit, I propose separating it into two values: a QuorumCommit and a RemainderCommit.

The QuorumCommit constitutes the necessary 2/3+ precommits for height h - 1, whilst the RemainderCommit constitutes the remaining precommits for height h - 2. Thus validators have the duration of an entire height in which to collect further votes for the RemainderCommit. Note that it’s also feasible to extend this to 3, 10 or even 100 heights in the past, but this adds further complexity which I don’t think is necessary.
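One way to picture the two values is the sketch below. All type and function names (`Precommit`, `Commit`, `Block`, `splitCommit`) are hypothetical simplifications, not Tendermint's actual types, and the rule "the quorum is the shortest prefix of votes, in arrival order, that exceeds 2/3" is an assumption about how a proposer might split them.

```go
package main

import "fmt"

// Precommit is a hypothetical, simplified precommit vote.
type Precommit struct {
	Validator string
	Power     int64
}

// Commit is a set of precommits for a single height.
type Commit struct {
	Height     int64
	Precommits []Precommit
}

// Block shows only the two fields that would replace LastCommit.
type Block struct {
	Height          int64
	QuorumCommit    Commit // the 2/3+ precommits for Height-1
	RemainderCommit Commit // the late precommits for Height-2
}

// splitCommit walks the votes in arrival order. Votes are added to the quorum
// until its power exceeds 2/3 of totalPower; everything after that goes to
// the remainder. ok reports whether a quorum was actually reached.
func splitCommit(height, totalPower int64, votes []Precommit) (quorum, remainder Commit, ok bool) {
	quorum = Commit{Height: height}
	remainder = Commit{Height: height}
	var tallied int64
	for _, v := range votes {
		if tallied*3 <= totalPower*2 {
			quorum.Precommits = append(quorum.Precommits, v)
			tallied += v.Power
		} else {
			remainder.Precommits = append(remainder.Precommits, v)
		}
	}
	return quorum, remainder, tallied*3 > totalPower*2
}

func main() {
	votes := []Precommit{{"A", 40}, {"B", 30}, {"C", 20}, {"D", 10}}

	// Pretend the same votes arrived in the same order at heights 8 and 9.
	_, remainder8, _ := splitCommit(8, 100, votes)
	quorum9, _, ok := splitCommit(9, 100, votes)

	// The block at height 10 carries the quorum for 9 and the remainder for 8.
	block := Block{Height: 10, QuorumCommit: quorum9, RemainderCommit: remainder8}
	fmt.Println(ok, len(block.QuorumCommit.Precommits), len(block.RemainderCommit.Precommits))
}
```

In this example A and B (70 of 100) form the quorum and C and D land in the remainder, so a proposer would not need to wait for C and D before moving to the next height.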

Persistence of Commits

Once a block gets committed, the node should add the RemainderCommit to the QuorumCommit and store this at the height of the block it is associated with to make it easier to retrieve the SignedHeader for light client verification (which is also used for block sync and state sync).
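A minimal sketch of that merge step follows. The types and the `mergeCommits` helper are hypothetical; in particular, deduplicating by validator name and keeping the first vote seen are assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Precommit and Commit are hypothetical, simplified types.
type Precommit struct {
	Validator string
	Power     int64
}

type Commit struct {
	Height     int64
	Precommits []Precommit
}

// mergeCommits combines a QuorumCommit with the RemainderCommit for the same
// height into one commit suitable for storage. Each validator is kept once,
// with quorum votes taking precedence over remainder votes.
func mergeCommits(quorum, remainder Commit) (Commit, error) {
	if quorum.Height != remainder.Height {
		return Commit{}, errors.New("commits are for different heights")
	}
	merged := Commit{Height: quorum.Height}
	seen := make(map[string]bool)
	for _, part := range [][]Precommit{quorum.Precommits, remainder.Precommits} {
		for _, v := range part {
			if seen[v.Validator] {
				continue
			}
			seen[v.Validator] = true
			merged.Precommits = append(merged.Precommits, v)
		}
	}
	return merged, nil
}

func main() {
	// store stands in for the block store, keyed by the height the commit is for.
	store := make(map[int64]Commit)

	quorum := Commit{Height: 8, Precommits: []Precommit{{"A", 40}, {"B", 30}}}
	remainder := Commit{Height: 8, Precommits: []Precommit{{"C", 20}, {"D", 10}}}

	merged, err := mergeCommits(quorum, remainder)
	if err != nil {
		panic(err)
	}
	store[merged.Height] = merged
	fmt.Println(len(store[8].Precommits))
}
```

Storing the merged commit under the height it signs means a request for the SignedHeader at that height needs a single lookup.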

Verification

Nodes using sequential verification (i.e. for block sync) will receive the SignedHeader and only need to check the signatures of the QuorumCommit to verify the Header (and thus the rest of the Block). Nodes using skipping verification (i.e. light clients) will work out the overlap with the validator set they trust and may require signatures from both commits to verify the SignedHeader.
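The difference between the two modes can be sketched as follows. This is a simplification under stated assumptions: the types and function names are hypothetical, signature checking is omitted entirely, and the skipping check only models the overlap with a trusted validator set using a 1/3 trust threshold. A real light client performs further checks.

```go
package main

import "fmt"

// Precommit and Commit are hypothetical, simplified types. Signatures are
// omitted; a vote is assumed valid if the validator is known.
type Precommit struct{ Validator string }

type Commit struct{ Precommits []Precommit }

// totalPower sums the voting power of a validator set.
func totalPower(vals map[string]int64) int64 {
	var sum int64
	for _, p := range vals {
		sum += p
	}
	return sum
}

// signedPower sums the power of validators in vals that appear in any of the
// given commits, counting each validator at most once.
func signedPower(vals map[string]int64, commits ...Commit) int64 {
	seen := make(map[string]bool)
	var power int64
	for _, c := range commits {
		for _, v := range c.Precommits {
			p, known := vals[v.Validator]
			if !known || seen[v.Validator] {
				continue
			}
			seen[v.Validator] = true
			power += p
		}
	}
	return power
}

// verifySequential checks only the quorum commit against the header's own
// validator set: more than 2/3 of its power must have signed.
func verifySequential(vals map[string]int64, quorum Commit) bool {
	return signedPower(vals, quorum)*3 > totalPower(vals)*2
}

// verifySkipping checks the overlap with a trusted (older) validator set:
// more than 1/3 of trusted power must have signed. It tries the quorum
// commit alone first and falls back to both commits together.
func verifySkipping(trusted map[string]int64, quorum, remainder Commit) bool {
	need := totalPower(trusted)
	if signedPower(trusted, quorum)*3 > need {
		return true
	}
	return signedPower(trusted, quorum, remainder)*3 > need
}

func main() {
	current := map[string]int64{"A": 10, "D": 45, "E": 45}
	trusted := map[string]int64{"A": 50, "B": 50}

	quorum := Commit{Precommits: []Precommit{{"D"}, {"E"}}}
	remainder := Commit{Precommits: []Precommit{{"A"}}}

	fmt.Println(verifySequential(current, quorum))
	fmt.Println(verifySkipping(trusted, quorum, Commit{}))
	fmt.Println(verifySkipping(trusted, quorum, remainder))
}
```

In this example the quorum commit alone satisfies sequential verification, but the light client only reaches its trust threshold once validator A's vote from the remainder commit is included.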

Signature Aggregation

It is likely we will adopt some form of signature aggregation in the near future. In this case the signatures in the quorum commit will be aggregated and the signatures in the remainder commit will be aggregated. More thought may be needed on the best structure to support both consensus and light client verification.

Just a final clarification: this is just an idea. None of this has been formally verified in any capacity and there might be holes in my thinking. I welcome anyone to challenge or build on top of this 😃


About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 16 (14 by maintainers)

Most upvoted comments

Yes, you cannot ensure a decision in fewer than f rounds, because they can be coordinated by Byzantine processes. But once a round started after GST is coordinated by a correct process, a decision is guaranteed.

In Tendermint, the possibility of having hidden locks may prevent this from happening, but due to the TimeoutPrecommit, hidden locks should not remain “hidden” for more than one round after GST.

Regarding the proposal, I am not sure if and how you can rewrite a block, in order to include additional Precommits received after the first 2/3+.

You wouldn’t be rewriting a block. The additional precommits would be added in the next block (h + 2) by the next proposer. It just means that nodes need to keep votes around for two heights instead of one.

I think that TimeoutCommit is there in order to collect as many Precommit messages as possible to include them in the next block. One reason for that is rewarding all the validators that contributed to the validation and commit of a block. This situation would be covered by @cmwaters' proposal, as additional Precommits will still be gathered during the next height, and somehow also considered in the future for rewarding and block validation.

The timeout that is required for ensuring liveness is TimeoutPrecommit. It is triggered when a validator receives 2/3+ disagreeing Precommit messages. In this case, no block will be committed in that round, and waiting before entering a new round ensures that locked and valid values will be updated by correct validators. Locked and valid values are updated upon receiving Prevote messages, not Precommit messages.

In summary, it is common to confuse these two timeouts with similar names, but I think that only TimeoutPrecommit is a requirement for liveness. TimeoutCommit does not even appear in the paper, and for me it was always something related to proof-of-stake and not to consensus itself. But checking this with @milosevic would be nice.