cosmos-sdk: Consensus fails when using statesync mode to synchronize the application state
Summary of Bug
Consensus fails when a node uses statesync mode to synchronize the application state and then executes an ibc-transfer transaction.
Description
When a cosmos-sdk-based chain is started, the capability/keeper/keeper.go#L177:InitializeCapability(…) method is called to initialize the memStore from the application store. However, if the node is started in statesync mode, the application store is not loaded until the node switches to fastsync mode, and InitializeCapability is not called again at that point to initialize the memStore. As a result, when capability/keeper/keeper.go#L344:GetCapability(…) is called, a node started in statesync mode cannot get the same result as the other nodes.
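The ordering problem described above can be illustrated with a small toy model. This is only a sketch: the `node` type, its two maps, and the method names are hypothetical stand-ins for the application store, the memStore, InitializeCapability, and GetCapability, and they do not reproduce the real keeper's data layout.

```go
package main

import (
	"errors"
	"fmt"
)

// node is a hypothetical toy model, not the real cosmos-sdk keeper.
type node struct {
	persistent map[string]uint64 // stands in for the application store (part of snapshots)
	mem        map[string]uint64 // stands in for the memStore (rebuilt locally, never synced)
}

func newNode() *node {
	return &node{persistent: map[string]uint64{}, mem: map[string]uint64{}}
}

// initializeCapabilities plays the role of InitializeCapability: it copies
// whatever is in the persistent store at call time into the in-memory store.
func (n *node) initializeCapabilities() {
	for name, idx := range n.persistent {
		n.mem[name] = idx
	}
}

// restoreState plays the role of loading state, either from disk or from a
// state sync snapshot: only persistent data arrives.
func (n *node) restoreState(state map[string]uint64) {
	for name, idx := range state {
		n.persistent[name] = idx
	}
}

var errNotFound = errors.New("capability not found in memStore")

// getCapability plays the role of GetCapability: it consults the in-memory store only.
func (n *node) getCapability(name string) (uint64, error) {
	idx, ok := n.mem[name]
	if !ok {
		return 0, errNotFound
	}
	return idx, nil
}

func main() {
	chainState := map[string]uint64{"ports/transfer": 1}

	// Regular node: the store is populated before initialization runs.
	regular := newNode()
	regular.restoreState(chainState)
	regular.initializeCapabilities()

	// State-synced node: initialization runs on an empty store,
	// and the snapshot is applied only afterwards.
	synced := newNode()
	synced.initializeCapabilities()
	synced.restoreState(chainState)

	_, errRegular := regular.getCapability("ports/transfer")
	_, errSynced := synced.getCapability("ports/transfer")
	fmt.Println("regular node error:", errRegular)
	fmt.Println("state-synced node error:", errSynced)
}
```

In this model both nodes hold identical persistent data, yet only the regular node can resolve the capability, which mirrors the divergence reported here.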
Steps to Reproduce
The GetCapability(…) method is used in the IBC module, so the issue can be reproduced through ibc-transfer:
1. Start two testnets via gaia and create a relayer for them, then create clients and channels. Refer to: https://github.com/cosmos/relayer#demo

2. Create node1 and node2 to join testnet ibc-0:

    gaiad init node1 --home node1
    cp data/ibc-0/config/genesis.json node1/config/genesis.json
    gaiad init node2 --home node2
    cp data/ibc-0/config/genesis.json node2/config/genesis.json

   Then update the state-sync config in node1/config/app.toml and node2/config/app.toml:

    [state-sync]
    snapshot-interval = 100
    snapshot-keep-recent = 4

   Start node1 and node2:

    # NOTE: modify ports and add ibc-0 peer
    gaiad start --home node1
    # NOTE: modify ports and add ibc-0 peer
    gaiad start --home node2

3. Create node3 to join testnet ibc-0:

    gaiad init node3 --home node3
    cp data/ibc-0/config/genesis.json node3/config/genesis.json

   Update config:

    # config.toml
    [statesync]
    enable = true
    rpc_servers = "ibc-0 node rpc"
    trust_height = 1
    trust_hash = "block 1 hash"
    trust_period = "168h0m0s"

4. Send ibc-transfer:

    rly tx transfer ibc-0 ibc-1 1000000samoleans $(rly chains address ibc-1)
    rly tx relay-packets demo -d

5. Start node3 using statesync mode:

    # NOTE: modify ports and add ibc-0 peer
    gaiad start --home node3

   Get consensus failure error on executing the ibc-transfer transaction:

   NOTE: if the latest block height is greater than the height at which the ibc-transfer transaction was executed, no error is returned; you can run unsafe-reset-all on node3 and repeat steps 4-5.

    4:49PM INF committed state app_hash=0475A43BE9A8BD240551895B01A31C5B1ABACD710DC273D819B863C9F355804C height=34721 module=state num_txs=1
    4:49PM INF indexed block height=34721 module=txindex
    panic: Failed to process committed block (34722:51062BEB78119D7A5CF971B7FEC787C428E8347FE20C308BF98311C2F95BFA1B): wrong Block.Header.AppHash. Expected 0475A43BE9A8BD240551895B01A31C5B1ABACD710DC273D819B863C9F355804C, got 2443E8D78F4B2025252055EDD384DBF80839893092110C0A5D072DCABED9FB17

    goroutine 135 [running]:
    github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).poolRoutine(0xc000548a80, 0xc0032dac01)
        github.com/tendermint/tendermint@v0.34.9/blockchain/v0/reactor.go:401 +0x15bf
    created by github.com/tendermint/tendermint/blockchain/v0.(*BlockchainReactor).SwitchToFastSync
        github.com/tendermint/tendermint@v0.34.9/blockchain/v0/reactor.go:125 +0xd8
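The "wrong Block.Header.AppHash" panic follows once two nodes execute the same transaction differently. The sketch below is a toy illustration of that step, not the real commitment scheme: `appHash` and `deliverTransfer` are hypothetical helpers, the hash is a plain SHA-256 over sorted key/value pairs, and it is an assumption that the transfer's writes are simply dropped when the capability lookup fails.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// appHash is a hypothetical stand-in for the application's state commitment:
// a SHA-256 over the key/value pairs in sorted key order.
func appHash(state map[string]string) string {
	keys := make([]string, 0, len(state))
	for k := range state {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write([]byte{0}) // separator so that key/value boundaries are unambiguous
		h.Write([]byte(state[k]))
		h.Write([]byte{0})
	}
	return fmt.Sprintf("%X", h.Sum(nil))
}

// deliverTransfer applies a toy transfer. Assumption: when the capability
// lookup fails, the transaction fails and none of its writes are kept.
func deliverTransfer(state map[string]string, hasCapability bool) {
	if !hasCapability {
		return
	}
	state["escrow/samoleans"] = "1000000"
}

func main() {
	// Both nodes start from identical state (as after a snapshot restore).
	regular := map[string]string{"balance/alice": "5000000"}
	synced := map[string]string{"balance/alice": "5000000"}
	fmt.Println("hashes equal before tx:", appHash(regular) == appHash(synced))

	// The same transaction is executed, but only one node finds the capability.
	deliverTransfer(regular, true)
	deliverTransfer(synced, false)
	fmt.Println("hashes equal after tx: ", appHash(regular) == appHash(synced))
}
```

Any difference in the writes a transaction produces is enough to change the commitment, so the state-synced node computes a hash that the rest of the network never agreed on.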
About this issue
- State: closed
- Created 3 years ago
- Comments: 28 (22 by maintainers)
@AdityaSripal the node has been running without issue for 3 days already. No crash no restart.
I have a non-breaking fix up that will be able to fix the issue for the 0.42 line here: https://github.com/cosmos/cosmos-sdk/tree/aditya/cap-init
Here’s the diff: https://github.com/cosmos/cosmos-sdk/compare/v0.42.5...aditya/cap-init?expand=1
Unfortunately, the fix I proposed above can only be done efficiently if we move the reverse mapping into the persistent store. The reverse mapping is deterministic, so there's no issue moving it; it's just a breaking change. Once that is done, reconstructing the forward mapping and capmap on-the-fly is trivial. This fix should go into 0.43.
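A rough sketch of the on-the-fly reconstruction idea follows. It is not the actual patch: the `keeper` and `capability` types, the field names, and the simplified name-to-index layout are all hypothetical, chosen only to show how in-memory entries could be rebuilt lazily once the reverse mapping lives in the persistent store.

```go
package main

import "fmt"

// capability is a hypothetical stand-in for the SDK's capability object.
type capability struct{ index uint64 }

// keeper is a toy model; the field layout is an assumption for illustration.
type keeper struct {
	persistentRev map[string]uint64      // reverse mapping (name -> index), persisted and therefore state-synced
	memFwd        map[uint64]string      // forward mapping (index -> name), in memory only
	capMap        map[uint64]*capability // index -> capability object, in memory only
}

func newKeeper(rev map[string]uint64) *keeper {
	return &keeper{
		persistentRev: rev,
		memFwd:        map[uint64]string{},
		capMap:        map[uint64]*capability{},
	}
}

// getCapability resolves the name through the persistent reverse mapping and
// rebuilds the in-memory entries the first time a capability is requested.
func (k *keeper) getCapability(name string) (*capability, bool) {
	idx, ok := k.persistentRev[name]
	if !ok {
		return nil, false // the capability was never created on chain
	}
	c, ok := k.capMap[idx]
	if !ok {
		// In-memory state is missing (for example after state sync): rebuild it.
		c = &capability{index: idx}
		k.capMap[idx] = c
		k.memFwd[idx] = name
	}
	return c, true
}

func main() {
	// Only persistent data is present, as on a freshly state-synced node.
	k := newKeeper(map[string]uint64{"ports/transfer": 1})

	first, ok := k.getCapability("ports/transfer")
	second, _ := k.getCapability("ports/transfer")
	fmt.Println("found:", ok, "index:", first.index)
	fmt.Println("same object on repeated lookup:", first == second)
}
```

Because the lookup no longer depends on a one-time initialization pass, the order in which the store is loaded and the keeper is initialized stops mattering in this model.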
I will write tests for this tomorrow, but in the meantime it would be great if someone is able to test it out and see if statesync works.
@chengwenxi you can connect to the following nodes. They both have snapshots.
ae26f01b2bc504532a1cc15ce9da0b85ee5a98e7@139.177.178.149:26656
ee27245d88c632a556cf72cc7f3587380c09b469@45.79.249.253:26656

And if you need RPCs:
https://rpc.cosmoshub.forbole.com/
https://rpc.cosmoshub.bigdipper.live/
This is the issue right? Does NewApp get called for state sync? Or I guess any usage of capabilities during state sync is a problem?
In IBC capabilities are created at various times, sometimes during InitChain for binding ports, always during a channel handshake, and randomly by applications as they decide to bind to new port names.