solana: Vote hash mismatch on mainnet-beta/validator-us-east1-b
validator-us-east1-b went delinquent last night after slot 4187115. At slot 4187144, validator-us-east1-b produced a different bank hash than validator-us-west1-b and, seemingly, the rest of the cluster.
validator-us-east1-b:
[2020-04-07T08:31:22.809585778Z INFO solana_runtime::bank] frozen: 4187144
hash: FmRsY4WgARgNXf1dW9dXzGXWPvKNp4LoKZYovUSPWMKU
accounts_delta: Uu2n1bkKHdv812mYR7KEgfqptoiVJM9CKagE1QqnwnF
signature_count: 190
last_blockhash: jeKFjTNiWbkpU2u9KmaxGBM8rTnqGTCddnDTtxd4o9A
Larger validator-us-east1-b log snippet at https://pastebin.com/md2VWphi
validator-us-west1-b:
[2020-04-07T08:31:23.082385197Z INFO solana_runtime::bank] bank frozen: 4187144
hash: 5M6n9bmiqdrRjr8myHMa2LWJqLhHLEs3WyMGfFniCGpH
accounts_delta: CqxeRSyM5dPBKjySz2DEnGM93Rdw6uoTDPYPpJYMWC5S
signature_count: 190
last_blockhash: jeKFjTNiWbkpU2u9KmaxGBM8rTnqGTCddnDTtxd4o9A
The accounts_delta values differ; signature_count and last_blockhash are identical.
Full log for both validators at https://drive.google.com/drive/folders/19448bVE3TFRe4rFciKMb6SB9nkOeHw0l?usp=sharing
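To make the comparison above mechanical, here is a minimal sketch that parses the two `frozen` log entries quoted in this issue and lists the fields that disagree. The helper names (`parse_frozen`, `differing_fields`) are hypothetical and not part of any Solana tooling; the log text is copied from the snippets above.

```python
import re

# Log entries copied verbatim from this issue (one per validator).
EAST = """[2020-04-07T08:31:22.809585778Z INFO solana_runtime::bank] frozen: 4187144
hash: FmRsY4WgARgNXf1dW9dXzGXWPvKNp4LoKZYovUSPWMKU
accounts_delta: Uu2n1bkKHdv812mYR7KEgfqptoiVJM9CKagE1QqnwnF
signature_count: 190
last_blockhash: jeKFjTNiWbkpU2u9KmaxGBM8rTnqGTCddnDTtxd4o9A"""

WEST = """[2020-04-07T08:31:23.082385197Z INFO solana_runtime::bank] bank frozen: 4187144
hash: 5M6n9bmiqdrRjr8myHMa2LWJqLhHLEs3WyMGfFniCGpH
accounts_delta: CqxeRSyM5dPBKjySz2DEnGM93Rdw6uoTDPYPpJYMWC5S
signature_count: 190
last_blockhash: jeKFjTNiWbkpU2u9KmaxGBM8rTnqGTCddnDTtxd4o9A"""

# \b keeps "hash" from matching inside "last_blockhash".
FIELD_RE = re.compile(
    r"\b(frozen|hash|accounts_delta|signature_count|last_blockhash):\s*(\S+)"
)


def parse_frozen(text):
    """Return a dict of field name -> value for one frozen-bank log entry."""
    return dict(FIELD_RE.findall(text))


def differing_fields(a, b):
    """Return the sorted names of fields whose values differ between a and b."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))


east = parse_frozen(EAST)
west = parse_frozen(WEST)
print(differing_fields(east, west))
```

Both validators agree on the slot, signature count, and last blockhash, so the divergence is confined to the account state written in that slot.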
About this issue
- State: closed
- Created 4 years ago
- Comments: 31 (31 by maintainers)
@ryoqun OK, great! I think there must be an inconsistency in the fee-theft-prevention fix. Transactions that use a durable nonce but fail with an instruction error are subject to replay fee-theft. The fix was to always advance the nonce value and store the account, even in the face of TX failure. Perhaps there’s a missed code path in the non-leader case?
The changes went in here: https://github.com/solana-labs/solana/pull/7684
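For readers unfamiliar with the fix described above, the following toy model may help. It is only an illustrative sketch, not Solana's runtime code: `Account`, `process_tx`, `replay`, `state_digest`, and the `FEE` value are all hypothetical. It shows two things under those assumptions: advancing the nonce on failure stops a failed transaction from being replayed to drain fees, and two nodes that disagree on whether to advance the nonce end up with different account state (and hence different hashes).

```python
import hashlib
from dataclasses import dataclass

FEE = 5  # hypothetical flat fee, in lamports


@dataclass
class Account:
    lamports: int
    nonce: int


def process_tx(acct, tx_nonce, amount, advance_on_failure):
    """Apply one durable-nonce transfer to acct and return its outcome."""
    if tx_nonce != acct.nonce:
        return "rejected"  # stale nonce: not processed, no fee charged
    acct.lamports -= FEE  # fee is charged whether or not the transfer works
    if amount > acct.lamports:
        # Instruction error: the transfer itself has no effect.
        if advance_on_failure:
            acct.nonce += 1  # the fix: advance the nonce anyway
        return "failed"
    acct.lamports -= amount
    acct.nonce += 1
    return "ok"


def replay(advance_on_failure, times=3):
    """Submit the same failing transaction `times` times against a fresh account."""
    acct = Account(lamports=100, nonce=0)
    results = [process_tx(acct, 0, 1000, advance_on_failure) for _ in range(times)]
    return acct, results


def state_digest(acct):
    """Toy stand-in for a hash over account state."""
    return hashlib.sha256(f"{acct.lamports}:{acct.nonce}".encode()).hexdigest()


fixed, fixed_results = replay(advance_on_failure=True)
unfixed, unfixed_results = replay(advance_on_failure=False)
print(fixed, fixed_results)      # fee charged once, replays rejected
print(unfixed, unfixed_results)  # fee charged on every replay
```

In this model, if the leader path advances the nonce on failure and the replay path does not (or vice versa), the two nodes store different account data for the same slot, which is the kind of divergence the comment above asks about.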
I hacked up ledger-tool to use the blockstore processor up to the offending slot, then start a leader bank and push the transactions to it. Not sure if it’s working yet:
https://github.com/sakridge/solana/tree/blockstore-append-vec-debug
@t-nelson ah, looks like that’s right:
solana_runtime::system_instruction_processor] Transfer: insufficient lamports (380611960, need 380621960)
for key 5GGSqfxPar44zoYFtF1oFps5ibGk8hEtGpAC4dEotxSY
@ryoqun sounds good! I have eyes here now, so will chime in if anything comes to mind. Feel free to ping me with any questions about nonce as they arise.
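As a small aid for reading the `insufficient lamports` log line quoted above, this sketch extracts the two amounts and computes the shortfall. The function name `lamport_shortfall` is hypothetical; the message text is copied from the log in this thread.

```python
import re

# Message copied from the log line quoted in this thread.
LOG = "Transfer: insufficient lamports (380611960, need 380621960)"


def lamport_shortfall(message):
    """Return needed minus available lamports parsed from the log message."""
    m = re.search(r"insufficient lamports \((\d+), need (\d+)\)", message)
    if m is None:
        raise ValueError("not an insufficient-lamports message")
    available, needed = (int(x) for x in m.groups())
    return needed - available


print(lamport_shortfall(LOG))  # 10000
```

The account was 10,000 lamports short of the amount the transfer required.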
OK, I’ve got a good clue about the recent bank hash mismatch, though it might be a bit of a concern to post it in a public GitHub issue.
Background: https://github.com/solana-labs/solana/issues/9357#issuecomment-610735751
I strongly suspect there is nondeterminism in the nonce instructions.
(EDIT: lame me… there seem to be identical nonce accounts; I replaced the unrelated account with the correct one, 44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ)
The math is exactly consistent with the odd extras, so I bet I can reproduce the bad bank hash, but I haven’t done so yet.
So, the transaction 9GUZSu4soQA9tZ65wnwz8MrsBqGr9emwkgj1Q8hXisGMfCVR7FePuwhZ2BfLr1a7zXeG8NFNHyWKCeko4stTEzG did succeed on the bad validator (which was the leader at the time), but the rest of the cluster rejected it. That’s because ledger-tool doesn’t indicate such fund moves. Also, note the AdvanceNonceAccount instruction, which might indicate some race condition in the nonce code path.