solana: Vote hash mismatch on mainnet-beta/validator-us-east1-b

validator-us-east1-b when delinquent last night after slot 4187115. At slot 4187144, validator-us-east1-b produced a different bank hash than validator-us-west1-b and seemingly the rest of the cluster

validator-us-east1-b:

[2020-04-07T08:31:22.809585778Z INFO  solana_runtime::bank]  frozen: 4187144
  hash: FmRsY4WgARgNXf1dW9dXzGXWPvKNp4LoKZYovUSPWMKU 
  accounts_delta: Uu2n1bkKHdv812mYR7KEgfqptoiVJM9CKagE1QqnwnF
  signature_count: 190
  last_blockhash: jeKFjTNiWbkpU2u9KmaxGBM8rTnqGTCddnDTtxd4o9A

Larger validator-us-east1-b log snippet at https://pastebin.com/md2VWphi

validator-us-west1-b:

[2020-04-07T08:31:23.082385197Z INFO  solana_runtime::bank] bank frozen: 4187144
  hash: 5M6n9bmiqdrRjr8myHMa2LWJqLhHLEs3WyMGfFniCGpH 
  accounts_delta: CqxeRSyM5dPBKjySz2DEnGM93Rdw6uoTDPYPpJYMWC5S
  signature_count: 190
  last_blockhash: jeKFjTNiWbkpU2u9KmaxGBM8rTnqGTCddnDTtxd4o9A

accounts_delta is different.

Full log for both validators at https://drive.google.com/drive/folders/19448bVE3TFRe4rFciKMb6SB9nkOeHw0l?usp=sharing

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 31 (31 by maintainers)

Most upvoted comments

@ryoqun Ok, great! I think there must be an inconsistency in the fee-theft-prevention fix. Transactions that use a durable nonce, but fail with an instruction error are subject to replay fee-theft. The fix was to always advance the nonce value and store the account, even in the face of TX failure. Perhaps there’s a missed code path in the non-leader case?

The changes went in here: https://github.com/solana-labs/solana/pull/7684

I hacked up ledger-tool to use blockstore processsor up to the offending slot, then start a leader bank and push the transactions to it… not sure if it’s working yet:

https://github.com/sakridge/solana/tree/blockstore-append-vec-debug

@t-nelson ah, looks like that’s right: solana_runtime::system_instruction_processor] Transfer: insufficient lamports (380611960, need 380621960) for key 5GGSqfxPar44zoYFtF1oFps5ibGk8hEtGpAC4dEotxSY

@ryoqun sounds good! I have eyes here now, so will chime in if anything comes to mind. Feel free to ping me with any questions about nonce as they arise

ok I’ve got some nice clue for the recent bank hash, which might indicate a bit concern to post the public github issue.

Background: https://github.com/solana-labs/solana/issues/9357#issuecomment-610735751

I strongly suspect there is underterminisim in the nonse instructions.

(EDIT: lame me… there is identical nonce accounts, it seems, replaced unrelated account with correct 44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ)

89_614_985_000 (extra lamport move) = 89_613_537_320 + 1_447_680 = (63mThK54orGMrbFKE9Uab25QX1j3C89A3tctRX74PNgx's lamports + 44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ's lamports)
80 (extra data update in bytes) = 80 = (44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ's data_len)

The math is exactly consistent with the odd extras, so I bet I can reproduce the bad bank hash, but haven’t done yet.

So, the transaction 9GUZSu4soQA9tZ65wnwz8MrsBqGr9emwkgj1Q8hXisGMfCVR7FePuwhZ2BfLr1a7zXeG8NFNHyWKCeko4stTEzG did succeeded on the bad validator (at the time, was leader). But the rest rejected it. That’s because ledger-tool doesn’t indicate such fund moves. Also, note that AdvanceNonceAccount instruction, which might indicate some race condition in the nonce code path.

Slot 4187144

  Entry 57 - num_hashes: 115, hashes: 298eUST5q26E1SjAZigfypHQhVYf7t5ZgPSuFjqL6eDy, transactions: 1
    Transaction 0
      Recent Blockhash: CDqe5KCaSRR3KcTaUny4rGvbWz9ojZ6S5ju4JGp3FCQh
      Signature 0: 9GUZSu4soQA9tZ65wnwz8MrsBqGr9emwkgj1Q8hXisGMfCVR7FePuwhZ2BfLr1a7zXeG8NFNHyWKCeko4stTEzG
      Header: MessageHeader { num_required_signatures: 1, num_readonly_signed_accounts: 0, num_readonly_unsigned_accounts: 2 }
      Account 0: 5GGSqfxPar44zoYFtF1oFps5ibGk8hEtGpAC4dEotxSY
      Account 1: 44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ
      Account 2: 63mThK54orGMrbFKE9Uab25QX1j3C89A3tctRX74PNgx
      Account 3: SysvarRecentB1ockHashes11111111111111111111
      Account 4: 11111111111111111111111111111111
      Instruction 0
        Program: 11111111111111111111111111111111 (4)
        Account 0: 44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ (1)
        Account 1: SysvarRecentB1ockHashes11111111111111111111 (3)
        Account 2: 5GGSqfxPar44zoYFtF1oFps5ibGk8hEtGpAC4dEotxSY (0)
        AdvanceNonceAccount
      Instruction 1
        Program: 11111111111111111111111111111111 (4)
        Account 0: 5GGSqfxPar44zoYFtF1oFps5ibGk8hEtGpAC4dEotxSY (0)
        Account 1: 63mThK54orGMrbFKE9Uab25QX1j3C89A3tctRX74PNgx (2)
        Transfer { lamports: 380621960 }
      Status: Unavailable

 63mThK54orGMrbFKE9Uab25QX1j3C89A3tctRX74PNgx:
   - lamports: 89613537320
   - owner: '11111111111111111111111111111111'
   - executable: false
   - data: ''
   - data_len: 0

44pvpMRVAX9HZLJcCf4LqpeUxHNbDhjjQsjXZEiH35fJ :
   - lamports: 1447680
   - owner: '11111111111111111111111111111111'
   - executable: false
   - data: '11116bv5oKw5RX63tAZ3rYmq2RtHuqJQ9tGs4Ru2bbrbr2Zz5EFgjADyHFneZ7vBUJ1nAiYqvaK3Q1ZNds1BSbLwdhhQaBGSTkP3krFfBnw'
   - data_len: 80