solana: Hash mismatch on v1.5 in accounts simulation
Problem
The bootstrap leader seems to have encountered an erroneous hash, causing it to fork off during the accounts migration tests.
[2021-01-29T01:06:13.269096499Z INFO solana_runtime::bank] bank frozen: 181896 hash: 312MfFWZ9hpLWjaxbxZAAXiKdBE6dMwcHi3rghaPDN3F accounts_delta: GNBLwxAitac48gyceVwsdUUYNQYJV4AFaCKba7g7Rwjt signature_count: 10 last_blockhash: HFXB6GLYKH32nAhXeHUdVuqx5eCgFNYDJNx7wZiyAnb9 capitalization: 5500954163744417169
[2021-01-29T01:06:13.192369211Z INFO solana_runtime::bank] bank frozen: 181896 hash: Cq9w8kKAQsMe9YCwz75j5m2UzjEA5HEbKknQgNXLnbmq accounts_delta: A6Q4wAxkYcxo6eK3fe9q6FUfgSJ9nUwijVf2W7Fv1zRJ signature_count: 10 last_blockhash: HFXB6GLYKH32nAhXeHUdVuqx5eCgFNYDJNx7wZiyAnb9 capitalization: 5500954163744417169
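As a quick sanity check on the two log lines, a small standalone Rust sketch (not part of the repo) can diff the key/value fields. On the output above, hash and accounts_delta disagree while last_blockhash, signature_count, and capitalization match, so the two nodes froze the same block but computed different account state.

use std::collections::BTreeMap;

// Split a "bank frozen" log line into key/value pairs keyed by the "key:" tokens.
fn parse_fields(line: &str) -> BTreeMap<String, String> {
    let tokens: Vec<&str> = line.split_whitespace().collect();
    let mut fields = BTreeMap::new();
    let mut i = 0;
    while i + 1 < tokens.len() {
        if let Some(key) = tokens[i].strip_suffix(':') {
            fields.insert(key.to_string(), tokens[i + 1].to_string());
            i += 2;
        } else {
            i += 1;
        }
    }
    fields
}

fn main() {
    let a = "bank frozen: 181896 hash: 312MfFWZ9hpLWjaxbxZAAXiKdBE6dMwcHi3rghaPDN3F accounts_delta: GNBLwxAitac48gyceVwsdUUYNQYJV4AFaCKba7g7Rwjt signature_count: 10 last_blockhash: HFXB6GLYKH32nAhXeHUdVuqx5eCgFNYDJNx7wZiyAnb9 capitalization: 5500954163744417169";
    let b = "bank frozen: 181896 hash: Cq9w8kKAQsMe9YCwz75j5m2UzjEA5HEbKknQgNXLnbmq accounts_delta: A6Q4wAxkYcxo6eK3fe9q6FUfgSJ9nUwijVf2W7Fv1zRJ signature_count: 10 last_blockhash: HFXB6GLYKH32nAhXeHUdVuqx5eCgFNYDJNx7wZiyAnb9 capitalization: 5500954163744417169";
    let (fa, fb) = (parse_fields(a), parse_fields(b));
    for (key, va) in &fa {
        if let Some(vb) = fb.get(key) {
            if va != vb {
                // Prints only hash and accounts_delta for the lines above.
                println!("{} differs: {} vs {}", key, va, vb);
            }
        }
    }
}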
Proposed Solution
Debug and fix.
The first obstacle is that booting from the snapshots generated during the test fails with 'Load from snapshot failed: Serialize(Io(Custom { kind: Other, error: "incorrect layout/length/data" }))'. This may somehow be related to update_accounts_hash() being commented out in AccountsBackgroundService. Trying to see if I can salvage these snapshots to avoid replaying from genesis.
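Since update_accounts_hash() is presumably the step that refreshes the accounts hash used when snapshots are produced and verified, here is a minimal, hypothetical sketch of where that call sits relative to snapshot creation (BankLike, NoopBank, and the timing are made up for illustration; this is not the real AccountsBackgroundService code). Skipping the call is one way the serialized state could stop matching what the loader expects.

use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

// Hypothetical stand-in for the bank handle the real service owns.
trait BankLike: Send + Sync {
    fn update_accounts_hash(&self);
    fn take_snapshot(&self);
}

fn spawn_accounts_background_service(
    bank: Arc<dyn BankLike>,
    exit: Arc<AtomicBool>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        while !exit.load(Ordering::Relaxed) {
            // The call that is commented out in the failing build; without it,
            // snapshots taken afterwards carry whatever hash was computed last.
            bank.update_accounts_hash();
            bank.take_snapshot();
            thread::sleep(Duration::from_millis(100));
        }
    })
}

struct NoopBank;
impl BankLike for NoopBank {
    fn update_accounts_hash(&self) { println!("refresh accounts hash"); }
    fn take_snapshot(&self) { println!("serialize snapshot"); }
}

fn main() {
    let exit = Arc::new(AtomicBool::new(false));
    let handle = spawn_accounts_background_service(Arc::new(NoopBank), exit.clone());
    thread::sleep(Duration::from_millis(250));
    exit.store(true, Ordering::Relaxed);
    handle.join().unwrap();
}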
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 24 (24 by maintainers)
https://github.com/solana-labs/rbpf/pull/140
@Mrmaxmeier The last inconsistency you found should be fixed with this: https://github.com/solana-labs/rbpf/tree/fix/lddw_related_bugs
I also updated the CLI tool so that you can run the verifier on the executables.
@Mrmaxmeier You’ve seen this issue on non-jit builds?
Looks like a difference with JIT:
This is the status with bpf_jit enabled:
cc @Lichtso @jackcmay
Ok, logs are in ~/issue-14948 in dv/dw/da now. Also ~/issue-14948 on dv contains the full ledger (which is consistent across the machines), and you can run ledger-tool to observe how dv, using v1.5, calculates the correct bank hash.
So in both of the available snapshots, the corrupted AppendVec is for slot 174160, id: 331582:
AppendVec { path: "/home/carl_solana_com/DebugConsensus/accounts/174160.331582", map: MmapMut { ptr: 0x7f0cd4ed2000, len: 286720 }, append_offset: Mutex { data: 285149 }, current_len: 285149, file_size: 286720, remove_on_drop: true }
The first ~70 accounts in the AppendVec are all readable, and then it becomes a bunch of the above garbage…
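For poking at a copy of that file offline, here is a rough, hypothetical scanner sketch, not the real solana_runtime::append_vec parser: the 136-byte header size and the field offsets are assumptions read off the stored_size values in the StoredAccountMeta dumps later in this thread. It walks entries back to back and counts how many records look like the zeroed garbage region.

// Assumed per-entry header size, consistent with the stored_size: 136 reported for
// zero-length accounts below; the true layout lives in the runtime.
const HEADER_SIZE: usize = 136;

// Walks entries back to back: each record is a header followed by data_len bytes.
// Returns (entries_scanned, entries_whose_header_is_all_zeroes).
fn scan_storage(buf: &[u8], current_len: usize) -> (usize, usize) {
    let end = current_len.min(buf.len());
    let mut offset = 0;
    let mut total = 0;
    let mut zeroed = 0;
    while offset + HEADER_SIZE <= end {
        let header = &buf[offset..offset + HEADER_SIZE];
        // Illustrative field position: data_len assumed to follow write_version
        // (8 bytes) and pubkey (32 bytes), as in the StoredMeta dumps below.
        let mut len_bytes = [0u8; 8];
        len_bytes.copy_from_slice(&header[40..48]);
        let data_len = u64::from_le_bytes(len_bytes) as usize;
        if header.iter().all(|&b| b == 0) {
            zeroed += 1; // matches the all-default "garbage" records
        }
        total += 1;
        offset += HEADER_SIZE + data_len;
    }
    (total, zeroed)
}

fn main() {
    // Two synthetic zero-filled records stand in for the corrupted tail of
    // accounts/174160.331582; a real run would read that file instead.
    let buf = vec![0u8; 2 * HEADER_SIZE];
    let (total, zeroed) = scan_storage(&buf, buf.len());
    println!("scanned {} entries, {} look zeroed", total, zeroed);
}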
Given the mismatch popped up almost 7000 slots later, in slot 181896, I'm guessing something must have corrupted the AppendVec, and then we read from the corrupted AppendVec. Maybe shrink or recycle. It would be interesting to see if all the accounts in that storage entry for that slot were actual accounts touched by transactions in that slot.
This actually only seems to be happening on the node that forked off… interesting. The check that is failing is self.account_meta.lamports != 0 || self.clone_account() == Account::default() in append_vec::sanitize_lamports(). The "account" that's failing this check looks like:
StoredAccountMeta { meta: StoredMeta { write_version: 0, pubkey: 11111111111111111111111111111111, data_len: 0 }, account_meta: AccountMeta { lamports: 0, owner: 11111111111111111111111111111111, executable: false, rent_epoch: 4294967296 }, data: [], offset: 31192, stored_size: 136, hash: 1111111111111111VJBRrGXMNSx9EXYe92GTF5 }
I believe the above is a corrupted storage entry, as these appear multiple times; usually they are all default, so they do not trigger the check to fail, looking like this:
System account: StoredAccountMeta { meta: StoredMeta { write_version: 0, pubkey: 11111111111111111111111111111111, data_len: 0 }, account_meta: AccountMeta { lamports: 0, owner: 11111111111111111111111111111111, executable: false, rent_epoch: 0 }, data: [], offset: 30240, stored_size: 136, hash: 11111111111111111111111111111111 }
Note these are not the system account, which looks like:
StoredAccountMeta { meta: StoredMeta { write_version: 509, pubkey: 11111111111111111111111111111111, data_len: 14 }, account_meta: AccountMeta { lamports: 1, owner: NativeLoader1111111111111111111111111111111, executable: true, rent_epoch: 0 }, data: [115, 121, 115, 116, 101, 109, 95, 112, 114, 111, 103, 114, 97, 109], offset: 396536, stored_size: 152, hash: APNoNXDhUuQ8CLG8xRaQUUQxof1jsA8qLrhaQAM3NyJp }
This leads me to believe that some of the AppendVecs are not being stored properly or are getting corrupted; maybe I've introduced a bug in append_accounts()…
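To make the failing condition concrete, here is a tiny self-contained sketch of the quoted check, using a hypothetical flattened Account struct rather than the real solana-sdk type: an all-default entry passes because it equals Account::default(), while the corrupted entry with zero lamports but rent_epoch = 4294967296 does not, which is exactly the case tripping append_vec::sanitize_lamports().

// Hypothetical flattened Account, standing in for what clone_account() returns.
#[derive(Debug, Default, PartialEq, Eq, Clone)]
struct Account {
    lamports: u64,
    data: Vec<u8>,
    owner: [u8; 32],
    executable: bool,
    rent_epoch: u64,
}

// The check quoted above: lamports != 0 || account == Account::default().
fn sanitize_lamports(account: &Account) -> bool {
    account.lamports != 0 || *account == Account::default()
}

fn main() {
    // The usual all-default padding entries pass the check.
    let default_entry = Account::default();
    assert!(sanitize_lamports(&default_entry));

    // The suspicious entry from the dump: zero lamports but rent_epoch = 2^32,
    // so it is no longer equal to the default account and the check fails.
    let corrupted = Account { rent_epoch: 4_294_967_296, ..Account::default() };
    assert!(!sanitize_lamports(&corrupted));

    println!("only the rent_epoch = 2^32 entry fails sanitize_lamports");
}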