solana: v1.16 nodes hitting general protection faults
Several nodes running v1.16 on mainnet-beta have hit a general protection fault (GPF). Debugging discussion is on Discord in the debug-gpf-1_16 channel. The faults occur on one of the solBstoreProcXY threads of the replay threadpool, which is used solely for transaction replay. Known occurrences:
| Date (UTC) | Host | Commit / version | OS / kernel | Thread | Source |
|---|---|---|---|---|---|
| 2023-07-03 18:26:35 | mcb7 | 02d5647b | | | |
| 2023-08-22 17:45:03 | mce2 | 9d83bb2a | Ubuntu 20.04.3 LTS / 5.4.0-81-generic | solBstoreProc02 | |
| 2023-08-28 23:05:45 | community | v1.16.9-jito | | solBstoreProc02 | Discord: depool |
| 2023-08-29 00:03:40 | community | v1.16.9-jito | | solBstoreProc09 | Discord: 7Layer Overclock |
| 2023-08-29 14:26:04 | community | v1.16.9-jito | | solBstoreProc00 | Discord: Timoon_21 |
| 2023-08-29 ~17:00 | community | v1.16.9-jito | | solBstoreProc19 | Discord: depool |
| 2023-09-04 (repeated) | community | unknown | Ubuntu 22.04.2 LTS / 5.15.0-79-generic | solBstoreProcXY | Discord: Ben |
| 2023-09-11 ~07:00 | community | v1.16.13-jito | | solBstoreProc01 | Discord: meyerbro |
mcb7 on 2023-07-03
https://discord.com/channels/428295358100013066/1027231858565586985/1125509913746083961
mce2 on 2023-08-22
mce2 was running commit 9d83bb2a when solana-validator received SIGSEGV at 2023-08-22 17:45:12 UTC.
/var/log/kern.log:
```
Aug 22 17:45:03 localhost kernel: [51336395.493217] traps: solBstoreProc02[666122] general protection fault ip:555881b65d94 sp:7f7df77f4b98 error:0 in solana-validator[55587f8f7000+22e5000]
```
/var/log/apport.log:
```
ERROR: apport (pid 783652) Tue Aug 22 17:45:03 2023: called for pid 664500, signal 11, core limit 0, dump mode 1
ERROR: apport (pid 783652) Tue Aug 22 17:45:03 2023: executable: /home/sol/.local/share/solana/install/releases/edge-9d83bb2a897d289a52996c0e3a384188064ed4d1/bin/solana-validator (command line "solana-validator --dynamic-port-range 8002-8020 --gossip-port 8001 --identity /home/sol/identity/mce2-identity.json --ledger /home/sol/ledger --limit-ledger-size --log /home/sol/logs/solana-validator.log --rpc-port 8899 --expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d --wal-recovery-mode skip_any_corrupted_record --no-voting --trusted-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 --trusted-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ --trusted-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ --trusted-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S --halt-on-trusted-validators-accounts-hash-mismatch --no-untrusted-rpc --expected-shred-version 56177 --entrypoint mainnet-beta.solana.com:8001 --entrypoint entrypoint.mainnet-beta.solana.com:8001 --entrypoint entrypoint2.mainnet-beta.solana.com:8001 --entrypoint entrypoint3.mainnet-beta.solana.com:8001 --entrypoint entrypoint4.mainnet-beta.solana.com:8001 --entrypoint entrypoint5.mainnet-beta.solana.com:8001 --no-genesis-fetch --no-snapshot-fetch --entrypoint entrypoint.mainnet-beta.solana.com:8001 --entrypoint entrypoint2.mainnet-beta.solana.com:8001 --entrypoint entrypoint3.mainnet-beta.solana.com:8001 --entrypoint entrypoint4.mainnet-beta.solana.com:8001 --entrypoint entrypoint5.mainnet-beta.solana.com:8001")
ERROR: apport (pid 783652) Tue Aug 22 17:45:03 2023: executable does not belong to a package, ignoring
```
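The apport entry reports `core limit 0`: the crashing process ran with a core-file size limit of zero, and apport also ignored it because the binary is not from a package, so no core dump was kept for this fault. A minimal bash sketch for checking that limit on a running validator ahead of the next fault; `core_limit_of` is a hypothetical helper name, and it only reads the standard Linux `/proc/<pid>/limits` file.

```bash
#!/usr/bin/env bash
# core_limit_of: hypothetical helper, not part of Solana.
# Prints the "Max core file size" row of /proc/<pid>/limits
# (columns: limit name, soft limit, hard limit, units).
# A soft limit of 0 normally means no core file is kept on a crash.
core_limit_of() {
  local pid=$1
  if [[ -z $pid ]]; then
    echo "usage: core_limit_of <pid>" >&2
    return 2
  fi
  # grep exits non-zero if the PID does not exist (no such file).
  grep '^Max core file size' "/proc/${pid}/limits"
}

# When executed directly with a PID argument, print that process's limit.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  core_limit_of "$1"
fi
```

If the soft limit shows `0`, it can be raised by starting the validator from a shell after `ulimit -c unlimited`, or with `LimitCORE=infinity` if the validator runs as a systemd service; where the core ends up is governed by `/proc/sys/kernel/core_pattern`.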
```
$ addr2line -e /home/sol/.local/share/solana/install/active_release/bin/solana-validator --functions --demangle 0x22e5000
core::ptr::drop_in_place<[alloc::sync::Arc<tokio::util::slab::Page<tokio::runtime::io::scheduled_io::ScheduledIo>>; 19]>
tokio.948cf0d3-cgu.9:?
```
meyerbro on 2023-09-11
```
kernel: traps: solBstoreProc01[2476276] general protection fault ip:5636c60cf361 sp:7fa015231058 error:0 in solana-validator[5636c3e58000+232c000]
```
```
$ addr2line -e ... --functions --demangle 0x232c000
<tokio::net::tcp::split::ReadHalf as tokio::io::async_read::AsyncRead>::poll_read
??:?
```
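In the kernel's `traps:` message, `solana-validator[START+LEN]` describes the memory mapping that contains `ip`: `START` is where the mapping begins and `LEN` is its length (the kernel prints `vm_start` and `vm_end - vm_start`). Both `addr2line` calls above were given the `LEN` value (`0x22e5000` and `0x232c000`), which points at the end of the mapping rather than at the faulting instruction, so the frames they printed are unlikely to identify the fault. A minimal bash sketch, assuming the log format shown above, that computes `ip - START` instead; `fault_offset` is a hypothetical helper name, not a Solana or binutils tool.

```bash
#!/usr/bin/env bash
# fault_offset: hypothetical helper, not part of Solana or binutils.
# Input: one kernel "traps:" line ending in "... ip:<IP> ... in <bin>[<START>+<LEN>]".
# Output: ip - START in hex, i.e. the offset of the faulting instruction
# inside the reported mapping. Returns 1 if parsing fails or ip lies outside it.
fault_offset() {
  local line=$1 ip start len off
  # Faulting instruction pointer (hex, no 0x prefix in the log).
  [[ $line =~ ip:([0-9a-f]+) ]] || return 1
  ip=$((16#${BASH_REMATCH[1]}))
  # First "[hex+hex]" group is the mapping start and length.
  [[ $line =~ \[([0-9a-f]+)\+([0-9a-f]+)\] ]] || return 1
  start=$((16#${BASH_REMATCH[1]}))
  len=$((16#${BASH_REMATCH[2]}))
  off=$((ip - start))
  # Sanity check: the kernel reports the mapping that contains ip.
  (( off >= 0 && off < len )) || return 1
  printf '0x%x\n' "$off"
}

# When executed directly with a log line as $1, print the offset.
if [[ ${BASH_SOURCE[0]} == "$0" && $# -gt 0 ]]; then
  fault_offset "$1"
fi
```

For the two logs above this yields `0x226ed94` (mce2) and `0x2277361` (meyerbro), both inside their mappings. The result is an offset within the executable mapping, not necessarily an address `addr2line` accepts as-is: if the binary's executable `LOAD` segment does not start at virtual address 0 (common when code sits in its own segment), that segment's page-aligned virtual address from `readelf -lW` likely has to be added before running `addr2line -e <binary> --functions --demangle` against the exact build that crashed.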
About this issue
- State: closed
- Created 10 months ago
- Comments: 15 (9 by maintainers)
I’ve had pretty good success reproducing this. I compiled a binary with debug symbols and expect to be able to reproduce it in the next day or so. Once I have a core dump with debug symbols, it’s GG.
Fixed in v1.16.14.