solana: v1.16 nodes general protection fault

Several nodes have hit a general protection fault (GPF) while running v1.16 on mainnet-beta. Debugging discussion is on Discord in the debug-gpf-1_16 channel. Where the faulting thread is known, it is one of the solBstoreProcXY threads; this thread pool is used solely for transaction replay. The table below tracks the known occurrences.

| Date (UTC) | Host | Commit | OS/Kernel | Thread | Link |
|---|---|---|---|---|---|
| 2023-07-03 18:26:35 | mcb7 | 02d5647b | | | |
| 2023-08-22 17:45:03 | mce2 | 9d83bb2a | Ubuntu 20.04.3 LTS / 5.4.0-81-generic | solBstoreProc02 | |
| 2023-08-28 23:05:45 | community | v1.16.9-jito | | solBstoreProc02 | Discord depool |
| 2023-08-29 00:03:40 | community | v1.16.9-jito | | solBstoreProc09 | Discord 7Layer Overclock |
| 2023-08-29 14:26:04 | community | v1.16.9-jito | | solBstoreProc00 | Discord Timoon_21 |
| 2023-08-29 ~17:00 | community | v1.16.9-jito | | solBstoreProc19 | Discord depool |
| 2023-09-11 ~07:00 | community | v1.16.13-jito | | solBstoreProc01 | Discord meyerbro |
| 2023-09-04 repeat | community | ??? | Ubuntu 22.04.2 LTS / 5.15.0-79-generic | solBstoreProcXY | Discord Ben |

mcb7 on 2023-07-03

https://discord.com/channels/428295358100013066/1027231858565586985/1125509913746083961

mce2 on 2023-08-22

mce2 was running 9d83bb2a when solana-validator received SIGSEGV (signal 11) at 2023-08-22 17:45:12 UTC.

/var/log/kern.log:

Aug 22 17:45:03 localhost kernel: [51336395.493217] traps: solBstoreProc02[666122] general protection fault ip:555881b65d94 sp:7f7df77f4b98 error:0 in solana-validator[55587f8f7000+22e5000]

/var/log/apport.log:

ERROR: apport (pid 783652) Tue Aug 22 17:45:03 2023: called for pid 664500, signal 11, core limit 0, dump mode 1
ERROR: apport (pid 783652) Tue Aug 22 17:45:03 2023: executable: /home/sol/.local/share/solana/install/releases/edge-9d83bb2a897d289a52996c0e3a384188064ed4d1/bin/solana-validator (command line "solana-validator --dynamic-port-range 8002-8020 --gossip-port 8001 --identity /home/sol/identity/mce2-identity.json --ledger /home/sol/ledger --limit-ledger-size --log /home/sol/logs/solana-validator.log --rpc-port 8899 --expected-genesis-hash 5eykt4UsFv8P8NJdTREpY1vzqKqZKvdpKuc147dw2N9d --wal-recovery-mode skip_any_corrupted_record --no-voting --trusted-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 --trusted-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ --trusted-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ --trusted-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S --halt-on-trusted-validators-accounts-hash-mismatch --no-untrusted-rpc --expected-shred-version 56177 --entrypoint mainnet-beta.solana.com:8001 --entrypoint entrypoint.mainnet-beta.solana.com:8001 --entrypoint entrypoint2.mainnet-beta.solana.com:8001 --entrypoint entrypoint3.mainnet-beta.solana.com:8001 --entrypoint entrypoint4.mainnet-beta.solana.com:8001 --entrypoint entrypoint5.mainnet-beta.solana.com:8001 --no-genesis-fetch --no-snapshot-fetch --entrypoint entrypoint.mainnet-beta.solana.com:8001 --entrypoint entrypoint2.mainnet-beta.solana.com:8001 --entrypoint entrypoint3.mainnet-beta.solana.com:8001 --entrypoint entrypoint4.mainnet-beta.solana.com:8001 --entrypoint entrypoint5.mainnet-beta.solana.com:8001")
ERROR: apport (pid 783652) Tue Aug 22 17:45:03 2023: executable does not belong to a package, ignoring

addr2line -e /home/sol/.local/share/solana/install/active_release/bin/solana-validator --functions --demangle 0x22e5000

core::ptr::drop_in_place<[alloc::sync::Arc<tokio::util::slab::Page<tokio::runtime::io::scheduled_io::ScheduledIo>>; 19]>
tokio.948cf0d3-cgu.9:?

meyerbro on 2023-09-11

kernel: traps: solBstoreProc01[2476276] general protection fault ip:5636c60cf361 sp:7fa015231058 error:0 in solana-validator[5636c3e58000+232c000]

$ addr2line -e ... --functions --demangle  0x232c000
<tokio::net::tcp::split::ReadHalf as tokio::io::async_read::AsyncRead>::poll_read
??:?
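
A note on the addresses used above: in the kernel `traps:` lines, the bracketed pair after the binary name is the start address and the size of the mapping that contains `ip`, so `0x22e5000` and `0x232c000` are mapping sizes rather than the location of the faulting instruction. The sketch below parses the two log lines quoted in this issue and computes `ip - base`, the offset of the fault within that mapping. `fault_offset` and the variable names are hypothetical helpers, not part of any Solana tooling, and the regex assumes the exact line format shown here. Whether the resulting offset can be handed to `addr2line` unchanged depends on how the binary's segments are laid out (the mapping may not begin at file offset 0), so treat it as a starting point rather than a confirmed symbol address.

```python
import re

# Assumed format (taken from the kern.log lines quoted in this issue):
#   traps: <comm>[<pid>] general protection fault ip:<hex> sp:<hex> error:<hex> in <image>[<base>+<size>]
TRAP_RE = re.compile(
    r"traps: (?P<comm>\S+)\[(?P<pid>\d+)\] general protection fault "
    r"ip:(?P<ip>[0-9a-f]+) sp:(?P<sp>[0-9a-f]+) error:(?P<error>[0-9a-f]+) "
    r"in (?P<image>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def fault_offset(line):
    """Return (thread name, ip - base) for one kernel 'traps:' line."""
    m = TRAP_RE.search(line)
    if m is None:
        raise ValueError("line does not match the expected traps format")
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    offset = ip - base
    # Sanity check: the faulting ip must lie inside the reported mapping.
    if not 0 <= offset < size:
        raise ValueError("ip lies outside the reported mapping")
    return m["comm"], offset


# The two kernel log lines quoted in this issue.
MCE2_LINE = (
    "Aug 22 17:45:03 localhost kernel: [51336395.493217] traps: "
    "solBstoreProc02[666122] general protection fault ip:555881b65d94 "
    "sp:7f7df77f4b98 error:0 in solana-validator[55587f8f7000+22e5000]"
)
MEYERBRO_LINE = (
    "kernel: traps: solBstoreProc01[2476276] general protection fault "
    "ip:5636c60cf361 sp:7fa015231058 error:0 in "
    "solana-validator[5636c3e58000+232c000]"
)

for log_line in (MCE2_LINE, MEYERBRO_LINE):
    thread, offset = fault_offset(log_line)
    print(f"{thread}: offset within mapping = {offset:#x}")
# prints:
#   solBstoreProc02: offset within mapping = 0x226ed94
#   solBstoreProc01: offset within mapping = 0x2277361
```

Both offsets fall inside their respective mapping sizes, which is a quick consistency check on the parsed values.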

About this issue

  • State: closed
  • Created 10 months ago
  • Comments: 15 (9 by maintainers)

Most upvoted comments

I’ve had pretty good success reproducing this. I compiled a binary with debug symbols and expect to be able to repro in the next day or so. Once I have a core dump with debug symbols, it’s GG.

Fixed in v1.16.14.

sol@LuckyStake:~$ uname -srv
Linux 5.15.0-67-lowlatency #74-Ubuntu SMP PREEMPT Wed Feb 22 15:27:12 UTC 2023