bitcoin: IBD stalls permanently with Bitcoin core v23 after "ignoring getheaders from peer=0 because node is in initial block download"
With Bitcoin core v23 (Qt versin on Windows, or with the daemon from commandline), the IBD process can stall indefinitely after receiving a large block:
2022-08-07T13:24:50Z [net.cpp:3043] [CNode] Added connection to 104.248.143.83:8333 peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending version (102 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:1146] [PushNodeVersion] send version message: version 70016, blocks=239242, them=104.248.143.83:8333, txrelay=1, peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: version (102 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending verack (0 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending getaddr (0 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2754] [ProcessMessage] receive version message: /Satoshi:0.20.1/: version 70015, blocks=748403, us=104.248.143.83:8333, txrelay=1, peer=0, peeraddr=104.248.143.83:8333
2022-08-07T13:24:50Z [timedata.cpp:57] [AddTimeData] added time data, samples 2, offset -1 (+0 minutes)
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: verack (0 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=748403, peer=0, peeraddr=104.248.143.83:8333 (manual)
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendheaders (0 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendcmpct (9 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending sendcmpct (9 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending ping (8 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:4650] [SendMessages] initial getheaders (319216) to peer=0 (startheight:748403)
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending getheaders (997 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending feefilter (8 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendheaders (0 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendcmpct (9 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: sendcmpct (9 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: ping (8 bytes) peer=0
2022-08-07T13:24:50Z [net.cpp:3060] [PushMessage] sending pong (8 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: getheaders (1029 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:3223] [ProcessMessage] Ignoring getheaders from peer=0 because node is in initial block download
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: feefilter (8 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:4051] [ProcessMessage] received: feefilter of 0.00001000 BTC/kvB from peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: pong (8 bytes) peer=0
2022-08-07T13:24:50Z [net_processing.cpp:2558] [ProcessMessage] received: headers (162003 bytes) peer=0
Then any other attemps to connect to other peers (or accept incoming connections) stalls permanently (we can waitfior hours: no network activity, no disk activity, but some thread uses 100% of a CPU core, probably in a tight loop generated by a deadlock and infinite attemps to recover from it).
If we try to shutdown, al most all databases thread are closed, the RPC services are closed, But the conenction manager remains locked for about ~15 minutes, not doing anything.
In fact the received headers are not even starting to be parse until AFTER these ~15 minutes. At which time we see that in logs:
(note: I’ve enabled logtimestamps=1
, logsourcelocations=1
and logthreadnames=1
in bitcoin.conf to have more traceable traces; and run it with debug=all
).
It seems that a lock is kept somewhere in “net_processing.cpp” after the ProcessMessage() exists with “Ignoring getheaders from peer=0 because node is in initial block download”
Only when we shut down the node we see instantly:
2022-08-07T13:26:17Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: requestShutdown : Requesting shutdown
2022-08-07T13:26:17Z [rpc/server.cpp:302] [operator()] Interrupting RPC
2022-08-07T13:26:17Z [rpc/server.cpp:314] [operator()] Stopping RPC
2022-08-07T13:26:17Z [init.cpp:371] [OnRPCStopped] RPC stopped.
2022-08-07T13:26:17Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: Running Shutdown in thread
2022-08-07T13:26:17Z [httpserver.cpp:436] [InterruptHTTPServer] Interrupting HTTP server
2022-08-07T13:26:17Z [httprpc.cpp:312] [InterruptHTTPRPC] Interrupting HTTP RPC server
2022-08-07T13:26:17Z [init.cpp:205] [Shutdown] Shutdown: In progress...
2022-08-07T13:26:17Z [httprpc.cpp:317] [StopHTTPRPC] Stopping HTTP RPC server
2022-08-07T13:26:17Z [httpserver.cpp:657] [UnregisterHTTPHandler] Unregistering HTTP handler for / (exactmatch 1)
2022-08-07T13:26:17Z [httpserver.cpp:657] [UnregisterHTTPHandler] Unregistering HTTP handler for /wallet/ (exactmatch 0)
2022-08-07T13:26:17Z [httpserver.cpp:448] [StopHTTPServer] Stopping HTTP server
2022-08-07T13:26:17Z [httpserver.cpp:450] [StopHTTPServer] Waiting for HTTP worker threads to exit
2022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] opencon thread exit
2022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] dnsseed thread exit
2022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] addcon thread exit
2022-08-07T13:26:17Z [httpserver.cpp:291] [ThreadHTTP] Exited http event loop
2022-08-07T13:26:17Z [httpserver.cpp:463] [StopHTTPServer] Waiting for HTTP event thread to exit
2022-08-07T13:26:17Z [httpserver.cpp:475] [StopHTTPServer] Stopped HTTP server
2022-08-07T13:26:17Z [mapport.cpp:210] [ProcessUpnp] UPNP_DeletePortMapping() returned: 0
2022-08-07T13:26:17Z [util/thread.cpp:19] [TraceThread] mapport thread exit
And then we wait for about 15-20 minutes to finally see this: the received block headers are processed, an attempt for sending another requests for headers is emitted, but all peers are finally disconnected (because we were shutting down), and all remaining data in moemory is flushed to disk and the node finally exits correctly.
2022-08-07T13:43:47Z [validation.cpp:3606] [ProcessNewBlockHeaders] Synchronizing blockheaders, height: 321216 (~43.64%)
2022-08-07T13:43:47Z [net_processing.cpp:2198] [ProcessHeadersMessage] more getheaders (321216) to end to peer=0 (startheight:748403)
2022-08-07T13:43:47Z [net.cpp:1262] [CreateNodeFromAcceptedSocket] connection from [2604:d500:4:1::2]:63949 accepted
2022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] net thread exit
2022-08-07T13:43:47Z [net.cpp:3060] [PushMessage] sending getheaders (997 bytes) peer=0
2022-08-07T13:43:47Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=txindex, before=0.0MiB, after=0.0MiB
2022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] msghand thread exit
2022-08-07T13:43:47Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=txindex, before=0.0MiB, after=0.0MiB
2022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] txindex thread exit
2022-08-07T13:43:47Z [random.cpp:520] [SeedPeriodic] Feeding 1009016 bytes of dynamic environment data into RNG
2022-08-07T13:43:47Z [net.cpp:1846] [DumpAddresses] Flushed 6626 addresses to peers.dat 701ms
2022-08-07T13:43:47Z [net.cpp:567] [CloseSocketDisconnect] disconnecting peer=0
2022-08-07T13:43:47Z [net_processing.cpp:1282] [FinalizeNode] Cleared nodestate for peer=0
2022-08-07T13:43:47Z [net.cpp:567] [CloseSocketDisconnect] disconnecting peer=1
2022-08-07T13:43:47Z [net_processing.cpp:1282] [FinalizeNode] Cleared nodestate for peer=1
2022-08-07T13:43:47Z [net.cpp:1846] [DumpAddresses] Flushed 6626 addresses to peers.dat 15ms
2022-08-07T13:43:47Z [util/thread.cpp:19] [TraceThread] scheduler thread exit
2022-08-07T13:43:47Z [validation.cpp:4644] [DumpMempool] Writing 0 unbroadcast transactions to disk.
2022-08-07T13:43:47Z [validation.cpp:4654] [DumpMempool] Dumped mempool: 0s to copy, 0.036892s to dump
2022-08-07T13:43:47Z [policy/fees.cpp:998] [FlushUnconfirmed] Recorded 0 unconfirmed txs from mempool in 0s
2022-08-07T13:43:47Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk started
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk completed (38.02ms)
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk started
2022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=index, before=0.0MiB, after=0.3MiB
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk completed (36.98ms)
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
2022-08-07T13:43:48Z [txdb.cpp:161] [BatchWrite] Writing final batch of 0.00 MiB
2022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB
2022-08-07T13:43:48Z [txdb.cpp:163] [BatchWrite] Committed 0 changed transaction outputs (out of 0) to coin database...
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00ms)
2022-08-07T13:43:48Z [validationinterface.cpp:243] [ChainStateFlushed] Enqueuing ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
2022-08-07T13:43:48Z [validationinterface.cpp:243] [operator()] ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
2022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=db, before=0.0MiB, after=0.0MiB
2022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=db, before=0.0MiB, after=0.0MiB
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk started
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block and undo data to disk completed (31.27ms)
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk started
2022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=index, before=0.3MiB, after=0.3MiB
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write block index to disk completed (15.63ms)
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) started
2022-08-07T13:43:48Z [txdb.cpp:161] [BatchWrite] Writing final batch of 0.00 MiB
2022-08-07T13:43:48Z [dbwrapper.cpp:198] [WriteBatch] WriteBatch memory usage: db=chainstate, before=0.0MiB, after=0.0MiB
2022-08-07T13:43:48Z [txdb.cpp:163] [BatchWrite] Committed 0 changed transaction outputs (out of 0) to coin database...
2022-08-07T13:43:48Z [logging/timer.h:57] [Log] FlushStateToDisk: write coins cache to disk (0 coins, 0kB) completed (0.00ms)
2022-08-07T13:43:48Z [validationinterface.cpp:243] [ChainStateFlushed] Enqueuing ChainStateFlushed: block hash=000000000000012a873661c59cd6301ace5a9108786284693eb0e35038713f4d
2022-08-07T13:43:48Z [init.cpp:322] [Shutdown] Shutdown: done
2022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: Shutdown finished
2022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: ~InitExecutor : Stopping thread
2022-08-07T13:43:48Z [qt/bitcoin.cpp:207] [DebugMessageHandler] GUI: ~InitExecutor : Stopped thread
If we relaunch the node, it loads all data successfully, connects to a new peer=0 and requests another set of headers, then stalls again exactly as above, and we have to shutdown and wait 15-20 minutes exactly the same way as above. A single connection is made instantly to download a single nex set of headers, and then stalls forever. Other connections are added to the list (the UPnP discovery occurs, other peers are also discovered), but will never start. These extra connections are locked again until…
We shutdown the UI and then wait for 15-20 minutes when the RPC services are closed. These pending connections are then open and closed immediately and the UI closes.
Result: we have been able to download a single set of headers at each run (less than 1 minute to start and then seeing the lock after the 1st download of headers, just about 180KB, then no network or disk activity at all dor an infinite time, then shutdown and wait 15-20 minutes).
Expected behavior
IBD should not stall. It should also connect to other peers, and not just to the initial one (the QT interface dispays constantly: “connecting to peers…”. System information
Bitcoin v23 for Windows (from https://bitcoincore.org/en/download/, extracted from the ZIP file, because the current EXE installer does not run on a Windows system enforcing ASLT)
Windows 11 x64 (Intel Core i7), 32 GB RAM
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 54 (18 by maintainers)
So, for whatever reason, your node has marked block 277117 (0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b) incorrectly as invalid. It’s not synchronizing further, because it is treating every peer that hands it block 277177 as malicious. We’ll need to figure out why that is happening. Your comment above about running out of file descriptors seems like a useful lead.
First, we need to know if it is the block on disk itself that is corrupted, or the validation database. It’s likely the latter, but to find out, run
reconsiderblock 0000000000000002cdf7bce4c09c0849f789a9508d812d3ca430429f6213688b
, and report what you see in debug.log at that time.If the blocks themselves are corrupted, you’ll need to use
-reindex
. If it’s just the database that is corrupted,-reindex-chainstate
should suffice. Given that it sounds like you’ve tried the latter several times and that doesn’t help, it’s either a problem with the blocks themselves, or somehow your Bitcoin Core instance is corrupting itself time after time. Looking at thereconsiderblock
output will help figure this out.Misbehaving nodes should never be able to cause Bitcoin Core to incorrectly mark a block as invalid, so no.
No.
That software is for a different currency, with different consensus rules. It will not synchronize with the Bitcoin network/chain/currency.