bitcoin: CheckBlockIndex stalls for extremely long time

Stale chain tips can block the RPC and all other responses from the peer for very long time (more than 15 minutes, sometimes hours, or undefinetely). During that time I can see that my local node no longer replies to ANY other peer (e.g. to bitnode.io core). And in process viewers, I can see that there’s a single thread blocking ALL other threads in a tight loop using 100% CPU in a tight loop.

I’ve seen that this is occurs in any db indexer (notably the chainstate indexer during IBD, or the txindexer, or the coinstat db indexer), notably in some critical steps where new blocks are added to the chain (once every about 15 minutes). This seems to happen when a new block contains a transaction validating a better chain, and invalidating other blocks from dead branches. In that case, the ongoin indexer will fail with unchecked conditions in its internal iterator, apparently because it expect the next block to be present and locks it incorrectly, and is not able to release it and retry if that next valid block has been replaced by a better block in the chain.

2022-08-21T19:44:15.875834Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=750474, peer=58, peeraddr=[2604:a880:cad:d0::d9e:f001]:8333 (block-relay-only)
2022-08-21T19:44:45.012022Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 330879
2022-08-21T19:45:16.005414Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 331746
2022-08-21T19:45:47.003036Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 332578
2022-08-21T19:46:18.039548Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 333378
2022-08-21T19:46:49.007385Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 334219
2022-08-21T19:47:20.013586Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 334916
2022-08-21T19:47:51.022263Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 335794
2022-08-21T19:48:22.007479Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 336769
2022-08-21T19:48:53.009529Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 337743
2022-08-21T19:49:24.039053Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 338611
2022-08-21T19:49:55.041510Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 339322
2022-08-21T19:50:26.010515Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 340016
2022-08-21T19:50:57.016185Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 340630
2022-08-21T19:51:28.055804Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 341499
2022-08-21T19:52:08.460256Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=0000000000000000000758f0830f8c7cf41c06740b2004b4186e439b0b989062 height=750475 version=0x2281c000 log2_work=93.689008 tx=758464817 date='2022-08-21T19:50:59Z' progress=1.000000 cache=3.7MiB(27192txo)
2022-08-21T19:52:10.247487Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 341803
2022-08-21T19:52:30.568835Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=00000000000000000005d0b2bbb26ce7bb0eef932eed58853ebe6525ca71c85a height=750476 version=0x20000004 log2_work=93.689019 tx=758465224 date='2022-08-21T19:52:12Z' progress=1.000000 cache=3.9MiB(28964txo)
2022-08-21T19:52:41.785630Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 341814
2022-08-21T19:53:12.001085Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 342683
2022-08-21T19:53:43.027307Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 343432
2022-08-21T19:54:14.012871Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344012
2022-08-21T19:54:45.022063Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344718
2022-08-21T19:54:59.410934Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=000000000000000000098f79590f23c8bb78c3d4d22d1cd4bab3a0e94fc7e6be height=750477 version=0x2ab96000 log2_work=93.689030 tx=758465722 date='2022-08-21T19:54:36Z' progress=1.000000 cache=4.1MiB(30966txo)
2022-08-21T19:55:16.033929Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344913
2022-08-21T19:56:00.985269Z [mapport.cpp:204] [ProcessUpnp] UPnP Port Mapping successful.
2022-08-21T20:16:01.005268Z [mapport.cpp:204] [ProcessUpnp] UPnP Port Mapping successful.

Finally later it may eventually detect that and detect the inconsistency, by eviting some peers.

2022-08-21T20:26:10.234268Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344915
2022-08-21T20:26:10.236286Z [net_processing.cpp:4381] [CheckForStaleTipAndEvictPeers] Potential stale tip detected, will try using extra outbound peer (last tip update: 1871 seconds ago)
2022-08-21T20:26:10.877268Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=194, peeraddr=78.108.108.38:8333 (outbound-full-relay)
2022-08-21T20:26:18.436270Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=195, peeraddr=208.104.92.74:8333 (outbound-full-relay)
2022-08-21T20:26:20.609320Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=196, peeraddr=[2a01:5200:6c:6162:6263:6173:6861:6263]:8333 (outbound-full-relay)
2022-08-21T20:26:22.546436Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=198, peeraddr=141.95.45.187:30034 (outbound-full-relay)
2022-08-21T20:27:18.146310Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=00000000000000000007cb8a63d4c0f24bf0e1c3e6bedbf0436ebab43a3ee690 height=750478 version=0x20800000 log2_work=93.689041 tx=758467003 date='2022-08-21T20:05:24Z' progress=0.999995 cache=5.5MiB(41861txo)
2022-08-21T20:27:20.065326Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344926
2022-08-21T20:27:51.486742Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=00000000000000000006e1bb77ce90948b3050df8ee2143d1913fd583ce3c765 height=750479 version=0x20c00000 log2_work=93.689052 tx=758468307 date='2022-08-21T20:08:38Z' progress=0.999996 cache=6.6MiB(48588txo)
2022-08-21T20:28:03.488188Z [validation.cpp:2441] [UpdateTipLog] UpdateTip: new best=0000000000000000000171f39891841e7128e85f45f386ddde82f65c12dd5773 height=750480 version=0x20400000 log2_work=93.689063 tx=758468706 date='2022-08-21T20:10:40Z' progress=0.999996 cache=6.8MiB(50180txo)
2022-08-21T20:28:05.362170Z [index/base.cpp:169] [ThreadSync] Syncing coinstatsindex with block chain from height 344927
2022-08-21T20:28:05.364170Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70015, blocks=750480, peer=205, peeraddr=116.203.112.73:8333 (outbound-full-relay)
2022-08-21T20:28:05.718559Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=208, peeraddr=[2404:7a85:4161:2b00:49a1:427a:fac:3409]:8333 (outbound-full-relay)
2022-08-21T20:28:08.302936Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=211, peeraddr=185.167.113.59:8333 (outbound-full-relay)
2022-08-21T20:28:21.039390Z [net_processing.cpp:2797] [ProcessMessage] New outbound peer connected: version: 70016, blocks=750480, peer=213, peeraddr=[240b:252:9480:9d00:e65f:1ff:fe10:d767]:8333 (outbound-full-relay)

And here again we see the deadlock occuring for long time in the coindb indexer (note the difference of time between the last too events). And in fact nothing else occured: no resposes to other peers, the coindb indexer was stalled, the RPC service stopped to respond even locally.

It seems then that there are malicious nodes on the P2P network sending bad blocks that “partially pass” the basic security, but can cause BitCoin Core to be blocked and become unresponsive. It can no longer respond even to a basic RPC such as “bitcoin -getinfo”, or “bitcoin -netinfo”

It will not even repond to a “bitcoin-cli stop”: some threads are stopped, but not all, and the indexer thread is still running in its tight loop; if you are in that situation, the only way to recover is to KILL the process entirely, and you’ll get the related index to become invalid as it was not committed, it will have to be reconstructed from zero.

Most often, you have no way to recover (given that the RPC service is not responsive, and even the CTRL+C breaker, or a simple kill -hangup" will not work. You have to do an instant kill (kill -9 on Linux, or manual kill in the Process Viewer on Windows: nothing else will be synced to disk).

If the index was the IBD index, the chainstate can only be recovered by check level 4 (works often, but not necessarily always, but never with check level 3). If it does not work, then you have to rebuild the chainstate entirely.

But for any other index (such as the “txindex” or “coinstats” index, there’s NO check at all with recovery at startup, so the only way to recover is to entirely delete the content of that index directory and rebuild it from zero (and this will take many hours, and may fail again in the middle, due to bad blocks sent by malicious peers and that may have still not been replaced by valid blocks in the main chain).

Apparently these bad blocks seem to come from unapproved forks (possibly implementing SegWit with too large block sizes?), which are not correctly detected and cause the indexer to fail or run into some infinite loops, or some bad signatures using unsupported/snon-standard bytecodes, taking extremely long time to validate or just discard rapidly as invalid.

Such events with bad blocks coming from malicious peers tend to be frequent now. Unfortunately it’s not easy to track which nodes caused that bug as they may not even be connected since long: they have sent their garbage and have disconnected themselves as soon as done, just to crash specific versions of Bitcoin Core not detecting them.

(Note: this is once again occuring with the current unmodified release v23.0 from your official site; it affects both Linux and Windows versions).

Are there blocks that are extremely long to validate or index (taking much more than just about 1 minutes normally needed here to process every logged group of about 15 blocks, such as more than 30 minutes in the 1st break above, but still not completed after several hours at end of the logs above)? Isn’t it one of the reasons why SegWit was asked for, with blocks grouping many signatures, for example those generated by very large miner pools and needing tons of transactions to validate all their micropayments to many participants?


See also Bug https://github.com/bitcoin/bitcoin/issues/25365 (may be related with the same race issue when there’s a new chaintip updated while the indexer is running and iterating from the older chain view)

If this is the cause, the bug is manifested only when you run it on a machine with enough cores to allow more threads to run concurrently. The crashes/hangs I experiment are all runing with 8 cores activates; your test environment may just have 1 or 2 cores configured, most probably not enough to experiment the race issue between concurrent threads, and not experienced by people running it on cheap PC or notebooks, or in cheap/free VMs for Azure and Amazon (if you get just 1 or 2 non dedicated cores).

Note also that synchronization issues in code may even more likely to cause bugs on CPUs running multiple threads per core, or using more complex parallelism (e.g. new Intel processors with power-efficient and performance CPUs, with more complex caching layers, or needing special kernel-initiated firmware mitigations like Retpoline against attacks based on latency response time on caches and with speculative execution models; those possibile attacks and firmware or OS mitigations exist on many processors, and their effect on execution latency may vary, depending on the number of active concurrent threads; if you have low I/O latency on disk storage, this will also increase the likelyhood of race issues caused by improper synchronization. But in all these cases this greatly impacts the leveldb indexers, not properly iterating over the main chain of blocks.)

If you want to experiment it faster, you may want to test it on a machine with many more CPU cores, like newest AMD ThreadRipper (48 concurrent threads!). And then try running multiple indexers running in parallel (the chainstate indexer in IBD, the coinsdbindexer, the txindex, and the coinsdbindex: one of them will crash/hand, or report “data corruption”, but not caused by any hardware issue, but incorrect synchronization in Bitcoin Core’s code).

As well it seems that all bugs like https://github.com/bitcoin/bitcoin/issues/25894 (reported quite often, but always diagnosed in replies" incorrectly as a “hardware issue”, just supposed but never really verified) are also related to the same synchronization issues between any indexer thread and the thread handling any changes of the tips on the chainstate (notably if there’s a change between concurrent forks for new blocks after establishing a consensus. They more likely occurs at end of a trading day, when more blocks are announced for compensating transactions made by major traders (it’s then more frequent to get new blocks emitted more rapidly than just once every 10 minutes during the rest of the day.


Subsidiary question: Is there way to have a knonw list of trustable peers, and a blacklist for bad peers sending fake/corrupted data? Isn’t Bitcoin missing some additional checks of the structure of these blocks (not just the validation of the individual attempted transactions, but basically its assumed format, includnig for complex multipart signature schemes, or peers attempting to force the use of a hard fork with a disappoved BIP proposal, or attemping to use it too early before a valid date and proper nogociation, if they run some specific subchains)?

One remote peer I remarked was always present when I saw crashes/hangs was this one:

<-> type net mping ping send recv txn blk hb addrp addrl age id address version out full ipv6 27 27 4 1 0 1002 1 11 [2001:41d0:303:2b2b::]:8333 70015/Satoshi:0.14.2(bitcore-sl)/UASF-Segwit:1.0(BIP148)/

I’ve seen it several times sending groups of 2 or 3 unconfirmed blocks with large transactions in a row. It is likely a non-upgraded miner node.

(According to WHOIS, is is hosted by OVH in France, and likely on a fiber access; it uses the most antique version of Satoshi I seen for a “full node”, but with local patches trying to enforce BIP148 but with wrong version announcements).

Do I need to list it in my banlist?

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 22 (9 by maintainers)

Most upvoted comments

From the developer notes:

Defining `DEBUG_LOCKCONTENTION` adds a "lock" logging category to the logging
RPC that, when enabled, logs the location and duration of each lock contention
to the `debug.log` file.

The `--enable-debug` configure option adds `-DDEBUG_LOCKCONTENTION` to the
compiler flags. You may also enable it manually for a non-debug build by running
configure with `-DDEBUG_LOCKCONTENTION` added to your CPPFLAGS,
i.e. `CPPFLAGS="-DDEBUG_LOCKCONTENTION"`, then build and run bitcoind.

You can then use the `-debug=lock` configuration option at bitcoind startup or
`bitcoin-cli logging '["lock"]'` at runtime to turn on lock contention logging.
It can be toggled off again with `bitcoin-cli logging [] '["lock"]'`.

Is this still an issue with a recent version of Bitcoin Core? If yes, what are the steps to reproduce, what does debug logging and what does gdb say?

Closing for now, but this can be reopened, or a new issue can be filed for unrelated issues.

I experience it also for some time

What do you experience? This issue is quite confusing.

-checkblockindex is a diagnostic tool for the mandatory block index every node maintains. It is debug-only, not part of the bitcoind help and shouldn’t be used under normal circumstances. The checks it does are quite costly, especially when dealing with a long chain of >100k blocks, and especially during IBD because they are executed all the time during that. So they can slow down a node a lot during IBD.

However, -checkblockindex has absolutely nothing to do with having additional optional indexes like txindex or coinstatsindex - the blockindex is very different from the other indexes (which share a common framework amongst themselves).