bitcoin: Intermittent (and rare) unit test failure in blockfilter_index_tests/blockfilter_index_initial_sync

Intermittent (and likely super-rare) unit test failure in blockfilter_index_tests/blockfilter_index_initial_sync:

Running 352 test cases...
test/blockfilter_index_tests.cpp(149): fatal error: 
  in "blockfilter_index_tests/blockfilter_index_initial_sync":
  critical check time_start + timeout_ms > GetTimeMillis() has failed
pure virtual method called
terminate called without an active exception
unknown location(0): fatal error: in 
  "blockfilter_index_tests/blockfilter_index_initial_sync": 
  signal: SIGABRT (application abort requested)

Scheduling non-determinism that could be handled more gracefully?

Possibly related:

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 16 (16 by maintainers)

Most upvoted comments

FWIW I had the same issue locally (#22614)

This remains an issue and pops up with regularity on our Jenkins CI, which continuously runs master.

Here are the relevant error snippets:

2021-03-04T10:12:28.684510Z [basic block filter index] [index/base.cpp:163] [ThreadSync] Syncing basic block filter index with block chain from height 0
test/blockfilter_index_tests.cpp(140): [4;31;49mfatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed[0;39;49m
==2949==WARNING: MemorySanitizer: use-of-uninitialized-value
...(cut for brevity)...
SUMMARY: MemorySanitizer: use-of-uninitialized-value /var/lib/jenkins/workspace/bitcoin-depends_sanitizers_memory_MSan_focal/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/src/./streams.h:439:13 in CDataStream::Xor(std::__1::vector<unsigned char, std::__1::allocator<unsigned char> > const&)
Exiting
make[3]: *** [Makefile:17137: test/blockfilter_index_tests.cpp.test] Error 1
make[3]: *** Waiting for unfinished jobs....

Over the last 30 days, we’ve seen this 66 of 1831 runs (3.6%) and 11 of 419 (2.6%) over the last week.

This OOME event may be responsible for a portion of the cirrus crashes, which show up as a generic “agent crashed” in the logs.

(Thanks to @evanbaer and @MarcoFalke for tracking this down)

I haven’t investigated that but here is the output from a failing one:

2020-05-16T05:59:57.550904Z     - Index writing: 1.94ms [0.40s (2.03ms/blk)]                                                                                                                                                                                                                                                    [0/1710]
2020-05-16T05:59:57.551566Z     - Callbacks: 0.67ms [0.12s (0.61ms/blk)]
2020-05-16T05:59:57.552209Z BlockChecked: block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d state=Valid
2020-05-16T05:59:57.552852Z   - Connect total: 7.13ms [1.38s (6.93ms/blk)]
2020-05-16T05:59:57.553500Z   - Flush: 0.65ms [0.14s (0.70ms/blk)]
2020-05-16T05:59:57.554153Z   - Writing chainstate: 0.65ms [0.17s (0.84ms/blk)]
2020-05-16T05:59:57.554895Z Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 0 from current
2020-05-16T05:59:57.555586Z UpdateTip: new best=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d height=99 version=0x20000000 log2_work=7.6438562 tx=100 date='2020-05-16T06:00:11Z' progress=1.000000 cache=0.0MiB(99txo)
2020-05-16T05:59:57.556231Z   - Connect postprocess: 2.08ms [0.43s (2.14ms/blk)]
2020-05-16T05:59:57.556877Z - Connect block: 10.51ms [2.11s (10.62ms/blk)]
2020-05-16T05:59:57.557687Z Checking mempool with 0 transactions and 0 inputs
2020-05-16T05:59:57.571535Z Enqueuing BlockConnected: block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d block height=99
2020-05-16T05:59:57.572202Z Enqueuing UpdatedBlockTip: new block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d fork block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 (in IBD=false)
2020-05-16T05:59:57.572970Z CreateNewBlock(): block weight: 904 txs: 0 fees: 0 sigops 400
2020-05-16T05:59:57.573985Z     - Sanity checks: 0.01ms [0.00s (0.01ms/blk)]
2020-05-16T05:59:57.574644Z     - Fork checks: 0.67ms [0.31s (1.55ms/blk)]
2020-05-16T05:59:57.575301Z       - Connect 1 transactions: 0.65ms (0.648ms/tx, 0.000ms/txin) [0.29s (1.46ms/blk)]
2020-05-16T05:59:57.575962Z     - Verify 0 txins: 1.32ms (0.000ms/txin) [0.53s (2.63ms/blk)]
2020-05-16T05:59:57.576603Z CreateNewBlock() packages: 0.01ms (0 packages, 0 updated descendants), validity: 3.65ms (total 3.66ms)
2020-05-16T05:59:57.577353Z NewPoWValidBlock: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5
2020-05-16T05:59:57.579751Z   - Load block from disk: 0.00ms [0.00s]
2020-05-16T05:59:57.580721Z     - Sanity checks: 0.01ms [0.00s (0.01ms/blk)]
2020-05-16T05:59:57.581374Z     - Fork checks: 0.66ms [0.31s (1.55ms/blk)]
2020-05-16T05:59:57.582022Z       - Connect 1 transactions: 0.64ms (0.644ms/tx, 0.000ms/txin) [0.29s (1.45ms/blk)]
2020-05-16T05:59:57.582672Z     - Verify 0 txins: 1.30ms (0.000ms/txin) [0.53s (2.62ms/blk)]
2020-05-16T05:59:57.584895Z     - Index writing: 2.21ms [0.41s (2.02ms/blk)]
2020-05-16T05:59:57.585539Z     - Callbacks: 0.66ms [0.12s (0.61ms/blk)]
2020-05-16T05:59:57.586168Z BlockChecked: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 state=Valid
2020-05-16T05:59:57.587934Z   - Connect total: 8.19ms [1.39s (6.90ms/blk)]
2020-05-16T05:59:57.588569Z   - Flush: 0.64ms [0.14s (0.70ms/blk)]
2020-05-16T05:59:57.604022Z   - Writing chainstate: 15.44ms [0.18s (0.91ms/blk)]
2020-05-16T05:59:57.604763Z Blockpolicy estimates updated by 0 of 0 block txs, since last block 0 of 0 tracked, mempool map size 0, max target 0 from current
2020-05-16T05:59:57.605454Z UpdateTip: new best=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 height=100 version=0x20000000 log2_work=7.6582115 tx=101 date='2020-05-16T06:00:11Z' progress=1.000000 cache=0.0MiB(100txo)
2020-05-16T05:59:57.606099Z   - Connect postprocess: 2.09ms [0.43s (2.13ms/blk)]
2020-05-16T05:59:57.606740Z - Connect block: 26.36ms [2.14s (10.65ms/blk)]
2020-05-16T05:59:57.607554Z Checking mempool with 0 transactions and 0 inputs
2020-05-16T05:59:57.608496Z Enqueuing BlockConnected: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 block height=100
2020-05-16T05:59:57.609144Z Enqueuing UpdatedBlockTip: new block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 fork block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d (in IBD=false)
2020-05-16T05:59:57.613390Z Opened LevelDB successfully
2020-05-16T05:59:57.614711Z Using obfuscation key for /tmp/test_common_Bitcoin Core/8452f2eaa126750ba2992d9a174f267010c49d1332b33956fb3e0963cd70821e/regtest/indexes/blockfilter/basic/db: 0000000000000000
2020-05-16T05:59:57.616968Z BlockConnected: block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 block height=98
2020-05-16T05:59:57.618456Z UpdatedBlockTip: new block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 fork block hash=79fd5be16c059ff46e8eb70b8b6a37e1b71f4788059ad054941a1f5445ea90f6 (in IBD=false)
2020-05-16T05:59:57.619685Z BlockConnected: block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d block height=99
2020-05-16T05:59:57.620941Z UpdatedBlockTip: new block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d fork block hash=5baaa452c43885d714369586561867ad382e55ed27699fa82676c33495e14125 (in IBD=false)
2020-05-16T05:59:57.622205Z BlockConnected: block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 block height=100
2020-05-16T05:59:57.623487Z UpdatedBlockTip: new block hash=7eddde99a72607cda3cfac83dfbce8cb254693ca76fe0e28d85b01612c7162a5 fork block hash=21b6fd8c6e7d845a5b4959b0c3e439387f6e8f6b98e052140294871356da1c3d (in IBD=false)
2020-05-16T05:59:57.649646Z basic block filter index thread start
2020-05-16T05:59:57.650334Z Syncing basic block filter index with block chain from height 0
test/blockfilter_index_tests.cpp(140): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed
pure virtual method called
terminate called without an active exception
unknown location(0): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": signal: SIGABRT (application abort requested)

After running the loop above for 14 hours I hit the bug using rr:

2020-05-16T05:59:59.184771Z Syncing basic block filter index with block chain from height 0
test/blockfilter_index_tests.cpp(140): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": critical check time_start + timeout_ms > GetTimeMillis() has failed
pure virtual method called
terminate called without an active exception
unknown location(0): fatal error: in "blockfilter_index_tests/blockfilter_index_initial_sync": signal: SIGABRT (application abort requested)
test/blockfilter_index_tests.cpp(140): last checkpoint
test/blockfilter_index_tests.cpp(105): Leaving test case "blockfilter_index_initial_sync"; testing time: 334500us

Now we can replay the failing test execution deterministically using rr replay test_bitcoin_chaos/.

@MarcoFalke Try this:

$ git clone https://github.com/bitcoin/bitcoin
$ cd bitcoin/
$ ./autogen.sh
$ ./configure --with-incompatible-bdb
$ make
$ git clone https://github.com/mozilla/rr
$ cd rr/
$ ./configure
$ make
$ cd ../
$ rm -rf test_bitcoin_chaos
$ while rr/bin/rr record --chaos -o test_bitcoin_chaos src/test/test_bitcoin -l test_suite -t blockfilter_index_tests -- DEBUG_LOG_OUT; do
      rm -rf test_bitcoin_chaos
  done

@MarcoFalke If you want to be able to hit this condition deterministically you can use rr’s chaos mode. It is gold for finding intermittent issues in test cases. I’ve never heard it discussed by other Core contributors but I use it extensively 😃