bitcoin: Intermittent (and rare) unit test failure in blockfilter_index_tests/blockfilter_index_initial_sync
Intermittent (and likely super-rare) unit test failure in blockfilter_index_tests/blockfilter_index_initial_sync:
Running 352 test cases...
test/blockfilter_index_tests.cpp(149): fatal error:
in "blockfilter_index_tests/blockfilter_index_initial_sync":
critical check time_start + timeout_ms > GetTimeMillis() has failed
pure virtual method called
terminate called without an active exception
unknown location(0): fatal error: in
"blockfilter_index_tests/blockfilter_index_initial_sync":
signal: SIGABRT (application abort requested)
Scheduling non-determinism that could be handled more gracefully?
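To make the scheduling question concrete: the failed check in the log (time_start + timeout_ms > GetTimeMillis()) reads like a deadline assertion inside a polling loop that waits for a background thread. Below is a minimal sketch of that pattern, not the actual test code. WaitUntil and SyncFinishesWithin are hypothetical names introduced for illustration, and the use of a monotonic clock is an assumption of the sketch rather than something taken from the test.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <functional>
#include <thread>

// Hypothetical helper, not part of the Bitcoin Core test suite.
// Polls `done` until it returns true or `timeout` has elapsed.
// Returns true if `done` became true before the deadline, false otherwise.
bool WaitUntil(const std::function<bool()>& done,
               std::chrono::milliseconds timeout,
               std::chrono::milliseconds poll_interval = std::chrono::milliseconds{1})
{
    assert(timeout.count() >= 0);
    // Assumption: a monotonic clock, so wall-clock adjustments cannot move the deadline.
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    while (!done()) {
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(poll_interval);
    }
    return true;
}

// Hypothetical stand-in for "start a background sync, then wait for it".
// `work_delay` models how late the scheduler lets the worker finish.
bool SyncFinishesWithin(std::chrono::milliseconds work_delay,
                        std::chrono::milliseconds timeout)
{
    std::atomic<bool> synced{false};
    std::thread worker([&] {
        std::this_thread::sleep_for(work_delay);
        synced.store(true);
    });
    const bool ok = WaitUntil([&] { return synced.load(); }, timeout);
    // Join on both the success and the timeout path, so the worker never
    // outlives the state it references.
    worker.join();
    return ok;
}
```

In this sketch a timeout is reported as a return value and the worker is always joined before its shared state is destroyed. Whether the "pure virtual method called" abort in the log comes from a thread outliving test objects after the fatal check is not established in this thread; the sketch only shows one way a timeout path can stay orderly.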
Possibly related:
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (16 by maintainers)
FWIW I had the same issue locally (#22614)
This remains an issue and pops up with regularity on our Jenkins CI, which continuously runs master.
Here are the relevant error snippets:
Over the last 30 days, we’ve seen this in 66 of 1831 runs (3.6%), and in 11 of 419 runs (2.6%) over the last week.
This OOME event may be responsible for a portion of the cirrus crashes, which show up as a generic “agent crashed” in the logs.
(Thanks to @evanbaer and @MarcoFalke for tracking this down)
I haven’t investigated that but here is the output from a failing one:
After running the loop above for 14 hours I hit the bug using `rr`:
Now we can replay the failing test execution deterministically using `rr replay test_bitcoin_chaos/`.
@MarcoFalke Try this:
@MarcoFalke If you want to be able to hit this condition deterministically you can use `rr`’s chaos mode. It is gold for finding intermittent issues in test cases. I’ve never heard it discussed by other Core contributors but I use it extensively 😃