bitcoin: Intermittent failures in interface_usdt_mempool.py

https://cirrus-ci.com/task/5779522121891840

 node0 2023-03-31T11:45:58.812384Z (mocktime: 2023-04-14T11:46:01Z) [http] [httpserver.cpp:257] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:38238 
 node0 2023-03-31T11:45:58.812543Z (mocktime: 2023-04-14T11:46:01Z) [httpworker.2] [rpc/request.cpp:179] [parse] [rpc] ThreadRPCServer method=getpeerinfo user=__cookie__ 
 test  2023-03-31T11:45:58.814000Z TestFramework (INFO): Hooking into mempool:rejected tracepoint... 
 test  2023-03-31T11:45:59.798000Z TestFramework (INFO): Sending invalid transaction... 
 test  2023-03-31T11:45:59.799000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:15944: msg_tx(tx=CTransaction(nVersion=2 vin=[CTxIn(prevout=COutPoint(hash=76b2ca9a8c3ffa3c454db9591dfaf5c4979926154e33a5bef67412ea0eaa16c5 n=0) scriptSig= nSequence=0)] vout=[CTxOut(nValue=49.99968800 scriptPubKey=51202913b252fe537830f843bfdc5fa7d20ba48639a87c86ff837b92d083c55ad7c1)] wit=CTxWitness(CScriptWitness(51,c00000000000000000000000000000000000000000000000000000000000000001)) nLockTime=0)) 
 test  2023-03-31T11:45:59.799000Z TestFramework.p2p (DEBUG): Send message to 127.0.0.1:15944: msg_ping(nonce=00000002) 
 node0 2023-03-31T11:45:59.799697Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net_processing.cpp:3169] [ProcessMessage] [net] received: tx (133 bytes) peer=0 
 node0 2023-03-31T11:45:59.800472Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [txmempool.cpp:658] [check] [mempool] Checking mempool with 2 transactions and 2 inputs 
 node0 2023-03-31T11:45:59.800746Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net_processing.cpp:4174] [ProcessMessage] [mempoolrej] c917a71e43b825756c1d45985fcc87f8c245db0654f5eeb000a92dc2cccb1d6c from peer=0 was not accepted: min relay fee not met, 0 < 104 
 node0 2023-03-31T11:45:59.800913Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net_processing.cpp:3169] [ProcessMessage] [net] received: ping (8 bytes) peer=0 
 node0 2023-03-31T11:45:59.800953Z (mocktime: 2023-04-14T11:46:01Z) [msghand] [net.cpp:2841] [PushMessage] [net] sending pong (8 bytes) peer=0 
 test  2023-03-31T11:45:59.801000Z TestFramework.p2p (DEBUG): Received message from 127.0.0.1:15944: msg_pong(nonce=00000002) 
 node0 2023-03-31T11:45:59.849969Z (mocktime: 2023-04-14T11:46:01Z) [http] [httpserver.cpp:257] [http_request_cb] [http] Received a POST request for / from 127.0.0.1:38238 
 test  2023-03-31T11:45:59.850000Z TestFramework (INFO): Polling buffer... 
 node0 2023-03-31T11:45:59.850174Z (mocktime: 2023-04-14T11:46:01Z) [httpworker.3] [rpc/request.cpp:179] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
 test  2023-03-31T11:45:59.898000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
 test  2023-03-31T11:45:59.898000Z TestFramework (ERROR): Assertion failed 
                                   Traceback (most recent call last):
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
                                       self.run_test()
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 339, in run_test
                                       self.rejected_test()
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 323, in rejected_test
                                       assert_equal(EXPECTED_REJECTED_EVENTS, handled_rejected_events)
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 56, in assert_equal
                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
                                   AssertionError: not(1 == 0)
 test  2023-03-31T11:45:59.899000Z TestFramework (DEBUG): Closing down network thread 
 test  2023-03-31T11:45:59.949000Z TestFramework (INFO): Stopping nodes 
 test  2023-03-31T11:45:59.950000Z TestFramework.node0 (DEBUG): Stopping node 

About this issue

  • Original URL
  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 33 (33 by maintainers)

Commits related to this issue

Most upvoted comments

Maybe it will ā€œfix itselfā€ after #27360 ?

Unfortunately, it didnā€™t (https://github.com/bitcoin/bitcoin/runs/13206824577 from master after #27360 was merged)

Have seen this a couple times by now.

cc @0xB10C @virtu

neither tx nor result go out of scope before itā€™s called

I donā€™t think the lifetime of result is relevant here. GetRejectReason() gives birth to a fresh temporary copy of the string, having its own lifetime.

All that said, itā€™s obviously still going wrong somehowā€¦

Yeah, I am just guessing that this is a lifetime issue. It would be good to rule this out.

AFAIU the macros themselves donā€™t copy any data, rather just define a static probe point. Any arguments passed to them just get incorporated into the assembly code. The tracing program (BPF) if loaded into the kernel can run code at the trace points. Here it can choose to, and AFAIU does in our case with BPF, copy the data into a different kernel space map when it hits a tracepoint, which can then be read by the USDT probe in our python test.

So the data does get copied but not by the macro, rather by the BPF program.

https://cirrus-ci.com/task/4626519738286080?logs=ci#L3262

 node0 2023-06-29T19:45:37.525879Z (mocktime: 2023-07-13T19:45:38Z) [httpworker.2] [rpc/request.cpp:181] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
 test  2023-06-29T19:45:37.526000Z TestFramework (INFO): Polling buffer... 
 test  2023-06-29T19:45:37.527000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
 test  2023-06-29T19:45:37.527000Z TestFramework (ERROR): Assertion failed 
                                   Traceback (most recent call last):
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 131, in main
                                       self.run_test()
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 329, in run_test
                                       self.rejected_test()
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 310, in rejected_test
                                       assert_equal(event.reason.decode("UTF-8"), "min relay fee not met")
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 57, in assert_equal
                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
                                   AssertionError: not( == min relay fee not met)

https://github.com/bitcoin/bitcoin/pull/27831 improves the logging, probably.

Can it be forced to ā€œreproduceā€ on master by adding a typo to the error string in assert_equal?

see https://github.com/0xB10C/bitcoin/commit/4091d621f16b6316eef5551a56c00bf0e2c0e541 and this CI task https://cirrus-ci.com/task/4909169740873728

@0xB10C and I had a look at this today. Hereā€™s what we found so far:

  1. We believe the issue is caused by one or more of the assertions failing in the handle_rejected_event callback (although we are not sure why they fail).
  2. The test frameworkā€™s assert_equal() communicates failed assertions via exceptions. Unfortunately exceptions raised in bcc callback functions are not properly propagated (Exception ignored on calling ctypes callback function: <function PerfEventArray._open_perf_buffer.<locals>.raw_cb_ at 0xffff83490e50>), so they only appear on stderr, not in the logs. As a consequence, the logs only show that the test case failed, not why it failed.
  3. Because (2) is a general problem with functional tests for tracepoints, Iā€™ll create a PR that fixes the issue for all of them. The best solution we came up with so far is wrapping all assert_equals() in bcc callback functions in try blocks and logging any AssertionError exceptions using self.log(). Once this (or a better solution, preferable one that involves not ignoring exceptions in ctype callback functions, if thatā€™s possible) is merged, the logs should tell us why the mempool:rejected test fails sporadically.

master @ e0a70c5b4f2c691e8d6b507c8ce879f0b0424254. https://cirrus-ci.com/task/5247512164958208:

 node0 2023-05-10T17:29:06.449483Z (mocktime: 2023-05-24T17:29:09Z) [httpworker.1] [rpc/request.cpp:181] [parse] [rpc] ThreadRPCServer method=getrawmempool user=__cookie__ 
 test  2023-05-10T17:29:06.450000Z TestFramework (INFO): Polling buffer... 
 test  2023-05-10T17:29:06.486000Z TestFramework (INFO): Ensuring mempool:rejected event was handled successfully... 
 test  2023-05-10T17:29:06.486000Z TestFramework (ERROR): Assertion failed 
                                   Traceback (most recent call last):
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/test_framework.py", line 132, in main
                                       self.run_test()
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 339, in run_test
                                       self.rejected_test()
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/interface_usdt_mempool.py", line 323, in rejected_test
                                       assert_equal(EXPECTED_REJECTED_EVENTS, handled_rejected_events)
                                     File "/tmp/cirrus-ci-build/ci/scratch/build/bitcoin-x86_64-pc-linux-gnu/test/functional/test_framework/util.py", line 55, in assert_equal
                                       raise AssertionError("not(%s)" % " == ".join(str(arg) for arg in (thing1, thing2) + args))
                                   AssertionError: not(1 == 0)
 test  2023-05-10T17:29:06.487000Z TestFramework (DEBUG): Closing down network thread
...
interface_usdt_mempool.py                              | āœ– Failed  | 5 s