CCF: code_update_test fails on CI
We have occasional failures in the code_update_test during CI runs, which I believe have started recently:
- https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=22126&view=logs&j=8f3dc89c-3708-5926-47e7-27120a268dab&t=efdbc34d-c6b6-5dc6-e05c-1751d4c47c20&l=16704
- https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=22138&view=logs&j=8f3dc89c-3708-5926-47e7-27120a268dab&t=efdbc34d-c6b6-5dc6-e05c-1751d4c47c20&l=16789
Both of these are in SGX Debug. The latter has a segfault node crash, while the former doesn’t. Both appear to have a failure to establish a node-to-node channel:
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:26.927312Z -0.001 0 [fail ] ../src/node/channels.h:781 | Node channel with 22176ee69ffcc7b157e9ef4b154b0235e04f47f61205b5cb6471692489e0b61f cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:26.927339Z -0.001 0 [fail ] ../src/consensus/aft/raft.h:679 | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:27.729962Z -0.001 0 [fail ] ../src/node/channels.h:781 | Node channel with 8c52ba6383d40b50af43c38c87f85d79a754a5af694da0a056bf53a15acfd115 cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:27.730003Z -0.001 0 [fail ] ../src/consensus/aft/raft.h:679 | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:29.288830Z -0.001 0 [fail ] ../src/node/channels.h:781 | Node channel with d2039e880d33f89fea67a3e3c59a8470ccf5aa116e3ec6f2cb7ba5c102391207 cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:29.288864Z -0.001 0 [fail ] ../src/consensus/aft/raft.h:679 | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:31.305218Z -0.002 0 [fail ] ../src/node/channels.h:781 | Node channel with 8c52ba6383d40b50af43c38c87f85d79a754a5af694da0a056bf53a15acfd115 cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:31.305244Z -0.002 0 [fail ] ../src/consensus/aft/raft.h:679 | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:32.015973Z -0.001 0 [fail ] ../src/node/channels.h:781 | Node channel with 22176ee69ffcc7b157e9ef4b154b0235e04f47f61205b5cb6471692489e0b61f cannot receive authenticated message: not yet established, status=0
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (15 by maintainers)
I just managed to reproduce the original problem in a debug build running the code update test in virtual mode, but I still don't have a proper backtrace. It looks like `uv_close` is indeed called a second time, on a handle that's already in the closing state. There are only two calls to `uv_close` that are not protected by `if (uv_is_closing()) ...`, and I'm trying to get a proper backtrace to find out how we get there.

Yeah, there's at least one more bug in the channel destruction/recreation which would have the symptom that messages don't get sent the right way. Working on it.