CCF: code_update_test fails on CI

We have occasional failures in the code_update_test during CI runs, which I believe has started recently:

https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=22126&view=logs&j=8f3dc89c-3708-5926-47e7-27120a268dab&t=efdbc34d-c6b6-5dc6-e05c-1751d4c47c20&l=16704 https://dev.azure.com/MSRC-CCF/CCF/_build/results?buildId=22138&view=logs&j=8f3dc89c-3708-5926-47e7-27120a268dab&t=efdbc34d-c6b6-5dc6-e05c-1751d4c47c20&l=16789

Both of these are in SGX Debug. The latter has a segfault node crash, while the former doesn’t. Both appear to have a failure to establish a node-to-node channel:

62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:26.927312Z -0.001 0   [fail ] ../src/node/channels.h:781           | Node channel with 22176ee69ffcc7b157e9ef4b154b0235e04f47f61205b5cb6471692489e0b61f cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:26.927339Z -0.001 0   [fail ] ../src/consensus/aft/raft.h:679      | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:27.729962Z -0.001 0   [fail ] ../src/node/channels.h:781           | Node channel with 8c52ba6383d40b50af43c38c87f85d79a754a5af694da0a056bf53a15acfd115 cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:27.730003Z -0.001 0   [fail ] ../src/consensus/aft/raft.h:679      | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:29.288830Z -0.001 0   [fail ] ../src/node/channels.h:781           | Node channel with d2039e880d33f89fea67a3e3c59a8470ccf5aa116e3ec6f2cb7ba5c102391207 cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:29.288864Z -0.001 0   [fail ] ../src/consensus/aft/raft.h:679      | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:31.305218Z -0.002 0   [fail ] ../src/node/channels.h:781           | Node channel with 8c52ba6383d40b50af43c38c87f85d79a754a5af694da0a056bf53a15acfd115 cannot receive authenticated message: not yet established, status=0
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:31.305244Z -0.002 0   [fail ] ../src/consensus/aft/raft.h:679      | Exception in void aft::Aft<consensus::LedgerEnclave, ccf::NodeToNode, ccf::Snapshotter>::recv_message(const ccf::NodeId &, OArray &&) [LedgerProxy = consensus::LedgerEnclave, ChannelProxy = ccf::NodeToNode, SnapshotterProxy = ccf::Snapshotter]
62: 10:15:33.702 | ERROR    | infra.remote:log_errors:70 - /mnt/build/1/s/build/workspace/code_update_test_2/out: 2021-03-30T10:15:32.015973Z -0.001 0   [fail ] ../src/node/channels.h:781           | Node channel with 22176ee69ffcc7b157e9ef4b154b0235e04f47f61205b5cb6471692489e0b61f cannot receive authenticated message: not yet established, status=0

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

I just managed to reproduce the original problem in a debug build running the code update test in virtual mode, but I still don’t have a proper backtrace. It looks like uv_close is indeed called a second time, on a handle that’s already in the closing state. There are only two calls to uv_close that are not protected by if (uv_is_closing()) ... and I’m trying to get a proper backtrace to find out how we get there.

Yeah, there’s at least one more bug in the channel destruction/recreation which would have the symptom that messages don’t get sent the right way. Working on it.