openenclave: InitializeEnclave on Windows occasionally fails due to SGX_UNMASKED_EVENT

Every so often a test run on Windows fails during the call to the Windows InitializeEnclave() API with the underlying SGX EINIT error SGX_UNMASKED_EVENT (0x80). We need to either mitigate this with retry logic or ensure that oe_sgx_initialize_enclave() or the underlying Windows InitializeEnclave() call inhibits the unmasked events.

The Intel SDM vol 3D states:

Periodically, EINIT polls for certain asynchronous events. If such an event is detected, it completes with failure code (ZF=1 and RAX = SGX_UNMASKED_EVENT), and RIP is incremented to point to the next instruction. These events includes external interrupts, non-maskable interrupts, system-management interrupts, machine checks, INIT signals, and the VMX-preemption timer. EINIT does not fail if the pending event is inhibited (e.g., external interrupts could be inhibited due to blocking by MOV SS blocking or by STI).
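Because the SDM describes this failure as the result of a pending asynchronous event rather than a problem with the enclave itself, the retry option mentioned above could look roughly like the following. This is only a sketch under stated assumptions, not the fix that was merged: `einit_attempt_fn`, `initialize_with_retry`, `fake_attempt`, and the `MAX_EINIT_ATTEMPTS` bound are all hypothetical names and values chosen for illustration. The callback stands in for a single InitializeEnclave() attempt, which reports the EINIT code through its enclave-error output parameter.

```c
#include <stdbool.h>
#include <stdint.h>

/* EINIT failure code seen in the log (err=0x80). */
#define SGX_UNMASKED_EVENT 0x80u

/* Hypothetical upper bound on attempts; not taken from the Open Enclave source. */
#define MAX_EINIT_ATTEMPTS 16

/* Hypothetical callback standing in for one InitializeEnclave() attempt.
 * Returns true on success; on failure it stores the EINIT error code
 * in *einit_error. */
typedef bool (*einit_attempt_fn)(void* context, uint32_t* einit_error);

/* Retry only while the failure is SGX_UNMASKED_EVENT. Any other error is
 * returned to the caller immediately, and the loop is bounded so that a
 * persistent failure still surfaces. */
static bool initialize_with_retry(
    einit_attempt_fn attempt,
    void* context,
    uint32_t* einit_error)
{
    for (int i = 0; i < MAX_EINIT_ATTEMPTS; i++)
    {
        *einit_error = 0;

        if (attempt(context, einit_error))
            return true;

        if (*einit_error != SGX_UNMASKED_EVENT)
            return false; /* Not a transient event: do not retry. */
    }

    return false; /* Still failing after MAX_EINIT_ATTEMPTS tries. */
}

/* Test double: fails with SGX_UNMASKED_EVENT until the counter that
 * context points to reaches zero, then succeeds. */
static bool fake_attempt(void* context, uint32_t* einit_error)
{
    int* remaining_failures = (int*)context;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        *einit_error = SGX_UNMASKED_EVENT;
        return false;
    }

    return true;
}
```

In a real host, the callback would wrap the platform call and the bound would need tuning. The loop deliberately leaves every error other than SGX_UNMASKED_EVENT untouched so that genuine initialization failures are not masked.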

OpenEnclave-Bors#865 has a repro of this:

test 73
        Start  73: tests/pthread

73: Test command: D:\Jenkins\workspace\Bors_staging\build\tests\thread\host\Release\thread_host.exe "D:/Jenkins/workspace/Bors_staging/build/tests/thread/host/pthread_enc"
73: Test timeout computed to be: 10000000
73: : d:\jenkins\workspace\bors_staging\tests\thread\host\host.cpp(306): error: oe_create_thread_enclave(): result=21
73: 18:09:31:000000 tid(0xbac) (H)[ERROR]InitializeEnclave failed (err=0x80) (oe_result_t=OE_PLATFORM_ERROR)[d:\jenkins\workspace\bors_staging\host\sgx\sgxload.c oe_sgx_initialize_enclave:929]
73: 18:09:31:000000 tid(0xbac) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_sgx_build_enclave:577]
73: 18:09:31:000000 tid(0xbac) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_create_enclave:681]
 73/101 Test  #73: tests/pthread .........................................***Failed    0.08 sec
test 74
        Start  74: tests/threadcxx

74: Test command: D:\Jenkins\workspace\Bors_staging\build\tests\threadcxx\host\Release\threadcxx_host.exe "D:/Jenkins/workspace/Bors_staging/build/tests/threadcxx/host/threadcxx_enc"
74: Test timeout computed to be: 10000000
74: : d:\jenkins\workspace\bors_staging\tests\threadcxx\host\host.cpp(208): error: oe_create_threadcxx_enclave(): result=21
74: 18:09:31:000000 tid(0x1c4) (H)[ERROR]InitializeEnclave failed (err=0x80) (oe_result_t=OE_PLATFORM_ERROR)[d:\jenkins\workspace\bors_staging\host\sgx\sgxload.c oe_sgx_initialize_enclave:929]
74: 18:09:31:000000 tid(0x1c4) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_sgx_build_enclave:577]
74: 18:09:31:000000 tid(0x1c4) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_create_enclave:681]
 74/101 Test  #74: tests/threadcxx .......................................***Failed    0.11 sec
test 75
        Start  75: tests/thread_local

76: Test command: D:\Jenkins\workspace\Bors_staging\build\tests\thread_local\host\Release\thread_local_host.exe "D:/Jenkins/workspace/Bors_staging/build/tests/thread_local/host/thread_local_enc_exported" "--exported-thread-locals"
76: Test timeout computed to be: 10000000
76: : d:\jenkins\workspace\bors_staging\tests\thread_local\host\host.cpp(84): error: oe_create_enclave(): result=21
76: 18:09:31:000000 tid(0x1458) (H)[ERROR]InitializeEnclave failed (err=0x80) (oe_result_t=OE_PLATFORM_ERROR)[d:\jenkins\workspace\bors_staging\host\sgx\sgxload.c oe_sgx_initialize_enclave:929]
76: 18:09:31:000000 tid(0x1458) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_sgx_build_enclave:577]
76: 18:09:31:000000 tid(0x1458) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_create_enclave:681]
 76/101 Test  #76: tests/thread_local_exported ...........................***Failed    0.09 sec

This was also reported previously in #1476.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 17 (17 by maintainers)

Most upvoted comments

The fix is complete and testing is in progress. I will open a PR later tonight if 4 Jenkins CI runs come out clean.

Haven’t observed any random CI failures on Windows since the merge. Nice!

https://oe-jenkins.eastus.cloudapp.azure.com/blue/organizations/jenkins/Bors/activity/

Tagging @ionutbalutoiu for a heads up about this

Also seen in bors run 866 … It seems like we’re hitting this pretty often in CI/CD.