openenclave: InitializeEnclave on Windows occasionally fails due to SGX_UNMASKED_EVENT
Every so often, a test run on Windows fails inside the Windows InitializeEnclave() call with the underlying SGX EINIT error SGX_UNMASKED_EVENT (0x80). We need to either mitigate this with retry logic, or ensure that oe_sgx_initialize_enclave() or the underlying Windows InitializeEnclave() call inhibits the unmasked events.
The Intel SDM vol 3D states:
Periodically, EINIT polls for certain asynchronous events. If such an event is detected, it completes with failure code (ZF=1 and RAX = SGX_UNMASKED_EVENT), and RIP is incremented to point to the next instruction. These events includes external interrupts, non-maskable interrupts, system-management interrupts, machine checks, INIT signals, and the VMX-preemption timer. EINIT does not fail if the pending event is inhibited (e.g., external interrupts could be inhibited due to blocking by MOV SS blocking or by STI).
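Since the SDM describes SGX_UNMASKED_EVENT as the result of a pending asynchronous event rather than a defect in the enclave image, the retry option mentioned above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the fix that was merged: `enclave_init_fn`, `initialize_enclave_with_retry`, `simulated_init`, and the retry budget `MAX_EINIT_RETRIES` are all hypothetical names and values. On Windows, the callback would wrap the real InitializeEnclave() call and copy its reported enclave error into `enclave_error`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* EINIT failure code seen in the CI log (err=0x80). */
#define SGX_UNMASKED_EVENT 0x80u

/* Hypothetical retry budget; the value is an assumption, not from the source. */
#define MAX_EINIT_RETRIES 16u

/*
 * Hypothetical callback standing in for the platform call. It returns true
 * on success; on failure it stores the EINIT error code in *enclave_error.
 */
typedef bool (*enclave_init_fn)(void* ctx, uint32_t* enclave_error);

/*
 * Call init() and retry only while it fails with SGX_UNMASKED_EVENT.
 * Any other error is treated as permanent and returned immediately.
 * *attempts receives the total number of calls made (at least 1).
 */
static bool initialize_enclave_with_retry(
    enclave_init_fn init,
    void* ctx,
    uint32_t* enclave_error,
    unsigned* attempts)
{
    unsigned calls = 0;
    bool ok = false;

    while (calls <= MAX_EINIT_RETRIES)
    {
        *enclave_error = 0;
        ok = init(ctx, enclave_error);
        calls++;

        /* Stop on success or on any error that is not the transient one. */
        if (ok || *enclave_error != SGX_UNMASKED_EVENT)
            break;
    }

    *attempts = calls;
    return ok;
}

/*
 * Simulated platform call for demonstration only: ctx points to a counter of
 * remaining transient failures, each reported as SGX_UNMASKED_EVENT.
 */
static bool simulated_init(void* ctx, uint32_t* enclave_error)
{
    unsigned* remaining_failures = (unsigned*)ctx;

    if (*remaining_failures > 0)
    {
        (*remaining_failures)--;
        *enclave_error = SGX_UNMASKED_EVENT;
        return false;
    }

    *enclave_error = 0;
    return true;
}
```

Bounding the loop keeps a persistently failing platform from hanging enclave creation, and retrying only on SGX_UNMASKED_EVENT keeps genuine EINIT failures (for example an invalid signature) visible to the caller on the first attempt.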
OpenEnclave-Bors#865 has a repro of this:
test 73
Start 73: tests/pthread
73: Test command: D:\Jenkins\workspace\Bors_staging\build\tests\thread\host\Release\thread_host.exe "D:/Jenkins/workspace/Bors_staging/build/tests/thread/host/pthread_enc"
73: Test timeout computed to be: 10000000
73: : d:\jenkins\workspace\bors_staging\tests\thread\host\host.cpp(306): error: oe_create_thread_enclave(): result=21
73: 18:09:31:000000 tid(0xbac) (H)[ERROR]InitializeEnclave failed (err=0x80) (oe_result_t=OE_PLATFORM_ERROR)[d:\jenkins\workspace\bors_staging\host\sgx\sgxload.c oe_sgx_initialize_enclave:929]
73: 18:09:31:000000 tid(0xbac) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_sgx_build_enclave:577]
73: 18:09:31:000000 tid(0xbac) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_create_enclave:681]
73/101 Test #73: tests/pthread .........................................***Failed 0.08 sec
test 74
Start 74: tests/threadcxx
74: Test command: D:\Jenkins\workspace\Bors_staging\build\tests\threadcxx\host\Release\threadcxx_host.exe "D:/Jenkins/workspace/Bors_staging/build/tests/threadcxx/host/threadcxx_enc"
74: Test timeout computed to be: 10000000
74: : d:\jenkins\workspace\bors_staging\tests\threadcxx\host\host.cpp(208): error: oe_create_threadcxx_enclave(): result=21
74: 18:09:31:000000 tid(0x1c4) (H)[ERROR]InitializeEnclave failed (err=0x80) (oe_result_t=OE_PLATFORM_ERROR)[d:\jenkins\workspace\bors_staging\host\sgx\sgxload.c oe_sgx_initialize_enclave:929]
74: 18:09:31:000000 tid(0x1c4) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_sgx_build_enclave:577]
74: 18:09:31:000000 tid(0x1c4) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_create_enclave:681]
74/101 Test #74: tests/threadcxx .......................................***Failed 0.11 sec
test 75
Start 75: tests/thread_local
76: Test command: D:\Jenkins\workspace\Bors_staging\build\tests\thread_local\host\Release\thread_local_host.exe "D:/Jenkins/workspace/Bors_staging/build/tests/thread_local/host/thread_local_enc_exported" "--exported-thread-locals"
76: Test timeout computed to be: 10000000
76: : d:\jenkins\workspace\bors_staging\tests\thread_local\host\host.cpp(84): error: oe_create_enclave(): result=21
76: 18:09:31:000000 tid(0x1458) (H)[ERROR]InitializeEnclave failed (err=0x80) (oe_result_t=OE_PLATFORM_ERROR)[d:\jenkins\workspace\bors_staging\host\sgx\sgxload.c oe_sgx_initialize_enclave:929]
76: 18:09:31:000000 tid(0x1458) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_sgx_build_enclave:577]
76: 18:09:31:000000 tid(0x1458) (H)[ERROR]:OE_PLATFORM_ERROR[d:\jenkins\workspace\bors_staging\host\sgx\create.c oe_create_enclave:681]
76/101 Test #76: tests/thread_local_exported ...........................***Failed 0.09 sec
This was also previously reported in #1476.
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (17 by maintainers)
The fix is complete and testing is in progress. I will open a PR later tonight if 4 Jenkins CI runs come out clean.
Haven’t observed any random CI failures on Windows since the merge. Nice!
https://oe-jenkins.eastus.cloudapp.azure.com/blue/organizations/jenkins/Bors/activity/
Tagging @ionutbalutoiu for a heads up about this
Also seen in bors run 866 … It seems like we’re hitting this pretty often in CI/CD.