iree: Shutdown deadlock in `iree_task_worker_deinitialize` -> `iree_notification_await` as the thread is already gone
This reproduces roughly once every ~1000 runs of the dylib unit tests:
ctest -j20 -R iree/hal/dylib/cts/dylib --repeat-until-fail 1000 --output-on-failure
The test just hangs. Attaching GDB to it, I get this stack. According to GDB there is no other thread anymore (other than a TSan-internal thread) — this is a single-threaded program at this point.
So `iree_task_worker_deinitialize` is awaiting a notification,
    if (worker->thread) {
      iree_notification_await(&worker->state_notification,
                              (iree_condition_fn_t)iree_task_worker_is_zombie,
                              worker, iree_infinite_timeout());
    }
but it will never come as there is no other thread left to send it.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 21 (10 by maintainers)
It is looking OK in a ctest run with 1000 repeats, and it certainly matches my understanding, now that I understand the above explanation, that it fixes a specific issue in the existing code (as explained in the comment added by that PR).