iree: Shutdown deadlock in `iree_task_worker_deinitialize` -> `iree_notification_await` as the thread is already gone

This reproduces roughly once every 1000 runs of the dylib unit tests:

ctest -j20 -R iree/hal/dylib/cts/dylib --repeat-until-fail 1000 --output-on-failure

The test just hangs. Attaching GDB to it, I get this stack. According to GDB, there are no other threads left (apart from a TSan-internal thread), so this is effectively a single-threaded program at this point.

So `iree_task_worker_deinitialize` is awaiting a notification,

  if (worker->thread) {
    iree_notification_await(&worker->state_notification,
                            (iree_condition_fn_t)iree_task_worker_is_zombie,
                            worker, iree_infinite_timeout());
  }

but it will never come as there is no other thread left to send it.
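
To make the failure mode concrete, here is a minimal sketch of the same pattern using plain POSIX threads instead of IREE's notification primitives. All names (`sketch_worker_t`, `sketch_run`, and so on) are hypothetical and are not IREE API. It assumes, purely for illustration, that the worker thread can exit without ever publishing its "zombie" state; the report above does not establish why the notification was missed. A timed wait stands in for `iree_infinite_timeout()` so that the sketch terminates instead of hanging.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

// Hypothetical stand-ins for the worker state; not IREE types.
typedef enum { SKETCH_RUNNING = 0, SKETCH_ZOMBIE = 1 } sketch_state_t;

typedef struct {
  pthread_mutex_t mutex;
  pthread_cond_t cond;
  sketch_state_t state;
  bool announce_exit;  // false = exit silently (the assumed bug pattern)
} sketch_worker_t;

// Worker entry point: optionally publishes the zombie state before exiting.
static void* sketch_worker_main(void* arg) {
  sketch_worker_t* worker = (sketch_worker_t*)arg;
  if (worker->announce_exit) {
    pthread_mutex_lock(&worker->mutex);
    worker->state = SKETCH_ZOMBIE;
    pthread_cond_broadcast(&worker->cond);
    pthread_mutex_unlock(&worker->mutex);
  }
  return NULL;
}

// Waits until the worker is a zombie or timeout_ms elapses.
// Returns true if the zombie state was observed, false on timeout.
static bool sketch_await_zombie(sketch_worker_t* worker, long timeout_ms) {
  struct timespec deadline;
  clock_gettime(CLOCK_REALTIME, &deadline);
  deadline.tv_sec += timeout_ms / 1000;
  deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
  if (deadline.tv_nsec >= 1000000000L) {
    deadline.tv_sec += 1;
    deadline.tv_nsec -= 1000000000L;
  }
  pthread_mutex_lock(&worker->mutex);
  int rc = 0;
  while (worker->state != SKETCH_ZOMBIE && rc != ETIMEDOUT) {
    rc = pthread_cond_timedwait(&worker->cond, &worker->mutex, &deadline);
  }
  bool is_zombie = (worker->state == SKETCH_ZOMBIE);
  pthread_mutex_unlock(&worker->mutex);
  return is_zombie;
}

// Starts a worker, lets it run to completion, then waits for the zombie
// state the way a deinitialize routine would.
static bool sketch_run(bool announce_exit, long timeout_ms) {
  sketch_worker_t worker;
  pthread_mutex_init(&worker.mutex, NULL);
  pthread_cond_init(&worker.cond, NULL);
  worker.state = SKETCH_RUNNING;
  worker.announce_exit = announce_exit;

  bool observed = false;
  pthread_t thread;
  if (pthread_create(&thread, NULL, sketch_worker_main, &worker) == 0) {
    pthread_join(thread, NULL);  // the thread is gone from here on
    observed = sketch_await_zombie(&worker, timeout_ms);
  }
  pthread_cond_destroy(&worker.cond);
  pthread_mutex_destroy(&worker.mutex);
  return observed;
}
```

Under these assumptions, `sketch_run(true, 200)` should return true because the state was published before the thread exited, while `sketch_run(false, 200)` should time out and return false. With an infinite timeout, the second case would block forever, which is the shape of the hang described here.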

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

It is looking OK in a ctest run with 1000 repeats, and it certainly matches my understanding, now that I understand the above explanation, that it fixes a specific issue in the existing code (as explained in the comment added by that PR).