openexr: Threadpool deadlocks during shutdown on Windows

When the C exit function is called, the IlmThread thread pool deadlocks during shutdown on Windows.

This happens because some worker threads have not run yet (and so have not posted on the “started” semaphore) when they get killed. During shutdown, the pool waits on that semaphore for N threads, but not every worker has posted, so the wait never returns. This is especially noticeable when oversubscribing threads (for example, 10 threads on a 2-core machine).
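The mismatch between posts and waits can be sketched in isolation. This is only an illustration under stated assumptions, not the IlmThread code: StartedSemaphore and shutdownWaits are hypothetical names, and the wait is given a timeout so the demo returns where the real, untimed wait would hang.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical stand-in for the pool's "started" semaphore (not the IlmThread class).
class StartedSemaphore
{
public:
    // Called by a worker once it has actually begun running.
    void post()
    {
        std::lock_guard<std::mutex> lock(_mutex);
        ++_count;
        _cv.notify_one();
    }

    // Timed wait, used here only so the sketch terminates.
    // Returns false if no post arrived within the timeout.
    bool waitFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        if (!_cv.wait_for(lock, timeout, [this] { return _count > 0; }))
            return false;
        --_count;
        return true;
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    int _count = 0;
};

// Simulates shutdown waiting once per expected worker when only `posted`
// workers got far enough to post before being killed.
// Returns the number of waits that were satisfied.
int shutdownWaits(int expected, int posted, std::chrono::milliseconds timeout)
{
    StartedSemaphore started;
    for (int i = 0; i < posted; ++i)
        started.post();                 // workers that did run

    int satisfied = 0;
    for (int i = 0; i < expected; ++i)
    {
        if (!started.waitFor(timeout))  // an untimed wait would block forever here
            break;
        ++satisfied;
    }
    return satisfied;
}
```

With 5 expected workers and only 2 posts, the third wait times out; with an untimed wait, that is the point where shutdown would hang.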

I have never seen it deadlock when calling the exit function from the main thread (though I am not sure whether that means it cannot happen or is just random coincidence), and Windows seems to be the only platform affected.

This was initially discovered while trying to get the OpenImageIO test suite to run on Windows, but the following code can be used to reproduce it (you might have to run it a couple of times because, well, threading 😃):

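#include <cstdlib>
#include <thread>
#include <ImfThreading.h>

int main(int argc, char** argv)
{
	// Call exit() from a non-main thread while the pool is being created.
	std::thread t([]() { exit(-1); });
	Imf::setGlobalThreadCount(5);
	t.join();
}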

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 52 (37 by maintainers)

Most upvoted comments

I would probably add an additional if (_data.hasThreads) statement around the loop (though this might not strictly be needed).

About bulletproofing, I don’t think that it needs any additional coverage for edge cases (like 0 or 10000 threads). The only real reason I can see for it failing is if, for example, new DefaultWorkerThread were to fail (out of memory?), though that would be rather catastrophic anyway. If the threads themselves are not able to start, joinable should be able to catch this.
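As a rough sketch of the shape discussed in these comments, a shutdown loop guarded by a hasThreads flag and a joinable check might look like the following. PoolSketch and its members are hypothetical names for illustration, not the actual IlmThread implementation, and the sketch does not model Windows killing worker threads during exit.

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Hypothetical pool used only to illustrate the guard; not the IlmThread code.
struct PoolSketch
{
    bool hasThreads = false;            // mirrors the idea of _data.hasThreads
    std::vector<std::thread> threads;
    std::atomic<int> ran{0};            // counts workers that actually executed

    void start(int n)
    {
        for (int i = 0; i < n; ++i)
            threads.emplace_back([this] { ++ran; });
        hasThreads = !threads.empty();
    }

    // Joins every worker that can be joined and returns how many were joined.
    int shutdown()
    {
        int joined = 0;
        if (hasThreads)                 // extra guard; possibly redundant
        {
            for (std::thread& t : threads)
            {
                if (t.joinable())       // skip threads that never started
                {
                    t.join();
                    ++joined;
                }
            }
        }
        threads.clear();
        hasThreads = false;
        return joined;
    }

    ~PoolSketch() { shutdown(); }       // never destroy a joinable std::thread
};
```

The edge case of 0 threads needs no special handling here: the guard is false and the loop is skipped, so shutdown simply returns 0.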