enkiTS: Deadlock in SemaphoreWait (macOS)
I’ve run into a deadlock in ThreadScheduler::SemaphoreWait.
I call WaitForTask on a reduction operation (very similar to the ParallelSum example) but almost immediately the whole application hangs. When I pause in the debugger I see the call stack below:
semaphore_wait_trap 0x000000019374fe90
_dispatch_sema4_wait 0x00000001935e099c
_dispatch_semaphore_wait_slow 0x00000001935e1050
[Inlined] enki::SemaphoreWait(enki::semaphoreid_t &) TaskScheduler.cpp:1381
enki::TaskScheduler::WaitForTaskCompletion(const enki::ICompletable *, unsigned int) TaskScheduler.cpp:742
enki::TaskScheduler::WaitforTask(const enki::ICompletable *, enki::TaskPriority) TaskScheduler.cpp:1002
...
I did a little poking around and found that I could stop the deadlock from occurring by modifying this line…
(TaskScheduler.cpp, line 630)
bool bCallWakeThreads = bWakeThreads_ && pTask_->m_WaitingForTaskCount.load( std::memory_order_acquire );
to…
bool bCallWakeThreads = bWakeThreads_ && pTask_->m_WaitingForTaskCount.load( std::memory_order_seq_cst );
Notice the change from `std::memory_order_acquire` to `std::memory_order_seq_cst`. I'm not sure this is the right fix, but it might give you a clue as to what's going wrong.
As an experiment, I also tried changing…
dispatch_semaphore_wait( semaphoreid.sem, DISPATCH_TIME_FOREVER );
to…
dispatch_semaphore_wait( semaphoreid.sem, DISPATCH_TIME_NOW );
And with that change alone, I get a hitch, but no deadlock.
If you have any thoughts on this I'd be very grateful to hear them. For now, I'm going to make a fork with this change to unblock myself. I've tried reproducing it in a simpler application so I could share it, but unfortunately I wasn't able to (it must be a gnarly timing thing). I've also only been able to reproduce this on the setup below; I tried running the same application on Windows with the same application state but couldn't reproduce it there either.
OS: macOS Ventura 13.4
Hardware: MacBook Pro (M1 Max)
Thanks very much for your time, any questions feel free to give me a shout.
Update: The repro I have happens pretty much 100% consistently when running from CLion on macOS; for some reason it's much harder to recreate when running from the command line, and I'm not entirely sure why 🙈
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16 (16 by maintainers)
Excellent. I’m merging this into dev then main at the moment.
Many thanks for running this test!
I am a little puzzled, as `memory_order_seq_cst` is only sequentially consistent with other `memory_order_seq_cst` operations on other threads - yet in this case there are no other such operations on other threads except the `threadState` variable, which is unused by the TaskCompletion thread. So in this case `memory_order_seq_cst` and `memory_order_acquire` should behave the same (since the operation is a load). So I think `memory_order_seq_cst` fixing the problem is likely a side effect of the code the compiler generates for it.

I've pushed one further change which switches the `fetch_sub` on the `m_RunningCount` read-modify-write operations to use `memory_order_acq_rel`. This should ensure that when `m_WaitingForTaskCount` is read by the thread completing the task, `m_RunningCount` has been decremented and is visible to the `WaitForTaskCompletion` thread in the correct order.

Thanks - it does seem from the above that there isn't a deadlock in your code.
It might be useful to see this, even if it can’t reproduce the problem.
One thing to check is that you haven't deadlocked. There are a number of ways this can occur, but the most common is a `WaitforTask` running a task which calls `WaitforTask` on the parent. This can be avoided by using task priorities (in both the tasks and also in the `WaitforTask` call, to prevent running lower priority tasks) or dependencies (dependencies are much safer and also give better performance in general). You should normally be able to see this in a debugger by checking the thread callstacks.

Using `std::memory_order_seq_cst` might have fixed this by changing the timing in such a way that the deadlock doesn't happen. However I'll also go over the `m_WaitingForTaskCount` memory order semantics and check whether acquire/release is sufficient here.