runtime: `Monitor.TryEnter` sometimes fails even if `Monitor.IsEntered` returns `true`
Description
In a large code base, we have a snippet like this:
object myMutex = new object();
void Foo()
{
if (Monitor.IsEntered(myMutex))
{
if (!Monitor.TryEnter(myMutex))
LogError("Fail"); // #1
}
}
Theortically, line #1 should never be visited, right? But we observe that it actually gets visited sometimes.
Initially, we weren’t checking Monitor.TryEnter inside of Monitor.IsEntered, of course, and this code was added for diagnostic purposes after we’ve noticed deadlocks related to this piece of code: the thread holding the monitor was receiving false from Monitor.TryEnter, despite it should theoretically always succeed and return true.
Reproduction Steps
This was initially reported by the user of our product, and we were able to track it down to the described code.
We are occasionally able to trigger the code to enter this line locally, during a debug session, but so far were unable to prepare an isolated MCVE. We are able to share more code if this helps.
Expected behavior
Monitor.TryEnter should always succeed if the calling thread is holding the lock (i.e. if Monitor.IsEntered returns true).
Actual behavior
Monitor.TryEnter occasionally returns false despite the calling thread holding the lock (i.e. if Monitor.IsEntered returns true).
Regression?
No response
Known Workarounds
Replace a call to Monitor.TryEnter(myMutex) with the following block:
if (Monitor.IsEntered(myMutex))
Monitor.Enter(myMutex);
Surprisingly, this works so far.
Configuration
- .NET version: .NET 6.0.8
- OS: Windows 10
- Architecture: x64
Other information
No response
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 39 (22 by maintainers)
Commits related to this issue
- Fix recurisve `Monitor.TryEnter` to not return false due to a condition A race condition when the object header is transitioning to a sync block can cause a concurrent recursive `TryEnter` to return ... — committed to kouvel/runtime by kouvel 2 years ago
- Fix recurisve `Monitor.TryEnter` to not return false due to a condition (#77243) A race condition when the object header is transitioning to a sync block can cause a concurrent recursive `TryEnter` t... — committed to dotnet/runtime by kouvel 2 years ago
@Mr0N The enters/exits have to be paired. The general concept is known as recursive mutex. If you enter it twice within the same thread then it just increments an internal counter since you know it’s already locked.
As for the actual product bug… .NET / CoreCLR uses an optimization called “thin locks”. Under specific set of circumstances (eg. no previous call to
GetHashCode) there is a fast code path for Monitor.Enter/TryEnter/Exit as long as there is no contention on the lock. It simply updates flags in the object header to record the state of the lock. Once a contention happens though (ie. one thread has already acquired the lock through Monitor.Enter and another thread tries to acquire the same lock through Monitor.Enter) the “thin lock” optimization is abandoned and it transitions into a new state where the lock is tracked through indirect entry in a “sync table”. Under normal circumstances this optimization should be entirely unobservable. Unfortunately it was not…Consider the following scenario:
falseWhy does calling
GetHashCodemake a difference you may ask? Because the object header is used by multiple different optimizations and they are mutually exclusive. It either stores the thin lock, cached hash code, or an indirect pointer to the “sync table” (which can hold a full lock, hash code, and several other things).That’s indeed really weird. I can confirm that the offending thread entered the
if (!Monitor.TryEnter(myMutex))block despite owning the lock:I saw nothing wrong in the disassembly, so that sounds like an issue in
Monitor.TryEnter. There are two other threads competing for the lock so I wonder if that’s part of the story 🤔Do you mean the transition from thin-lock to syncblock? Based on that assumption I managed to repro the issue:
I was not able to reproduce the issue without recreating a new object every time (so very likely an issue during the promotion of the thin lock).
https://github.com/dotnet/runtime/blob/215b39abf947da7a40b0cb137eab4bceb24ad3e3/src/coreclr/System.Private.CoreLib/src/System/Threading/Monitor.CoreCLR.cs#L84 Maybe the problem is that the object is somehow unavailable for 0 nanoseconds, for example, the second thread is reading it
Dump.
This is a very strong statement. I don’t see what is wrong with using the
MonitorAPI as a reentrant lock if it advertises itself as such.Why would
Enterblock or cause a deadlock on the thread that is already holding the lock?No, the code is not async.
So we need to collect a process dump when the process gets stopped on the breakpoint at line #1, is that correct?
@Mr0N Everybody would really appreciate if you stopped spamming this issue. @kevingosse @kouvel Thanks a lot for the quick investigation and the fix!
@ForNeVeR Just a thought: as an alternative workaround, you can call
GetHashCodeon the objects used for locking after allocating them:This will force the promotion to a syncblock as soon as a thread tries locking those objects (instead of waiting for the first moment the lock is contended). In my tests this seems to fix the race condition.
If you manage to capture a memory dump when that happens during a debug session, it could help understanding what’s happening.
Reminds me that issue I faced with frozen segments recently - GC could collect a sync block from a live object and potentially allocate a new syncblock on that place for a different object resulting that live object started pointing to a different sync object. Just in case if it potentially can be a similar issue.
That’s completely irrelevant. You can use Interlocked operations instead if that makes you feel better:
It’s not a problem, it’s just a volatile field
Thanks for the dump, from that it looks like the sync block has transitioned already and based on the state of it at the time of the dump, I didn’t see any way for TryEnter to return false. So likely it’s a race during transition.
@Mr0N
Monitor.IsEnteredis documented to returntrueif the current thread is holding the lock.