runtime: OutOfMemoryException at Monitor.ReliableEnterTimeout with plenty of free memory available
Description
A process running under .NET Core 3.1 threw an OutOfMemoryException (OOM) and then terminated itself by invoking FailFast.
Two threads are involved:
- The first thread catches the OOM from Monitor.ReliableEnterTimeout and places that error into an Application.Logging.Logger instance queue for processing. The call stack there is the one this issue is about:
at System.Threading.Monitor.ReliableEnterTimeout(Object obj, Int32 timeout, Boolean& lockTaken)
at Application.Profiling.SessionTimer.Session.DurationInstance.TryGetExpiredActionDurationInstance(ActionDurationInstance& instance)
at Application.Profiling.SessionTimer.Session.ProcessExpiredDurationInstances()
at Application.Profiling.SessionTimer.Session.RunFireElapsedEvent(Object state)
- The other thread reads the queued exception and calls FailFast with the exception information, because an OOM is assumed to be a non-recoverable, fatal state for the application:
at System.Environment.FailFast(System.String)
at Application.Logging.Logger.ExceptionLoggerAsync(System.Exception)
at Application.Logging.LogProcessor`1+<<RunAsync>b__15_0>d
at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object)
at System.Runtime.CompilerServices.AsyncTaskMethodBuilder`1+AsyncStateMachineBox`1
at System.Threading.Tasks.AwaitTaskContinuation.RunOrScheduleAction(System.Runtime.CompilerServices.IAsyncStateMachineBox, Boolean)
at System.Threading.Tasks.Task.RunContinuations(System.Object)
at System.Threading.Tasks.Task.TrySetResult()
at System.Threading.Tasks.Task+DelayPromise.CompleteTimedOut()
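The hand-off between the two threads explains why the FailFast stack looks unrelated to the stack that threw. A minimal sketch of that pattern follows, written in C++ only for illustration (the application itself is .NET). `ErrorQueue` and `RunWorkerAndLogger` are hypothetical names, and where a real process would terminate, the sketch only sets a flag so that it can be exercised.

```cpp
#include <condition_variable>
#include <exception>
#include <functional>
#include <mutex>
#include <new>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical error queue shared by a worker thread and a logger thread.
class ErrorQueue {
public:
    void Push(std::exception_ptr error) {
        {
            std::lock_guard<std::mutex> guard(mutex_);
            errors_.push(std::move(error));
        }
        ready_.notify_one();
    }

    // Blocks until an entry is available, then returns it.
    std::exception_ptr Pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !errors_.empty(); });
        std::exception_ptr error = errors_.front();
        errors_.pop();
        return error;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::exception_ptr> errors_;
};

// Runs `work` on one thread and inspects its outcome on another.
// Returns true if the logger thread classified the error as fatal.
bool RunWorkerAndLogger(const std::function<void()>& work) {
    ErrorQueue queue;
    bool fatal_seen = false;

    std::thread worker([&] {
        try {
            work();
            queue.Push(std::exception_ptr{});  // empty entry: no error
        } catch (...) {
            // The throwing stack ends here; only the exception is queued.
            queue.Push(std::current_exception());
        }
    });

    std::thread logger([&] {
        std::exception_ptr error = queue.Pop();
        if (!error) {
            return;
        }
        try {
            std::rethrow_exception(error);
        } catch (const std::bad_alloc&) {
            // Stand-in for fail-fast; a real process would abort here.
            fatal_seen = true;
        } catch (...) {
            // Other errors are treated as non-fatal in this sketch.
        }
    });

    worker.join();
    logger.join();
    return fatal_seen;  // safe to read: both threads have been joined
}
```

Calling `RunWorkerAndLogger([] { throw std::bad_alloc(); })` reports a fatal error from the logger thread, while the frames that raised it exist only on the worker thread.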
The OOM came out of the blue: the software was running on a system with 200 GB of RAM still available.
The memory graph with the OOM moment:
To monitor system performance, we run PerfView continuously on the box with the following command:
C:\PerfView\PerfView.exe collect -CollectMultiple=1000000 -MaxCollectSec=10 -AcceptEULA -NoView -NoGui -CircularMB=1024 -BufferSize=1024 -CPUSampleMSec:10 -ClrEvents=JITSymbols+GC+GCHeapSurvivalAndMovement+Stack -KernelEvents=process+thread+ImageLoad+Profile+ThreadTime -DotNetAllocSampled -NoNGenRundown -NoV2Rundown -LogFile:log.txt
The PerfView collection closest to the moment of the crash showed about 1% of CPU activity in Monitor.Enter/TryEnter, which most likely appears in the trace as MonReliableEnter_Portable:
SyncBlockCache::GetNextFreeSyncBlock showed up in turn (3.1 SyncBlock.cpp, 5.0 SyncBlock.cpp):
There are a few places where an OOM can be raised there, for instance in SyncBlockCache::GetNextFreeSyncBlock:
SyncBlockArray* newsyncblocks = new(SyncBlockArray);
if (!newsyncblocks)
COMPlusThrowOM ();
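The related commits listed further down speak of "running out of syncblocks", which suggests a limit other than physical memory. Below is a minimal sketch of how that can happen, assuming the failure comes from a fixed-width index space being used up rather than from the allocator. `BoundedLockTable`, `LockRecord` and the index width are hypothetical and do not reproduce the CoreCLR implementation.

```cpp
#include <cstdint>
#include <new>
#include <vector>

// Hypothetical stand-in for a per-object lock record; not the CoreCLR type.
struct LockRecord {
    int owner_thread = 0;
};

// Toy table whose slots are addressed by an index with a fixed number of
// bits. The width passed in is illustrative, not the runtime's real limit.
class BoundedLockTable {
public:
    explicit BoundedLockTable(unsigned index_bits)
        : max_entries_(std::uint64_t{1} << index_bits) {}

    // Hands out the next free index. Once every index is taken, the call
    // fails with std::bad_alloc even though the heap could hold far more
    // records: the limit is the index width, not the available memory.
    std::uint32_t Allocate() {
        if (records_.size() >= max_entries_) {
            throw std::bad_alloc();
        }
        records_.emplace_back();
        return static_cast<std::uint32_t>(records_.size() - 1);
    }

    std::uint64_t Capacity() const { return max_entries_; }

private:
    std::uint64_t max_entries_;
    std::vector<LockRecord> records_;
};

// Counts how many allocations succeed before the table reports failure.
std::uint64_t AllocateUntilFailure(BoundedLockTable& table) {
    std::uint64_t succeeded = 0;
    try {
        for (;;) {
            table.Allocate();
            ++succeeded;
        }
    } catch (const std::bad_alloc&) {
        // In this toy, reached once the index space is exhausted.
    }
    return succeeded;
}
```

With `BoundedLockTable table(4)`, `AllocateUntilFailure(table)` stops after `table.Capacity()` successful allocations. The caller sees the same exception type a genuine out-of-memory condition would produce, so the free RAM reported by the system says nothing about whether this path can fail.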
Configuration
- Windows Server 2016
- .NET Core 3.1.12
- CoreCLR Version: 4.700.21.6504
- 64 vCPUs VM in GCE
- 600 GB RAM
- Auto-dump disabled to speed up server restart after a crash (collecting the 400 GB process dump takes too long)
Questions
- Where else could OOM be triggered on this particular call path if not in syncblock.cpp?
- Under what conditions can the OOM happen on a system with that much free RAM available?
- Could this somehow be related to the concurrent PerfView activity?
About this issue
- State: open
- Created 3 years ago
- Comments: 43 (21 by maintainers)
Commits related to this issue
- Use custom error message when running out of syncblocks Contributes to #49215 — committed to jkotas/runtime by jkotas 3 years ago
- Use custom error message when running out of syncblocks (#60013) Contributes to #49215 — committed to dotnet/runtime by jkotas 3 years ago
- Use custom error message when running out of syncblocks Contributes to #49215 — committed to dotnet/runtime by jkotas 3 years ago
- [release/6.0] Use custom error message when running out of syncblocks (#60592) * Use custom error message when running out of syncblocks Contributes to #49215 * Update src/coreclr/dlls/mscorrc/... — committed to dotnet/runtime by github-actions[bot] 3 years ago
- Use custom error message when running out of syncblocks (#60013) Contributes to #49215 — committed to kronic/runtime by jkotas 3 years ago
I do not think I can get it to .NET 5 (it is going out of support soon). I will try to get it to .NET 6.