runtime: Fix the Scalability problem in System.Threading.TimerQueueTimer
There is a hot lock in System.Threading.TimerQueueTimer that shows up when people use a lot of timeouts or Task.Delays. Below is a simple program that starts 10000 tasks, each of which delays for a total of 5 seconds, 100 msec at a time. The program keeps 8 cores at 60% CPU, and over half of that is contention on the one lock in System.Threading.TimerQueueTimer (in the Change, Close and Fire methods).
Name
module coreclr <<coreclr!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel>>
- system.private.corelib!TimerQueueTimer.Change
- system.private.corelib!System.Threading.TimerQueueTimer.Close()
- system.private.corelib!System.Threading.TimerQueueTimer.Fire()
Thus this lock can get hot. We have seen this in a variety of scenarios, mostly on machines with many CPUs (e.g. 32 or 64) and therefore many outstanding delays.
Note that we don’t actually have a problem with the data structure (it is O(n) each time a timer fires, but not when a timer is canceled). The problem is simply scalability: we have one lock.
using System;
using System.Threading.Tasks;
namespace TimerScalability
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting, running 10000 tasks that each delay 5 seconds, 100 msec at a time.");
            Task[] tasks = new Task[10000];
            // Task.Run unwraps the async delegate, so each entry completes only when
            // TaskBody has finished. Task.Factory.StartNew(TaskBody) would return a
            // Task<Task> whose outer task completes at the first await.
            for (int i = 0; i < tasks.Length; i++)
                tasks[i] = Task.Run(TaskBody);
            Console.WriteLine("Waiting");
            Task.WhenAll(tasks).Wait();
            Console.WriteLine("Done");
        }
        static async Task TaskBody()
        {
            // Console.WriteLine("Task {0} Starting", Task.CurrentId);
            // 50 delays of 100 msec each = 5 seconds per task.
            for (int i = 0; i < 50; i++)
                await Task.Delay(100);
            // Console.WriteLine("Task {0} Done", Task.CurrentId);
        }
    }
}
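To make the "one lock" point concrete, here is a minimal sketch of lock striping, one common way to reduce contention on a single hot lock. It is an illustration only and does not show how TimerQueueTimer is implemented or how the runtime addresses this issue. The class name ShardedRegistry and its members are hypothetical. Entries are spread over several independently locked shards, so threads touching different shards do not contend with each other.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration of lock striping; NOT the runtime's timer queue.
public sealed class ShardedRegistry
{
    private readonly object[] _locks;        // one lock per shard
    private readonly HashSet<int>[] _shards; // one set of ids per shard

    public ShardedRegistry(int shardCount)
    {
        if (shardCount <= 0)
            throw new ArgumentOutOfRangeException(nameof(shardCount));
        _locks = new object[shardCount];
        _shards = new HashSet<int>[shardCount];
        for (int i = 0; i < shardCount; i++)
        {
            _locks[i] = new object();
            _shards[i] = new HashSet<int>();
        }
    }

    // Map an id to a shard index; the cast keeps the result non-negative.
    private int ShardOf(int id) => (int)((uint)id % (uint)_locks.Length);

    // Returns true if the id was not already present.
    public bool Add(int id)
    {
        int s = ShardOf(id);
        lock (_locks[s]) { return _shards[s].Add(id); }
    }

    // Returns true if the id was present and has been removed.
    public bool Remove(int id)
    {
        int s = ShardOf(id);
        lock (_locks[s]) { return _shards[s].Remove(id); }
    }

    // Sums the shard sizes one lock at a time. This is not an atomic
    // snapshot if other threads are adding or removing concurrently.
    public int Count
    {
        get
        {
            int total = 0;
            for (int i = 0; i < _locks.Length; i++)
            {
                lock (_locks[i]) { total += _shards[i].Count; }
            }
            return total;
        }
    }
}
```

With a shard count of 1 this degenerates to the single-lock design described above. A larger count (for example, one shard per core, which is an assumption rather than a measured recommendation) trades a slightly more expensive Count for less contention on Add and Remove.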
About this issue
- State: closed
- Created 7 years ago
- Reactions: 4
- Comments: 16 (10 by maintainers)
I’ve solved the problem for myself, but this remains an outstanding issue with the .NET framework.
I have a screenshot of an application dump that I performed yesterday…
Details about the situation:
I’ve solved this by writing a custom scheduler for repeated tasks, rather than trusting the .NET scheduler anymore. If anyone wants that code, let me know and I’ll see what I can do about open-sourcing it.
Last night, I grabbed an application dump during one of the incidents before restarting the software…