runtime: Fix the Scalability problem in System.Threading.TimerQueueTimer

There is a hot lock in System.Threading.TimerQueueTimer that shows up when people use a lot of timeouts or Task.Delay calls. Below is a simple program that starts 10,000 tasks, each of which delays for 5 seconds in 100 msec increments. The program keeps 8 cores at 60% CPU, and over half of that is contention on the one lock in System.Threading.TimerQueueTimer (in the Change, Close, and Fire methods).

Name

module coreclr <<coreclr!JIT_MonEnterWorker_InlineGetThread_GetThread_PatchLabel>>

  • system.private.corelib!TimerQueueTimer.Change
  • system.private.corelib!System.Threading.TimerQueueTimer.Close()
  • system.private.corelib!System.Threading.TimerQueueTimer.Fire()

Thus this lock can get hot. We have seen this in a variety of scenarios, mostly on machines with many CPUs (e.g., 32 or 64) and therefore many outstanding delays.

Note that we don’t actually have a problem with the data structure (it does O(n) work every time a timer fires, but not when a timer is canceled). The problem is simply scalability: we have only one lock.

using System;
using System.Threading.Tasks;

namespace TimerScalability
{
    class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Starting, running 10000 tasks that each delay 5 seconds, 100 msec at a time.");
            Task[] tasks = new Task[10000];
            for(int i = 0; i < tasks.Length; i++)
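                // StartNew on an async method returns Task<Task>; Unwrap so WhenAll waits for the delays themselves.
                tasks[i] = Task.Factory.StartNew(TaskBody).Unwrap();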

            Console.WriteLine("Waiting");
            Task.WhenAll(tasks).Wait();
            Console.WriteLine("Done");
        }


        static async Task TaskBody()
        {
            // Console.WriteLine("Task {0} Starting", Task.CurrentId);
            for (int i = 0; i < 50; i++)
                await Task.Delay(100);
            // Console.WriteLine("Task {0} Done", Task.CurrentId);
        }
    }
}
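
The single-lock bottleneck described above can be illustrated with a small sketch of lock striping: timers are spread over several independently locked lists, so threads adding or firing timers tend to contend on different locks. This is only a hedged illustration. The class name StripedTimerSet, its methods, and the choice of stripe by managed thread ID are all assumptions made for the example; it is not the runtime's TimerQueue code and makes no claim about how the issue was actually resolved.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical illustration only; not the runtime's timer implementation.
public sealed class StripedTimerSet
{
    private readonly object[] _locks;          // one lock per stripe
    private readonly List<long>[] _dueTimes;   // due times (ms), one list per stripe

    public StripedTimerSet(int stripeCount)
    {
        if (stripeCount <= 0) throw new ArgumentOutOfRangeException(nameof(stripeCount));
        _locks = new object[stripeCount];
        _dueTimes = new List<long>[stripeCount];
        for (int i = 0; i < stripeCount; i++)
        {
            _locks[i] = new object();
            _dueTimes[i] = new List<long>();
        }
    }

    public int StripeCount => _locks.Length;

    // Adds a timer to the stripe chosen from the calling thread's ID
    // (an assumption for this sketch) and returns the stripe index.
    public int Add(long dueTimeMs)
    {
        int stripe = (Environment.CurrentManagedThreadId & int.MaxValue) % _locks.Length;
        lock (_locks[stripe])
        {
            _dueTimes[stripe].Add(dueTimeMs);
        }
        return stripe;
    }

    // Removes the timers in one stripe that are due at nowMs and returns how many
    // were removed. The scan is still O(n), but only over that stripe's timers.
    public int FireDue(int stripe, long nowMs)
    {
        lock (_locks[stripe])
        {
            return _dueTimes[stripe].RemoveAll(due => due <= nowMs);
        }
    }
}
```

With one stripe per core, for example new StripedTimerSet(Environment.ProcessorCount), the per-operation cost stays the same but the lock traffic is divided among the stripes.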

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 4
  • Comments: 16 (10 by maintainers)

Most upvoted comments

I’ve solved the problem for myself, but this remains an outstanding issue with the .NET framework.

I have a screenshot of an application dump that I performed yesterday…

Details about the situation:

  • .NET 4.7.2, C# 7.3, VS 2019
  • About 2000-2500 tasks launched at various times via Task.Run() that are essentially infinitely-lived.
  • Most are launched with Task.Factory.StartNew([func], TaskCreationOptions.LongRunning);
  • All are essentially independent, not synchronously firing.
  • Most are constantly issuing short await Task.Delay(10) statements between doing a few microseconds of work.
  • Occasionally (every few minutes, sometimes every 10 minutes) the application, which normally uses 10% CPU on a 32-core machine, jumps to 100% CPU usage and stays there for 60-100 seconds. <- THIS IS WHAT I HAD TO FIX.

I’ve solved this by writing a custom scheduler for repeated tasks, rather than trusting the .NET scheduler anymore. If anyone wants that code, let me know and I’ll see what I can do about open-sourcing it.
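
For readers wondering what a custom scheduler for repeated tasks might look like, here is a rough sketch. The actual code is not shown in this thread, so everything below is an assumption: the PeriodicScheduler name, the Register and RunDue methods, and the idea of passing the current time in explicitly are all hypothetical. The point is only that many periodic work items can share one polling loop instead of each creating its own timer through await Task.Delay.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical sketch; the commenter's real scheduler is not shown in the thread.
public sealed class PeriodicScheduler
{
    private sealed class Entry
    {
        public Action Work;
        public long PeriodMs;
        public long NextDueMs;
    }

    private readonly object _gate = new object();
    private readonly List<Entry> _entries = new List<Entry>();

    // Registers work to run every periodMs, first due one period after nowMs.
    public void Register(Action work, long periodMs, long nowMs)
    {
        if (work == null) throw new ArgumentNullException(nameof(work));
        if (periodMs <= 0) throw new ArgumentOutOfRangeException(nameof(periodMs));
        lock (_gate)
        {
            _entries.Add(new Entry { Work = work, PeriodMs = periodMs, NextDueMs = nowMs + periodMs });
        }
    }

    // Runs every entry that is due at nowMs, reschedules it, and returns how many ran.
    public int RunDue(long nowMs)
    {
        var due = new List<Action>();
        lock (_gate)
        {
            foreach (Entry entry in _entries)
            {
                if (entry.NextDueMs <= nowMs)
                {
                    due.Add(entry.Work);
                    entry.NextDueMs = nowMs + entry.PeriodMs;
                }
            }
        }
        // Invoke outside the lock so slow work does not block Register.
        foreach (Action work in due)
        {
            work();
        }
        return due.Count;
    }
}
```

A single dedicated thread could then call RunDue with a Stopwatch's ElapsedMilliseconds and sleep briefly between calls, so the number of timers in use stays constant regardless of how many work items are registered.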

Last night, I grabbed an application dump during one of the incidents before restarting the software… TimerQueueTimer