runtime: Exception throw/catch dramatically slower when debugger attached
Repro:

```csharp
using System;
using System.Diagnostics;

while (true)
{
    var sw = Stopwatch.StartNew();
    for (int i = 0; i < 100; i++)
    {
        try { throw new Exception(); } catch { }
    }
    Console.WriteLine(sw.Elapsed);
}
```
When I run that with Ctrl-F5, I get output like:

```
00:00:00.0004349
00:00:00.0004371
00:00:00.0004347
00:00:00.0004366
00:00:00.0004440
```
When I run that with F5, I get output like:

```
00:00:01.9491196
00:00:01.8550998
00:00:01.8392192
00:00:01.8323522
00:00:01.8168992
```
That’s a 4500x slowdown, with every exception throw/catch consuming ~20ms. And this does scale with exception count: if I increase the repro to 1000 exceptions, without the debugger it takes ~0.004s and with the debugger it takes ~18s.
While ideally well-behaved apps wouldn’t throw lots of exceptions, this has been shown to be a significant cause of slowdown for real apps while they are being developed, with a noticeable impact on developer inner-loop performance, especially if the exceptions occur at app startup.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 3
- Comments: 19 (19 by maintainers)
I spent a little time with a profiler to assess where we are spending our time. I profiled Steve’s example that uses depth 10 callstacks without async:
On my particular machine (Windows, .NET Core 3.1.11, VS 16.8.4), Ctrl-F5 takes ~1.5ms per iteration of the outer while loop and F5 takes ~1.5s, so about a 1000x overhead for me. During the ~15ms handling of each exception, three events are dispatched from the runtime to the debugger:
This shows the breakdown of the ~15ms window for one exception
Each event logically traverses across 3 components:
My take is there are two different approaches we could follow at this point:
Personally my preference is that if we are going to spend time on this we should look into option (2), the major changes. I worry that we don’t have much runway left to optimize incrementally given how easy it is for apps to start throwing yet more exceptions. I also expect building in-proc extensibility points opens the door to a bunch of other useful diagnostic scenarios. For example in the debugger space it could contribute towards other high performance events like conditional breakpoints, or being able to do more sophisticated analysis of rethrown exceptions for async. Outside debugging another example is having a dump collector that can capture dumps on 1st chance exceptions.
Priority-wise the runtime diagnostics team already has months of committed feature work for .NET 6 and a hearty backlog of bugs. I’d guess this would land in the grey area where we might be able to get it within the release timeframe but it is far enough down that we shouldn’t be making any promises at this point.
Thoughts on all of this? In particular @gregg-miskelly I’d be curious to hear what you think feasibility/interest is for doing a meatier change here?
On my machine, with JMC on:
I’ll take it 😄
Just to confirm: are you saying it would be OK if we defined a new exception callback that didn’t provide the ICorDebugFrame argument, or if we kept the current callback and returned null for that parameter? If you use that parameter for anything, we probably still have to do a stackwalk in DBI.
That is quite a bit more than I was anticipating we’d be able to cut. I’m a little surprised we hadn’t already cut those in the past : ) A rough guess is maybe a 70% savings if this all held up? If we still need the stackwalk but can drop the first-chance notification, that might be a 50% savings.
One nice part is that an API that disables debugger exception events is part of what we’d need to build to make the in-proc filtering case useful anyways.
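To make that idea concrete, here is a purely hypothetical sketch of what such an API shape could look like — nothing like this exists in the runtime today, and the type and member names are invented for illustration only:

```csharp
using System;

// Hypothetical API, invented for illustration: the idea is that an
// in-proc diagnostic component could suppress the per-exception debugger
// notifications entirely, or filter which exceptions get surfaced.
public static class DebuggerExceptionNotifications
{
    // When false, the runtime would skip the first-chance notification
    // round-trip to the debugger entirely.
    public static bool Enabled { get; set; } = true;

    // Optional in-proc filter: only exceptions matching the predicate
    // would be reported to an attached debugger.
    public static Func<Exception, bool>? Filter { get; set; }
}
```

The point is only the shape: a toggle plus a filter hook would serve both the “disable debugger exception events” case discussed here and the in-proc filtering case.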
I don’t have a repro with as impressive a penalty as 4000x, but I do have a real-world case.
It’s a champion scenario for the Azure EventHubs Processor SDK that has multi-threading and network calls.
In total the startup process throws ~180 exceptions (which are in fact 32 distinct exceptions but multiplied by layers of async state machines re-throwing) and with JMC off the startup time is:
- Without debugger: ~7.5 sec
- With debugger: ~20 sec
What was surprising to me is how large the impact of debugging is compared to the actual networking. The app makes 112 blob requests during startup plus various TCP connections to the EventHubs service, so throwing an exception in an async method costs as much as making multiple real service calls over the network.
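The “32 distinct exceptions multiplied by layers of async state machines re-throwing” effect can be observed directly with `AppDomain.CurrentDomain.FirstChanceException`. A minimal sketch (the `Level1`/`Level2`/`Level3` names are illustrative, standing in for nested async frames in a real app):

```csharp
using System;
using System.Threading.Tasks;

class Program
{
    static int firstChanceCount;

    // Each layer mimics one async frame in a real call chain.
    static async Task Level3()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }
    static async Task Level2() => await Level3();
    static async Task Level1() => await Level2();

    static async Task Main()
    {
        AppDomain.CurrentDomain.FirstChanceException += (s, e) => firstChanceCount++;

        try { await Level1(); } catch { }

        // The single logical failure raises a first-chance notification once
        // when originally thrown, then again each time an awaiting frame
        // rethrows it on the way up, so the count is greater than 1. With a
        // debugger attached, every one of those rethrows pays the full
        // runtime-to-debugger notification cost again.
        Console.WriteLine($"First-chance exceptions observed: {firstChanceCount}");
    }
}
```

This is why one logical error surfacing through deep async stacks multiplies into many debugger events, matching the ~180 observed from 32 distinct exceptions.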
I’m happy to provide the app with all connection strings ready or record a profile.
Two thoughts –