roslyn: Improved async method code gen for exceptions
Exceptions have significant overhead and are for exceptional situations, so the guidance is to avoid them in performance-sensitive code. However, there are times when the cost of exceptions does matter, and while developers might be able to recognize such situations after the fact and rework the code to prevent the same performance issue in the future, it’d be nice if the overhead associated with exceptions could be reduced. There is work happening in the .NET runtime to reduce the cost of exceptions significantly in general, but even with that work there will still be significant overhead, and that overhead is magnified by async. The async model for propagating exceptions involves throwing and catching an exception in every “stack frame” of the async call chain, so an exception propagating through a long chain of async methods can cost, in aggregate, an order of magnitude (or two) more than the corresponding exception in a synchronous call stack.
This is a proposal to do better. There have been similar discussions over the years in a variety of issues/discussions on csharplang, but I’m hoping we can use this issue as the center of gravity and decide if/what to do about this (which could include deciding once and for all to do nothing).
Proposal
Today, for an async method like the following:
public async Task Method()
{
    ...
    await SomethingAsync();
    ...
}
the compiler emits a MoveNext method containing scaffolding like the following:
try
{
    ...
    awaiter = SomethingAsync().GetAwaiter();
    if (!awaiter.IsCompleted)
    {
        <>1__state = 42;
        <>u__awaiter1 = awaiter;
        <>t__builder.AwaitUnsafeOnCompleted(ref awaiter, ref this);
        return;
    Label42:
        awaiter = <>u__awaiter1;
        <>1__state = -1;
    }
    awaiter.GetResult();
    ...
}
catch (Exception exception)
{
    <>1__state = -2;
    <>t__builder.SetException(exception);
    return;
}
<>1__state = -2;
<>t__builder.SetResult();
That awaiter.GetResult() call throws an exception for a faulted awaitee, with that exception then getting caught by the compiler-generated catch, which stores the exception into the builder associated with this method, completing the operation. The GetResult() method is defined as part of the await pattern, a parameterless method whose return value is either void or matches the TResult of the operation being awaited.
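The per-frame throw/catch described above can be observed with a small experiment. The following is only a sketch, not part of the proposal: the names ThrowCountDemo, A, B, C, and CountAsync are made up for illustration, and it relies on the existing AppDomain.FirstChanceException event to count how often the same exception is thrown while it travels up a three-method chain.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical demo type; counts throws of one exception through an async chain.
public static class ThrowCountDemo
{
    private static int s_throws;

    private static async Task C()
    {
        await Task.Yield();
        throw new InvalidOperationException("boom");
    }

    private static async Task B() => await C();
    private static async Task A() => await B();

    // Returns the number of first-chance InvalidOperationExceptions observed
    // while the exception from C propagated through B and A to this method.
    public static async Task<int> CountAsync()
    {
        s_throws = 0;
        EventHandler<FirstChanceExceptionEventArgs> handler = (_, args) =>
        {
            // The event is process-wide, so filter to the exception type thrown by C.
            if (args.Exception is InvalidOperationException)
            {
                Interlocked.Increment(ref s_throws);
            }
        };

        AppDomain.CurrentDomain.FirstChanceException += handler;
        try
        {
            try { await A(); }
            catch (InvalidOperationException) { /* expected */ }
        }
        finally
        {
            AppDomain.CurrentDomain.FirstChanceException -= handler;
        }
        return s_throws;
    }
}
```

With today's code gen, the count should grow with the length of the chain, since each awaiting frame rethrows from GetResult() and catches again; the exact number may vary with how the runtime reports rethrows.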
Pattern
We update the await pattern to optionally include a method of the form:
TResult GetResult(out Exception? exception);
where the TResult may be void if the operation doesn’t return a value. If an awaiter exposes a GetResult of this form, it should not throw an exception even for failed operations. Instead, if the operation completes successfully, it should return a value as does the existing GetResult method, with the exception out argument set to null. If the operation completes unsuccessfully, it should return default(TResult), with the exception out argument set to a non-null instance of an Exception.
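As an illustration of the pattern, here is a minimal sketch of a custom awaiter over Task<TResult> that exposes both the existing GetResult() and the proposed GetResult(out Exception? exception). The type name SketchAwaiter is hypothetical, and the way cancellation and faults are mapped to an exception here is an assumption for the example, not something the proposal specifies.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Hypothetical awaiter illustrating the proposed pattern; not an existing API.
public readonly struct SketchAwaiter<TResult> : ICriticalNotifyCompletion
{
    private readonly Task<TResult> _task;
    public SketchAwaiter(Task<TResult> task) => _task = task;

    public bool IsCompleted => _task.IsCompleted;

    // Existing pattern member: throws for faulted or canceled tasks.
    public TResult GetResult() => _task.GetAwaiter().GetResult();

    // Proposed pattern member: reports failure via the out argument instead of throwing.
    // Assumes the task has already completed, as it has when the compiler calls GetResult.
    public TResult GetResult(out Exception? exception)
    {
        if (_task.IsCompletedSuccessfully)
        {
            exception = null;
            return _task.Result;
        }

        if (_task.IsCanceled)
        {
            // Assumption: surface cancellation as a TaskCanceledException.
            exception = new TaskCanceledException(_task);
            return default!;
        }

        // Task.Exception is an AggregateException; hand back its first inner exception,
        // mirroring what await surfaces today.
        AggregateException aggregate = _task.Exception!;
        exception = aggregate.InnerException ?? aggregate;
        return default!;
    }

    public void OnCompleted(Action continuation) =>
        _task.GetAwaiter().OnCompleted(continuation);

    public void UnsafeOnCompleted(Action continuation) =>
        _task.GetAwaiter().UnsafeOnCompleted(continuation);
}
```

A caller (or, under the proposal, the compiler-generated state machine) would check the out argument for null rather than wrapping the call in a try/catch.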
Codegen
When an await is nested inside of any user-written try block, nothing changes.
For any await outside of all user-written try blocks, if the awaiter exposes the new GetResult(out Exception? exception) overload, the compiler will prefer to use it, e.g.
Exception? e = null;
try
{
    ...
    awaiter.GetResult(out e);
    if (e is not null) goto Exceptional;
    ...
}
catch (Exception exception)
{
    <>1__state = -2;
    <>t__builder.SetException(exception);
    return;
}
<>1__state = -2;
<>t__builder.SetResult();
return;
Exceptional:
ExceptionDispatchInfo.AppendCurrentStackFrame(e);
<>t__builder.SetException(e);
In doing so, we avoid the expensive layer of throw/catch in order to propagate the exception from the awaiter to the method’s builder.
A centralized Exceptional: section like this would keep the additional code size to a minimum, although it would lose some diagnostic information about the location of the error. Alternatively, at the cost of more code size, every await could include:
...
awaiter.GetResult(out e);
if (e is not null)
{
    ExceptionDispatchInfo.AppendCurrentStackFrame(e);
    <>t__builder.SetException(e);
    return;
}
...
New Runtime APIs
ExceptionDispatchInfo.AppendCurrentStackFrame. Just passing the exception from GetResult(out Exception? exception) to builder.SetException would result in a diagnostic gap, as the exception would no longer contain data about this async method, data that would normally be populated as part of the throw/catch. Instead, we add a new ExceptionDispatchInfo.AppendCurrentStackFrame that does the minimal work necessary to gather the same data about the current stack frame as a stack walk would, and appends the relevant data to the exception’s stack trace.
All of the public awaiters in CoreLib are updated to expose a new GetResult overload:
TaskAwaiter
TaskAwaiter<TResult>
ValueTaskAwaiter
ValueTaskAwaiter<TResult>
ConfiguredTaskAwaitable.ConfiguredTaskAwaiter
ConfiguredTaskAwaitable<TResult>.ConfiguredTaskAwaiter
ConfiguredValueTaskAwaitable.ConfiguredValueTaskAwaiter
ConfiguredValueTaskAwaitable<TResult>.ConfiguredValueTaskAwaiter
And we use default interface method support to add new GetResult overloads to IValueTaskSource and IValueTaskSource<TResult>:
public interface IValueTaskSource
{
    ...
    public virtual void GetResult(out Exception? exception)
    {
        try
        {
            GetResult();
            exception = null;
        }
        catch (Exception e)
        {
            exception = e;
        }
    }
}
public interface IValueTaskSource<TResult>
{
    ...
    public virtual TResult GetResult(out Exception? exception)
    {
        try
        {
            exception = null;
            return GetResult();
        }
        catch (Exception e)
        {
            exception = e;
            return default;
        }
    }
}
such that these new overloads may be used by the ValueTask awaiters. dotnet/runtime’s implementations of these interfaces would override GetResult(out Exception) to avoid the default throw/catch.
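To make the fallback-versus-override distinction concrete, here is a small sketch using a stand-in interface. ISketchSource, LegacySource, and NoThrowSource are hypothetical names invented for this example (the real IValueTaskSource members take a token argument and are not modeled here); it assumes a runtime and language version with default interface method support (C# 8 or later).

```csharp
#nullable enable
using System;
using System.Runtime.ExceptionServices;

// Hypothetical stand-in for IValueTaskSource carrying the proposed default member.
public interface ISketchSource
{
    void GetResult();

    // Default implementation: falls back to a throw/catch around the existing GetResult().
    void GetResult(out Exception? exception)
    {
        try
        {
            GetResult();
            exception = null;
        }
        catch (Exception e)
        {
            exception = e;
        }
    }
}

// An existing implementation that knows nothing about the new overload
// and therefore picks up the default (throwing) behavior.
public sealed class LegacySource : ISketchSource
{
    public void GetResult() => throw new InvalidOperationException("failed");
}

// An updated implementation that supplies its own overload, so a failure
// is reported without any exception being thrown.
public sealed class NoThrowSource : ISketchSource
{
    private readonly Exception? _error;
    public NoThrowSource(Exception? error) => _error = error;

    public void GetResult()
    {
        if (_error is not null)
        {
            ExceptionDispatchInfo.Capture(_error).Throw();
        }
    }

    public void GetResult(out Exception? exception) => exception = _error;
}
```

Note that the default member is only reachable through the interface type, e.g. ISketchSource s = new LegacySource(); s.GetResult(out var error);, which is how an awaiter would call it anyway.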
Risks
- This adds additional boilerplate code to async methods in support of the exceptional path. That means some increase in binary size to support improving the performance of a scenario that’s supposed to be exceptional and non-hot-path. This could be a reason to not do this feature.
- Similarly, this adds non-pay-for-play overhead to every await (at a minimum a null check / branch, though hopefully reasonably well predicted) in support of the exceptional case but paid on the success path. We would need to measure the incurred overhead to make sure it’s in the noise… anything more than that should make us not do this feature.
- This changes the code gen for async methods, so tools which recognize it (e.g. decompilers) will need to be updated.
- We’d need to measure the proposed ExceptionDispatchInfo.AppendCurrentStackFrame method’s overhead to make sure it’s meaningfully better than a throw/catch.
Alternatives
The simplest alternative is for developers to keep on keeping on as they do today. Regardless of whether this feature exists or not, we still want exceptions to be exceptional, and so developers should continue to try to keep exceptions off the hot path. In situations where a developer detects an expensive use of exception throw/catch due to async, they can manually change the code to avoid throwing via a custom await helper, e.g.
using System;
using System.Runtime.CompilerServices;
using System.Threading.Tasks;

// Awaiter whose GetResult never throws; the caller inspects the task afterwards.
internal readonly struct NoThrowAwaiter : ICriticalNotifyCompletion
{
    private readonly Task _task;
    public NoThrowAwaiter(Task task) => _task = task;
    public NoThrowAwaiter GetAwaiter() => this;
    public bool IsCompleted => _task.IsCompleted;
    public void GetResult() { } // intentionally does not rethrow the task's exception
    public void OnCompleted(Action continuation) => _task.GetAwaiter().OnCompleted(continuation);
    public void UnsafeOnCompleted(Action continuation) => _task.GetAwaiter().UnsafeOnCompleted(continuation);
}
...
await new NoThrowAwaiter(task);
if (task.IsFaulted)
{
    Use(task.Exception.InnerException);
}
https://github.com/dotnet/runtime/issues/22144 tracks adding new public API for this in dotnet/runtime and will hopefully be addressed in .NET 8, regardless of this issue. This issue ends up being about helping to reduce the situations in which a developer would need to do that and need to know to do that.
Related discussion: https://github.com/dotnet/csharplang/discussions/4450
About this issue
- State: open
- Created 2 years ago
- Reactions: 8
- Comments: 27 (19 by maintainers)
The extra frame this proposal suggests adding to the exception’s trace should be cheap. It’s not a throw, nor is it a stack walk. It could be as simple as appending a const string generated by the compiler, similar to CallerMethodName, not even needing reflection of any kind.
I don’t see a good reason to special-case OperationCanceledException here, nor would I want to complicate code gen for it even more, nor sacrifice debuggability.
@tommcdon I’m curious what you think of the diagnostic impact.
The benchmark in this issue demonstrates the performance advantage of moving from throwing an exception twice (current behavior) to once (new behavior). The benchmark in the linked issue shows the performance advantage of moving from throwing an exception once (current behavior) to not throwing an exception at all (new behavior). It’s a slightly different scenario. In practice, both proposals have similar performance characteristics (i.e. many code situations will show noticeable benefits, while a subset of those will show exceptional benefits).
Yes. a) That’s a long-term experiment. There are no immediate plans for shipping anything based on that. b) Even if it does ship, it’ll be in addition to, not instead of, async/await, which the .NET ecosystem is based on now and will be for a very, very long time. c) As currently defined in that experiment, the lightweight threads aren’t usable everywhere async/await is, e.g. in situations where the actual thread used matters, like UI.
Sure. I took this code:
and decompiled it, then duplicated the resulting C# and modified the Original into a Proposal that tweaks the GetResult calls approximately as suggested. I then wrapped each in a benchmark:
So in each case, we have 6 async methods. In OriginalVersion, all 6 will throw/catch an exception. In ProposedVersion, A, B, C, and D will all propagate the exception in the proposed manner, so only 2 of the 6 will throw/catch an exception. Thus we end up removing ~66% of the exceptions being thrown/caught for this relatively small chain… longer chains would have more savings. This is obviously an approximation, as not all throw/catches have equivalent costs, and I’ve not added anything in here to assist with diagnostics (e.g. the AppendCurrentStackFrame method mentioned). On my machine, this benchmark shows the proposal doubling the throughput and cutting allocation by more than half (but, this isn’t decisive… we’d need to measure a more complete prototype, for both success and failure cases, and in more real-world usage… this is just a microbenchmark).