runtime: Add ValueTask returning async APIs to ADO.NET
With https://github.com/dotnet/corefx/issues/27445 allowing elimination of allocations in asynchronously-executing async methods, we should look into adding ValueTask-returning counterparts to ADO.NET. This would potentially allow zero-allocation database access.
Here are the current async methods:
Task DbConnection.OpenAsync(...);
Task<DbDataReader> DbCommand.ExecuteReaderAsync(...);
Task<DbDataReader> DbCommand.ExecuteDbDataReaderAsync(...);
Task<object> DbCommand.ExecuteScalarAsync(...);
Task DbCommand.ExecuteNonQueryAsync(...);
Task<bool> DbDataReader.ReadAsync(...);
Task<T> DbDataReader.GetFieldValueAsync<T>(...);
Task<bool> DbDataReader.IsDBNullAsync(...);
Notes:
-
Naming the new methods is going to be complicated. Some parts of corefx are lucky in that they’re introducing
ValueTask<T>
overloads along withSpan<T>
(e.g. Stream), but ADO.NET has no such upcoming parameter changes/additions. Since there can’t be yet another overload that only differs on the return type, and the “standard” name with just theAsync
suffix is taken, we need to come up with a new naming pattern. -
There are some missing async methods in ADO.NET in general (e.g.
DbCommand.Prepare()
,DbConnection.BeginTransactionAsync(...)
,DbTransaction.CommitAsync(...)
…). These can be added directly with return valueValueTask<T>
(covered by issue https://github.com/dotnet/corefx/issues/35012). Naming of these new methods should probably be decided in the context of whatever new naming pattern we use for theValueTask<T>
returning variants of the existing async methods. -
Also note that async APIs that return
Task
orTask<bool>
(such asReadAsync()
andIsDBNullAsync()
) can be implemented to return cachedTask<T>
instances when they complete synchronously to gain efficiency instead of adoptingValueTask<TResult>
. This mitigates the need forValueTask<T>
returning variants for most APIs. Also, in terms of allocations, the advantage ofValueTask<T>
is much greater the more fine grained the async methods are. -
Compatibility for providers: It seems desirable to offer a default implementation of the Task-based method that calls
AsTask()
on theValueTask<TResult>
instance returned by the new API, so that new providers don’t need to implement the oldTask<TResult>
version of the API but instead only implement the more efficientValueTask<TResult>
version. It also seems good not to force existing providers to switch.
**Updated by @divega on 1/13/2019 to consolidate with https://github.com/dotnet/corefx/issues/15011.
About this issue
- Original URL
- State: open
- Created 6 years ago
- Reactions: 20
- Comments: 46 (29 by maintainers)
ValueTask<T> DblDataReader.GetFieldValueAsync<T>(…); ValueTask<bool> DbDataReader.IsDBNullAsync(…);
If we’re going to touch this, we might as well fix the IsDBNull problem at the same time. Currently we need to make two calls, once to check for nulls and again to actually read the value.
When I profile my ORM, IsDBNull is often flagged as most expensive operation in aggregate.(EDIT: this is no longer true.)We can probably make it a lot more convenient
efficientwith methods such asValueTask<T?> DblDataReader.GetFieldValueOrNullAsync<T>(…);
Absolutely, it would be great if ADO.NET got a bit of a face-lift in terms of API usability, regardless of any perf aspects…
Good to know that perf-wise things are better, there have been a lot of changes in recent two years.
You’re right. When I last benchmarked it on .NET Framework a year or two ago it was a major issue. Now it’s barely visible in the performance profiler.
I was thinking about this recently and had an idea. There are three main scenarios.
IsDBNull[Async]
Get*
andGetFieldValue[Async]
IsDBNull
followed immediately byGetFieldValue
. This is the interesting case.Even when what you want is buffered locally you’re calling two
Task<T>
returning methods and unless T is bool you’re really going to have to allocate most of the time. Even if we could change this toValueTask<T>
it would still be two calls to ask about the same field. The usual sync pattern for this sort of thing would bebool TryGetNotNullFieldValue<T>((string fieldname, out T value)
which if ok but that out parameter prevents it being converted to async. But! we haveValueTuple
s now so we don’t have to use the awkward pattern of having additional return parameters as out/ref. We can writeValueTask(bool isNotNull, T value)> GetNotNullFieldValue(Async<T>(string fieldname)
which is neatly async valuetasked and asks both relevant questions in a single call.You’d end up with callsites like this:
And if you didn’t care about the valuetask pattern you can just directly await the method and skip the task variable entirely. This way T should be annotatable with the nullability attributes so you don’t have to escape when accessing
result.Value
. If you really want to go all-out it could return a struct type with bool operator overload so you can just doif (result) { // use result.Value
Of course the function name sucks but that’s because all the really good names are already taken, it’d have to be something silly and overlong but we can pass that one up to the design review meeting to handle, they love spending hours trying to decide the best names for things 😁I added api-ready-for-review on dotnet/corefx#35012. Let’s at least discuss the ValueTask<bool> version of ReadAsync there.
If this proposed deserialization API is implemented then the ADO.NET APIs will not be very chatty. There will be less of a need for
ValueTask
.ValueTask
can be used internally.Depending on the ADO.NET provider implementation, calling
IsDBNull()
separately fromGetFieldValue<T>()
isn’t necessary a big perf problem (I’m not sure it’s the case with Npgsql). But introducing something likeGetFieldValueOrNull()
(and its async counterpart) would be a good idea regardless, if only for improving the API/usability.Note that using the async versions of these methods is probably a bad idea unless
CommandBehavior.SequentialAccess
is specified, since the entire row is pretty much guaranteed to be buffered in memory anyway…You’re right that when
OpenAsync()
returns asynchronously (i.e. a physical open or wait), the memory overhead is probably negligible. For synchronous invocations (no I/O, pooled idle connection is returned) there is no memory overhead, sinceOpenAsync()
returns a non-generic task which can be cached. So I agree this has more value forDbDataReadet.Read()
.However, don’t forget that ADO.NET is sometimes used to access databases like Sqlite, where the physical open can be much much faster (no networking), so it could be significant there.