runtime: Add ValueTask returning async APIs to ADO.NET

With https://github.com/dotnet/corefx/issues/27445 allowing elimination of allocations in asynchronously-executing async methods, we should look into adding ValueTask-returning counterparts to ADO.NET. This would potentially allow zero-allocation database access.

Here are the current async methods:

Task DbConnection.OpenAsync(...);

Task<DbDataReader> DbCommand.ExecuteReaderAsync(...);
Task<DbDataReader> DbCommand.ExecuteDbDataReaderAsync(...);
Task<object> DbCommand.ExecuteScalarAsync(...);
Task DbCommand.ExecuteNonQueryAsync(...);

Task<bool> DbDataReader.ReadAsync(...);
Task<T> DbDataReader.GetFieldValueAsync<T>(...);
Task<bool> DbDataReader.IsDBNullAsync(...);

Notes:

  1. Naming the new methods is going to be complicated. Some parts of corefx are lucky in that they’re introducing ValueTask<T> overloads along with Span<T> (e.g. Stream), but ADO.NET has no such upcoming parameter changes/additions. Since there can’t be yet another overload that only differs on the return type, and the “standard” name with just the Async suffix is taken, we need to come up with a new naming pattern.

  2. There are some missing async methods in ADO.NET in general (e.g. DbCommand.Prepare(), DbConnection.BeginTransactionAsync(...), DbTransaction.CommitAsync(...)…). These can be added directly with return value ValueTask<T> (covered by issue https://github.com/dotnet/corefx/issues/35012). Naming of these new methods should probably be decided in the context of whatever new naming pattern we use for the ValueTask<T> returning variants of the existing async methods.

  3. Also note that async APIs that return Task or Task<bool> (such as ReadAsync() and IsDBNullAsync()) can be implemented to return cached Task<T> instances when they complete synchronously to gain efficiency instead of adopting ValueTask<TResult>. This mitigates the need for ValueTask<T> returning variants for most APIs. Also, in terms of allocations, the advantage of ValueTask<T> is much greater the more fine grained the async methods are.

  4. Compatibility for providers: It seems desirable to offer a default implementation of the Task-based method that calls AsTask() on the ValueTask<TResult> instance returned by the new API, so that new providers don’t need to implement the old Task<TResult> version of the API but instead only implement the more efficient ValueTask<TResult> version. It also seems good not to force existing providers to switch.


**Updated by @divega on 1/13/2019 to consolidate with https://github.com/dotnet/corefx/issues/15011.

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Reactions: 20
  • Comments: 46 (29 by maintainers)

Most upvoted comments

ValueTask<T> DblDataReader.GetFieldValueAsync<T>(…); ValueTask<bool> DbDataReader.IsDBNullAsync(…);

If we’re going to touch this, we might as well fix the IsDBNull problem at the same time. Currently we need to make two calls, once to check for nulls and again to actually read the value. When I profile my ORM, IsDBNull is often flagged as most expensive operation in aggregate. (EDIT: this is no longer true.)

We can probably make it a lot more convenient efficient with methods such as

ValueTask<T?> DblDataReader.GetFieldValueOrNullAsync<T>(…);

Absolutely, it would be great if ADO.NET got a bit of a face-lift in terms of API usability, regardless of any perf aspects…

Good to know that perf-wise things are better, there have been a lot of changes in recent two years.

You’re right. When I last benchmarked it on .NET Framework a year or two ago it was a major issue. Now it’s barely visible in the performance profiler.

I was thinking about this recently and had an idea. There are three main scenarios.

  1. The caller want to know if something is null but then acts directly on that knowledge. I’s say this is rare and is covered by IsDBNull[Async]
  2. The caller knows by value of the query definition that the field will not be null and null appearing is an exceptional condition, this is covered by Get* and GetFieldValue[Async]
  3. The caller wants to know if something is null and if it isn’t then get that value. At the moment you do IsDBNull followed immediately by GetFieldValue. This is the interesting case.

Even when what you want is buffered locally you’re calling two Task<T> returning methods and unless T is bool you’re really going to have to allocate most of the time. Even if we could change this to ValueTask<T> it would still be two calls to ask about the same field. The usual sync pattern for this sort of thing would be bool TryGetNotNullFieldValue<T>((string fieldname, out T value) which if ok but that out parameter prevents it being converted to async. But! we have ValueTuples now so we don’t have to use the awkward pattern of having additional return parameters as out/ref. We can write ValueTask(bool isNotNull, T value)> GetNotNullFieldValue(Async<T>(string fieldname) which is neatly async valuetasked and asks both relevant questions in a single call.

You’d end up with callsites like this:

private static async Task Performance(string connectionStirng)
{
    using (var connection = new SqlConnection(connectionStirng))
    using (var command = new SqlCommand("select id from table",connection))
    using (var reader = await command.ExecuteReaderAsync())
    {
        while (await reader.ReadAsync())
        {
            var valueTask = reader.GetNotNullFieldValueAsync<int>("id");
            var result = valueTask.IsCompletedSuccessfully ? valueTask.Result : await valueTask.ConfigureAwait(false);
            if (result.isNotNull)
            {
                // use the value
            }
            else
            {
                // er...
            }
        }
    }
}

And if you didn’t care about the valuetask pattern you can just directly await the method and skip the task variable entirely. This way T should be annotatable with the nullability attributes so you don’t have to escape when accessing result.Value. If you really want to go all-out it could return a struct type with bool operator overload so you can just do if (result) { // use result.Value Of course the function name sucks but that’s because all the really good names are already taken, it’d have to be something silly and overlong but we can pass that one up to the design review meeting to handle, they love spending hours trying to decide the best names for things 😁

I added api-ready-for-review on dotnet/corefx#35012. Let’s at least discuss the ValueTask<bool> version of ReadAsync there.

If this proposed deserialization API is implemented then the ADO.NET APIs will not be very chatty. There will be less of a need for ValueTask. ValueTask can be used internally.

Depending on the ADO.NET provider implementation, calling IsDBNull() separately from GetFieldValue<T>() isn’t necessary a big perf problem (I’m not sure it’s the case with Npgsql). But introducing something like GetFieldValueOrNull() (and its async counterpart) would be a good idea regardless, if only for improving the API/usability.

Note that using the async versions of these methods is probably a bad idea unless CommandBehavior.SequentialAccess is specified, since the entire row is pretty much guaranteed to be buffered in memory anyway…

You’re right that when OpenAsync() returns asynchronously (i.e. a physical open or wait), the memory overhead is probably negligible. For synchronous invocations (no I/O, pooled idle connection is returned) there is no memory overhead, since OpenAsync() returns a non-generic task which can be cached. So I agree this has more value for DbDataReadet.Read().

However, don’t forget that ADO.NET is sometimes used to access databases like Sqlite, where the physical open can be much much faster (no networking), so it could be significant there.