SqlClient: Reading large data (binary, text) asynchronously is extremely slow

Reading a 10MB VARBINARY(MAX) value asynchronously takes around 5 seconds on my machine, while doing the same synchronously takes around 20ms. Increasing the data size to 20MB increases the running time to around 52 seconds.

These numbers were with Microsoft.Data.SqlClient 1.1.3, but 2.0.0-preview4.20142.4 has the same issue (it actually seems slightly slower). Note that I’m not posting final BDN results because the benchmark ran extremely slowly.

Note that this looks very similar to #245, but that issue was fixed for 1.1.0. The difference may be that here we’re writing binary data - or possibly the bigger size.

Benchmark code
using System.Threading.Tasks;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using Microsoft.Data.SqlClient;

[MemoryDiagnoser]
public class Benchmarks
{
    const string ConnectionString = "Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0";


    [GlobalSetup]
    public void Setup()
    {
        using var conn = new SqlConnection(ConnectionString);
        conn.Open();

        using var cmd = conn.CreateCommand();
        cmd.CommandText = @"
IF OBJECT_ID('dbo.data', 'U') IS NOT NULL
DROP TABLE data; 
CREATE TABLE data (id INT, foo VARBINARY(MAX))
";
        cmd.ExecuteNonQuery();

        cmd.CommandText = "INSERT INTO data (id, foo) VALUES (@id, @foo)";
        cmd.Parameters.AddWithValue("id", 1);
        cmd.Parameters.AddWithValue("foo", new byte[1024 * 1024 * 10]);
        cmd.ExecuteNonQuery();
    }

    [Benchmark]
    public async ValueTask<int> Async()
    {
        using var conn = new SqlConnection(ConnectionString);
        using var cmd = new SqlCommand("SELECT foo FROM data", conn);
        await conn.OpenAsync();

        return ((byte[])await cmd.ExecuteScalarAsync()).Length;
    }

    [Benchmark]
    public async ValueTask<int> Sync()
    {
        using var conn = new SqlConnection(ConnectionString);
        using var cmd = new SqlCommand("SELECT foo FROM data", conn);
        conn.Open();

        return ((byte[])cmd.ExecuteScalar()).Length;
    }

    static void Main(string[] args)
        => BenchmarkRunner.Run<Benchmarks>();
}

Originally raised in https://github.com/dotnet/efcore/issues/21147

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 91
  • Comments: 144 (97 by maintainers)

Most upvoted comments

I think this problem should have a much higher priority, given that “USE ASYNC ON IO EVERYWHERE” is strongly suggested by Microsoft. And … this issue has been alive for 1.5 years?

It is not a “10% performance penalization”. This problem made our code 280x slower with async methods than with sync methods, and it took 3GB of memory instead of 200MB…

😕

The work to merge the codebases is in progress. If you check the PRs on this repo then you’ll see several about merging or moving to shared. It’s a complex task that has to be done carefully to avoid introducing new bugs.

I respectfully disagree. Modernizing to async/await and away from the current callback-based async code - with its different (and sometimes duplicate) codepaths is IMHO vital for the maintainability and reliability of this foundational library.

I’ve had a look into the profiles and I was wrong. The optimization from https://github.com/dotnet/SqlClient/issues/245 is being used because it’s reading as PLP data. The excessive memory usage is caused mostly by input packet buffers on the TDS side being lost into the snapshot packet chain. The time escalation is caused by the snapshot chain repeatedly being replayed.

The only currently available mitigations are to increase the packet size to as close to 16K as you can get, or to use a stream reader mode, which should work on a per-packet basis and not buffer the entire contents (I’m a bit hazy on that for async; it probably needs checking).
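For reference, a rough sketch of what those two mitigations look like from the calling side (Packet Size is a standard connection-string keyword; CommandBehavior.SequentialAccess and GetStream are standard ADO.NET APIs; how well the streaming path behaves for async is exactly the caveat above):

    using System.Data;
    using System.IO;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    class Mitigations
    {
        // Mitigation 1: raise the packet size (default 8000 bytes) to reduce the number of packets.
        const string ConnectionString =
            "Server=localhost;Database=test;User=SA;Password=Abcd5678;Packet Size=16384";

        // Mitigation 2: sequential access plus streaming, so the value is consumed per packet
        // instead of being materialized as one huge byte[].
        public static async Task<long> ReadBlobLengthAsync()
        {
            using var conn = new SqlConnection(ConnectionString);
            await conn.OpenAsync();

            using var cmd = new SqlCommand("SELECT foo FROM data", conn);
            using var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
            await reader.ReadAsync();

            using var blob = reader.GetStream(0);
            using var destination = new MemoryStream();
            await blob.CopyToAsync(destination);
            return destination.Length;
        }
    }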

The long term fix would be to rewrite async reads to avoid replaying the entire packet chain on every receipt. This is akin to brain surgery. I’ve looked into this while reviewing other performance issues around async and it isn’t something to attempt lightly; if I managed it solo and dropped a PR for it, the team may all justifiably quit in protest.

The TdsParser and TdsParserStateObject, you know, the really complicated ones.

@David-Engel I want to help with this issue too, so I started merging TdsParserStateObject. But now I am blocked waiting for more feedback from you guys. It has been about 5 months.

You should really prioritize TdsParserStateObject. There is this small, relatively easy optimization, which can be done inside it and eliminates about 90% of the overhead (note: the linked version may not handle cancellation properly in all edge cases, it’s a proof of concept). The async read operations will still have quadratic complexity instead of linear, but with a much better constant factor. There is a more detailed explanation about it in my previous comments.

I was hoping to make a PR with it as soon as the file has been merged. But now I am wondering whether you would prefer a PR for the netcore version only.

Ok. Based on that range I’ve run some benches with changes I have in the pipeline and there’s some good news.

The table lists the sizes from 1 to 20 MB in interesting increments. The data points for sync show the synchronous perf numbers, so you can see that we’re still a long way off parity. The async-cm31 numbers are from the current working PR https://github.com/dotnet/SqlClient/pull/2157 I have open, and the async-cm32 numbers are from the PR that follows it, which contains changes to how the snapshot packet list is organised and iterated.

| size (MB) | sync  | async-cm31 | async-cm32 |
|----------:|------:|-----------:|-----------:|
| 1         | 5.692 | 10.382     | 9.228      |
| 5         | 19.78 | 293.99     | 96.88      |
| 10        | 39.5  | 2,297.18   | 495.26     |
| 15        | 53.92 | 9,409.01   | 1,401.61   |
| 20        | 80.46 | 28,579.30  | 2,867.15   |


As you can see, the current async-cm31 line scales horribly. The newer async-cm32 line scales better but is still not linear; the larger your payload, the more benefit you’ll get from cm32. I can still get this to parity in speed terms but that requires a lot more internal merging to be able to implement the continuation point state capture/restore.

The code is all available, feel free to stop waiting and start helping.

The scaling is much better. The absolute values are still not equal and it requires a lot more merge work before I can implement the changes that allow them to have the same speed. We’re getting there slowly.

Yup. The lack of communication from the people setting the goals frustrates me too. I don’t know who they are, how to get their attention or how to make a case that they should pay attention. The source may be open but the project management definitely is not.

@Wraith2 I/We do pay attention. Feel free to @ me or Cheena if you think there are issues that are being missed. Async perf is definitely high on our priority list and I hope we can budget dedicated time for it next semester. Like you said, we aren’t a huge team and we have to pick and choose what we work on, including features important to strategic MS goals.

I hear your complaint about open planning. Right now, most of our planning is done internally and we definitely take GitHub issues into account. This one is currently 14th in our list of top GH issues. But the point about considering the number of references in the weight of an issue is interesting. I’ll look into adjusting our reporting to consider that. Perhaps we can also do better at labeling issues and PRs we are working on and targeting them to milestones so people better know what is coming in future releases.

I’ve opened an experimental PR at https://github.com/dotnet/SqlClient/pull/1229 which has a rewritten async replay mechanism. It also has some complex packet handling at the core of the driver rewritten to be able to cope with the new speed requirements. I’ve tested this locally and it succeeds in sync and async read speed/reliability tests where previous versions were extremely slow or outright failed. It also passes all the CI tests apart from async cancellation (which is a shitshow anyway).

I’d like to ask anyone here who is interested to see if you can test it on your dev workloads, whether the speed increase is noticeable, and whether you see any bugs or reliability problems. If you can then please pick up the artefacts from the CI on the linked PR and give it a try. Do not try to use this in production. Do not put this near things that will cost you something of value to replace. This is a contributor dev experiment and is not supported in any way.

It’s not memory optimized so I expect the GC stats to be pretty poor (in line with the earlier dev results posted above), but if it actually works then more time can be put into optimizations. Large async reads should be faster by default and can be made almost as fast as sync by setting the AppContext switch Switch.Microsoft.Data.SqlClient.UseExperimentalAsyncContinue to true, though this will burn GC time.
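For reference, opting into an AppContext switch is a one-liner; the switch name below is the one quoted above and only exists in that experimental build, so this is illustrative rather than something to rely on:

    // Only meaningful against the experimental build from the PR above;
    // released packages do not know this switch.
    AppContext.SetSwitch("Switch.Microsoft.Data.SqlClient.UseExperimentalAsyncContinue", true);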

With the fix in https://github.com/dotnet/SqlClient/pull/603 the numbers are better:

| Method      | Mean        | Error     | StdDev    | Gen 0     | Gen 1     | Gen 2     | Allocated |
|------------ |------------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Async       | 2,096.68 ms | 14.390 ms | 13.460 ms | 2000.0000 | 1000.0000 | 1000.0000 | 30.5 MB   |
| StreamAsync | 25.42 ms    | 0.499 ms  | 0.534 ms  | 2031.2500 | 1968.7500 | 1968.7500 | 32.36 MB  |
| Sync        | 16.67 ms    | 0.041 ms  | 0.036 ms  | 406.2500  | 406.2500  | 406.2500  | 20 MB     |

How is this not a top priority for the team to fix?

I can only assume that you’re complaining to the wrong people. No-one here has the ability to decide the priorities for the SQL server team. If you want something changed you’ll need to find a way to reach them. Try reaching out to whoever you bought your server licenses from.

I’m here, I’m trying to fix it. It’s not going quickly.

Note that the PR I referenced above, which vastly reduces the scaling factor, has been merged, so the next release will be improved.

Wasn’t worried about the competence of the devs, was only concerned that priorities might have changed away from fixing the underlying issue and towards guiding internal consumers to not use async sqlclient long term. Reading the PR discussions it wasn’t obvious that the change was intended to be a temporary workaround.

I decided to investigate what I could do about strings in a proof of concept when I had some holiday. I tried adding support for continue rather than replay. In doing so I made some changes to the machinery used which had a reasonable effect all on their own.

To start with, the baseline is the benchmark from the original post, slightly trimmed for debugging speed to 5 MiB strings.

| Method     | Mean      | Error    | StdDev   | Gen 0    | Gen 1    | Gen 2    | Allocated |
|----------- |----------:|---------:|---------:|---------:|---------:|---------:|----------:|
| AsyncRetry | 290.86 ms | 5.118 ms | 4.537 ms | 500.0000 | -        | -        | 20.34 MB  |
| Sync       | 21.11 ms  | 0.044 ms | 0.041 ms | 343.7500 | 343.7500 | 343.7500 | 15 MB     |

Then reworked async retry alone:

| Method     | Mean     | Error    | StdDev   | Gen 0    | Gen 1    | Gen 2    | Allocated |
|----------- |---------:|---------:|---------:|---------:|---------:|---------:|----------:|
| AsyncRetry | 97.61 ms | 1.407 ms | 1.316 ms | 833.3333 | 500.0000 | 166.6667 | 20.34 MB  |
| Sync       | 20.83 ms | 0.408 ms | 0.419 ms | 343.7500 | 343.7500 | 343.7500 | 15 MB     |

Then reworked async with continue:

| Method        | Mean     | Error    | StdDev   | Gen 0     | Gen 1    | Gen 2    | Allocated |
|-------------- |---------:|---------:|---------:|----------:|---------:|---------:|----------:|
| AsyncRetry    | 93.91 ms | 1.028 ms | 0.911 ms | 666.6667  | 333.3333 | -        | 21.33 MB  |
| AsyncContinue | 22.07 ms | 0.124 ms | 0.110 ms | 1062.5000 | 812.5000 | 343.7500 | 21.33 MB  |
| Sync          | NA       | NA       | NA       | -         | -        | -        | -         |

Note that sync didn’t finish due to a stability problem I’m still trying to track down.

Interesting numbers, I thought. The memory and GC numbers are pretty appalling, but memory is a reusable resource whereas time is linear and limited, so I suppose in this scenario wasting memory to achieve speed is worth doing.

All results are netcore only. As previously stated, I’m not doing this work in two different codebases, so until TdsParserStateObject has been merged between the two versions of this library none of this performance will be available. If you want that to happen, please pass that feedback along to whoever is in charge of goals for this library so they can prioritize appropriately.

Unless I’ve got something very wrong it won’t work because there’s a bug.

    async Task<int> ReadViaSequentialStreamAsync()
    {
        using var conn = new SqlConnection(ConnectionString);
        using var cmd = new SqlCommand("SELECT foo FROM data", conn);
        await conn.OpenAsync();

        // Sequential access so the value can be streamed instead of buffered whole.
        using var reader = await cmd.ExecuteReaderAsync(
            System.Data.CommandBehavior.SequentialAccess
        );
        await reader.ReadAsync();

        using var stream = reader.GetStream(0);
        using var memory = new MemoryStream(16 * 1024);

        // Hangs here: the first read cycle completes, the second never does.
        await stream.CopyToAsync(memory);
        return (int)memory.Length;
    }

This should work, but it freezes in a task wait after a single read cycle; the second read never completes. If you change it to standard mode it’ll work, but it does so by fetching the entire field and giving you a reader over the byte[]. So no workaround; this needs fixing if the team agree that it’s a bug.

It will be released from this repo as the usual independent package. It isn’t tied to net8.

There is no need for any server side change. My point was that clearly asking about it here doesn’t change anything. Someone somewhere that isn’t here is setting priorities, try and find them.

Really? How does ASP.NET know SqlClient is doing sync under the covers?

No idea, but it’s toggle-able https://learn.microsoft.com/en-us/aspnet/core/fundamentals/servers/kestrel/options?view=aspnetcore-7.0#synchronous-io
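For context, the toggle behind that link is a Kestrel option; a minimal sketch of enabling it (which the comments below advise against) would look roughly like this in an ASP.NET Core app:

    var builder = WebApplication.CreateBuilder(args);

    // Kestrel rejects synchronous reads/writes on the request/response body by default;
    // this option re-enables them.
    builder.WebHost.ConfigureKestrel(options => options.AllowSynchronousIO = true);

    var app = builder.Build();
    app.Run();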

Pretty sure that’s only for web requests/responses (that is, you can’t synchronously read/write from/to stream), everything outside that should be good.

Our efforts are focused on improving performance in tools and applications we build to mitigate this issue. We’re certainly committed to getting this issue addressed soon as well.

Do you have any idea when this merging might happen? Or is there a way to introduce an adapter for the .net framework side?

Is there a link to another issue where we might vote, help, or increase awareness of this issue? It seems to me most people don’t even notice the impact of this issue, or the issue gets reported to other projects like EntityFramework and doesn’t get the priority it actually deserves.

Is there any timeframe when this will be fixed? Or when the blocking issue (mentioned by @Wraith2 and @panoskj ) will be fixed? It has been more than half a year now since the last comments here 😦

So should I use Microsoft.Data.SqlClient instead?

Yes. Generally it will always be better than the System versions in terms of performance. In this specific case it isn’t going to be noticeably better because, as you can see, this issue is still open and there’s still a problem.

Still not acceptable.

This irks me. It’s a manager’s disapproval expression but does nothing constructive. I’ve investigated and identified the problem, but that leads to another problem. There are blocking factors I can’t control or mitigate.

The System.Data.SqlClient versions aren’t going to improve. Only the Microsoft.Data.SqlClient version in this repo is. The target SQL Server version makes no difference because the problem is in the client library. The framework version makes almost no difference, netcore and later will be slightly faster but not usefully in relation to the query times we’re seeing.

Maxing your packet size to lower the number of packets is the only useful mitigation at the moment.

I can make it faster as long as you don’t mind it crashing 🤷‍♂️

I recently updated the EF Core SQL Server provider docs to make that more visible: https://github.com/dotnet/EntityFramework.Docs/pull/3197/files

I agree with @roji: the current asynchronous implementation is fundamentally wrong, and this is why the performance is so bad. It is essentially wrapping a synchronous parsing implementation, repeating it each time a packet is received until it runs successfully.

Instead of reinventing a state machine to deal with the problem of stopping and resuming the execution of the parsing algorithm between packets, the natural C# solution is to use async/await where the end-result will be more maintainable and most likely more performant as well.
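As a toy illustration of that difference (this is not SqlClient code; receiveNextPacketAsync just stands in for the network read):

    using System;
    using System.Collections.Generic;
    using System.Threading.Tasks;

    static class ReplayVersusContinue
    {
        // Replay-style: every time new data arrives, rebuild the value from the first
        // packet onwards, so total work is roughly 1 + 2 + ... + N packet copies: O(N^2).
        public static async Task<byte[]> ReadByReplayAsync(
            Func<Task<byte[]>> receiveNextPacketAsync, int totalLength)
        {
            var packets = new List<byte[]>();
            int buffered = 0;
            while (true)
            {
                var packet = await receiveNextPacketAsync();
                packets.Add(packet);
                buffered += packet.Length;

                // "Replay": copy everything buffered so far, starting from the first packet.
                var attempt = new byte[Math.Min(buffered, totalLength)];
                int offset = 0;
                foreach (var p in packets)
                {
                    int toCopy = Math.Min(p.Length, attempt.Length - offset);
                    Buffer.BlockCopy(p, 0, attempt, offset, toCopy);
                    offset += toCopy;
                    if (offset == attempt.Length) break;
                }

                if (buffered >= totalLength)
                    return attempt;
                // Not enough data yet: throw the attempt away and start over on the next packet.
            }
        }

        // Continuation-style (what async/await gives naturally): keep the partial result and
        // resume where you left off, so each byte is copied once: O(N).
        public static async Task<byte[]> ReadByContinuationAsync(
            Func<Task<byte[]>> receiveNextPacketAsync, int totalLength)
        {
            var result = new byte[totalLength];
            int written = 0;
            while (written < totalLength)
            {
                var packet = await receiveNextPacketAsync();
                int toCopy = Math.Min(packet.Length, totalLength - written);
                Buffer.BlockCopy(packet, 0, result, written, toCopy);
                written += toCopy;
            }
            return result;
        }
    }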

On the other hand this refactoring is not trivial and will take a lot of time.

I would propose a middle-ground modular solution: use async/await where it is easy to do so. I made such a proof-of-concept and my notes were:

  1. Some parts can be trivially modernized to use async/await.
  2. Some parts will take unreasonably long time to refactor but can be worked-around like @Wraith2 suggests.
  3. Using async/await where possible reduces allocations and makes the code shorter.
  4. The end result has the same performance as the sync version, with potential to be refactored into running faster than the sync version.

But then there is also the problem that there are two codebases, one for .net framework and another for .net core. That’s why I am trying to merge the files required for optimizing the parsing algorithms: #2254, #2168, #1844, #1520 but it is taking years. We cannot simply make such drastic changes only for .net core (as the codebases will become significantly different, making it very hard to merge them in the future), but making the changes for both codebases amounts to a lot more work and PRs to be reviewed/merged and as a result more time (it is more efficient to merge before refactoring).

It is unclear to me whether the owners of the repository would like to prioritize merging the parser logic source code or whether they would accept PRs fixing this issue for the .net core version only, @David-Engel, @DavoudEshtehari and @JRahnama what are your thoughts on this? For the record, I am willing to help you merge the required files as fast as you can review/accept/merge them, as you have seen.

I have confirmed that the major improvement will be in 5.2 preview 4 and later

Both are working around the current issue and can be changed to use async when it’s sensible to do so. @cheenamalhotra knows what she’s doing.

I am happy there are still people working on this 😄

I understand that it is difficult, but I have already accidentally run into this problem using EF Core many times: just add a memo field to any entity and you will have to dig through all the code that accesses it and change it from async to sync. I consider this one of the biggest problems, yet this issue remains rather quiet.

I even added an extension method that I slap on the code I refactor to sync, just to be able to find all of those places: one day, when I can delete this function, the compiler will scream everywhere I need to revert back:

    public static class AsyncBug
    {
        // https://github.com/dotnet/SqlClient/issues/593
        public static T TrackAsyncBug<T>(this T input)
            => input;
    }
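A hypothetical call site (entity and context names are made up) showing how the marker is meant to be used, so that “Find All References” on TrackAsyncBug later surfaces every place that was downgraded to sync:

        // Downgraded to sync because of https://github.com/dotnet/SqlClient/issues/593
        var document = dbContext.Documents.FirstOrDefault(d => d.Id == id).TrackAsyncBug();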

They have been preaching “async all the way down” for 10+ years now - and just letting bugs like this sit for 25 months after they became known - I’m glad npgsql doesn’t have this problem and is twice as fast for both small and large varbinaries.

Loading a ~50 MiB blob (including EFcore overhead):

  • npgsql: 500 ms (async and non-async)
  • SqlClient non-async: 900 milliseconds
  • SqlClient async: >100 seconds

Burned client CPU (Process.UserProcessorTime):

  • npgsql: <30 milliseconds
  • SqlClient: >100 seconds

Very “solid” foundation for a heavily multithreaded app 🤦‍♂️

| Method      | Mean        | Error     | StdDev    | Gen 0     | Gen 1     | Gen 2     | Allocated |
|------------ |------------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Async       | 2,181.93 ms | 42.299 ms | 39.567 ms | 2000.0000 | 1000.0000 | 1000.0000 | 30.5 MB   |
| StreamAsync | 2,162.94 ms | 30.055 ms | 28.114 ms | 2000.0000 | 1000.0000 | 1000.0000 | 40.52 MB  |
| Sync        | 18.40 ms    | 0.359 ms  | 0.414 ms  | 406.2500  | 406.2500  | 406.2500  | 20 MB     |

So no. The only way to get the speed benefit is to put the reader in sequential mode.

https://github.com/dotnet/SqlClient/blob/f2fd632f522a7c0f96c8b5c6a75ea7f2d80c0dd5/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlDataReader.cs#L1557

There’s also an interesting comment elsewhere in the file that says streaming isn’t supported when using encryption. I think that makes streaming unusable for a generic consumer like EFCore because you’ll have to special case it per driver.

We need to fix the sequential stream issue and then we need to rewrite async reads.

As you suggest, the ideal approach would be to use a ReadOnlyMemory; then it’d all be safe, but that’ll require a runtime API change and all providers to consume it.

Wouldn’t it be possible for a provider to simply support GetFieldValue<ReadOnlyMemory<byte>>(), without introducing a new API?

I can sort of understand how it might be done but it’s daunting and difficult.

I can certainly understand (and appreciate) that, and I’m definitely not suggesting anyone do anything specific (was just reporting the problem). Along with #547 and possibly other async-related issues, it seems that at least documenting these limitations could help users avoid pitfalls.

BTW do you think there are any workarounds to this, besides switching to sync?

FWIW, I agree with roji, too. I’m onboard with anything that can help the situation. I dislike the original async implementation, particularly with the way it switches in and out of its syncOverAsync mode in various places. I would support anyone who wants to dedicate time to this. It’s just a huge change to core code. It would be important to ensure no regressions, so testing is vital. I feel like we have pretty good test coverage, but there’s always that worry. I also understand trying to address this incrementally.

It is unclear to me whether the owners of the repository would like to prioritize merging the parser logic source code or whether they would accept PRs fixing this issue for the .net core version only,

.NET Framework seems like it’s going to be around for a long time and the changes will benefit both sides. So, I favor merging as much of the code as possible rather than leaving .NET Framework behind and just improving the .NET side. If we did .NET only, I feel like the code would never get merged.

For the record, I am willing to help you merge the required files as fast as you can review/accept/merge them, as you have seen.

Yes, and this has been much appreciated. I’ve been encouraging the team to not let code merge PRs sit too long. For example, I want to get your latest one into a 5th preview early/mid Jan for some exercise before a Feb GA (my estimates). The easier you can make reviewing them, the less likely people will stress out about having to review them and it’ll happen faster. I also appreciate @Wraith2 chiming in on the PRs with his reviews and summaries to help clarify what’s changed.

@darrenhollick If you do go with the sync option, would suggest a comment linking to this issue right next to the call. Avoids the inevitable accidental perf regression when someone sees a sync call hanging out in an async path and changes it back.
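For example (hypothetical call site; the point is just the breadcrumb comment next to the deliberate sync call):

        // Deliberately synchronous: async reads of large values are pathologically slow.
        // See https://github.com/dotnet/SqlClient/issues/593 - revert once that is fixed.
        var payload = (byte[])command.ExecuteScalar();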

I’d say change async calls where you expect large strings to be returned to be sync. For small strings or fixed length types stick with async.

Which libraries? And no, I’m still working on it.

Making the change netcore only increases the difference between netcore and netfx. Leaving netfx behind isn’t feasible because it will continue to be supported for a long time. The move to a single codebase is more important than fixing this issue because it frees up so much time. If it made sense to fix this in netcore only I could have done it, but doing so would have made the merge of codebases much harder.

There are many merge PRs open but they’re waiting on the MS team to have a chance to review them.

Hey @Wraith2, we are very appreciative of your help so far. I doubt we’d be able to fix this properly, not having the expertise that you have!

If it’s a matter of prioritisation, please raise this with your management! How could we go about pushing this?

(I work in organisations who pay Microsoft a lot of money for support, asking us to fix a vendor’s bug is pretty rough.)

When I say parsed, I mean there is a loop copying 2 bytes at a time into a 2 byte array,

Well that doesn’t sound right at all. It should be copying the bytes from the packets into the temporary buffer as a block in TryReadByteArray IIRC; 2 at a time would be ridiculous, as you say. The temporary buffer should also be cached. Unless you’re looking at the netfx build, where that optimization hasn’t been integrated yet. As you’ll have seen there are 2 discrete codebases, and netcore got some perf work done when it was part of netcore.

My version doesn’t replay the packets when the next packet is already available, instead it continues processing it directly (but it puts the packet in the snapshot to be replayed later if needed). So the O(N^2) complexity gets factored out. It should still be a problem as N gets larger, but I haven’t tested that yet.

Mine is the same but achieved another way. I altered the snapshot logic to speed up the Replay using a doubly linked list and then added the ability to Continue() if the last packet replay ended in a stable state. So this sounds right.

The obvious solution is to not parse the whole string every time, unless you have enough packets available. Moreover, using Encoding.GetString should be much faster than the current algorithm (both for sync and async).

Agreed, and it shouldn’t be doing that. The replay should be repeatedly copying into the temp buffer but not materializing a string unless the overall operation can succeed.

Finally, about continuing after cancellation. You can certainly read a previously read column without a replay, because its value is buffered (in a SqlBuffer instance) until moving to the next row. Unless you are in SequentialMode.

Yup. But what happens if you cancel the read of a string column so that the value isn’t fully read? In that case it wouldn’t be in the SqlBuffer and you can’t re-request the packets. If you’ve still got the snapshot, in theory you could continue from where it left off reading more packets, but I don’t know if that’s possible; I haven’t looked at exactly when the snapshot is cleaned up under cancel conditions because cancel wasn’t really on my mind.

I will try to help you with this issue in my free time. It is certainly challenging to maintain this library, let alone to optimize it.

I know. I don’t work here either; it’s just a hobby for me too 😁

If your version doesn’t read everything into the snapshot

My version does read every packet into the snapshot, after all. That’s what I’ve been saying.

I had another look at the code and my understanding is that, when trying to read a string column, each time a network packet is available, all previous packets have to be replayed, that is, the whole string is parsed again. When I say parsed, I mean there is a loop copying 2 bytes at a time into a 2 byte array, then converting them to a char and then putting the char in another array. If it fails to parse the whole string (because there is not enough data yet), it simply requests another network packet and repeats the whole process, allocating new buffers.

This explains the O(N^2) time complexity you will notice when running benchmarks, for N = (string size) / (packet size) ratio.

My version doesn’t replay the packets when the next packet is already available, instead it continues processing it directly (but it puts the packet in the snapshot to be replayed later if needed). So the O(N^2) complexity gets factored out. It should still be a problem as N gets larger, but I haven’t tested that yet.

The obvious solution is to not parse the whole string every time, unless you have enough packets available. Moreover, using Encoding.GetString should be much faster than the current algorithm (both for sync and async).

It looks like you don’t need individual packets in the snapshot, but a big continuous buffer that can fit multiple packets, in order to parse strings effectively, both for the sync and especially for the async version.

Finally, about continuing after cancellation. You can certainly read a previously read column without a replay, because its value is buffered (in a SqlBuffer instance) until moving to the next row. Unless you are in SequentialMode. So to support continuing after cancellation, you simply need the data of the current (partially read) column. But to read strings and byte arrays effectively, you should already be keeping all received packets in a big continuous buffer, so you can get two birds with one stone here: both faster strings and minimal overhead to support continuing after cancellation.

I will try to help you with this issue in my free time. It is certainly challenging to maintain this library, let alone to optimize it.

We recently discovered this issue on net48; the workaround of using SequentialAccess is working great, with basically the same performance between sync and async. Thank you @Wraith2

While we use net48 in production, we are able to run our tests on net5.0 and I was able to try this PR nuget.

This is very inexact, but when using Microsoft.Data.SqlClient 4.0.0-preview2 on net5.0 without specifying SequentialAccess, async access is ~80x slower than sync and ~2x slower than async on net4.8 (System.Data.SqlClient).

When using Microsoft.Data.SqlClient.4.0.0-pull-e809dd9.21252.2 on net5.0, async access is now approximately the same as async on net4.8 (System.Data.SqlClient) but still ~35x slower than sync.

However, in all cases when we specify SequentialAccess, times across all versions, async & sync is all within 10% of each other.

| CommandBehavior  | Version        | Sync  | Async   |
|----------------- |--------------- |------:|--------:|
| Default          | net48          | 71 ms | 2433 ms |
| Default          | 4.0.0-preview2 | 62 ms | 5329 ms |
| Default          | 4.0.0-e809dd9  | 63 ms | 2997 ms |
| SequentialAccess | net48          | 70 ms | 64 ms   |
| SequentialAccess | 4.0.0-preview2 | 64 ms | 63 ms   |

Hey mate, just wanted to shout out our appreciation. We honestly appreciate the effort you’ve put into getting this organised!

It’s a shame Microsoft aren’t allowing you to put more time and effort into this, as it is a fundamental low-level piece for an incredibly important product. Following your posts over the months, it is obvious that this is significantly complex.

Personally I would love to have a play, but I am totally overwhelmed. I also don’t have the test harnesses in place to prove whether this is a suitable fix. It was very time consuming simply diagnosing our issue to the point where I was led to this post.

… or use the sync code path on the thread pool; that’s how things have been done (and working well) for more than a decade and it is perfectly fine for many workloads. As far as I’m concerned we can’t change our DB models just because SqlClient is broken, so we just treat it as not having an async API in the first place.

Note that in general no EF change is needed - once a version of SqlClient is released with this fix, users can simply take a dependency on it, and things should just work. This is because SqlClient typically don’t introduce breaking changes which affect EF.

Wouldn’t it be possible for a provider to simply support GetFieldValue<ReadOnlyMemory<byte>>(), without introducing a new API?

I hadn’t considered that, nice idea.

Currently the only workaround I can think of is to use a stream read overload because I think those drive the parser per-packet directly so you don’t have to replay the packet queue, that’ll need verifying.

@roji Ran your original benchmark with 5.2 preview 4:

Using 5.1.2:

BenchmarkDotNet v0.13.11, Windows 11 (10.0.22631.2715/23H2/2023Update/SunValley3)
11th Gen Intel Core i7-1165G7 2.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2

| Method | Mean        | Error      | StdDev     | Median      | Gen0      | Gen1      | Gen2      | Allocated |
|------- |------------:|-----------:|-----------:|------------:|----------:|----------:|----------:|----------:|
| Async  | 3,256.31 ms | 203.078 ms | 579.392 ms | 3,029.00 ms | 2000.0000 | 1000.0000 | 1000.0000 |  30.67 MB |
| Sync   |    46.09 ms |   1.463 ms |   4.314 ms |    45.25 ms |  454.5455 |  454.5455 |  454.5455 |     20 MB |

Using 5.2 preview 4:

BenchmarkDotNet v0.13.11, Windows 11 (10.0.22631.2715/23H2/2023Update/SunValley3)
11th Gen Intel Core i7-1165G7 2.80GHz, 1 CPU, 8 logical and 4 physical cores
.NET SDK 8.0.100
  [Host]     : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2
  DefaultJob : .NET 6.0.25 (6.0.2523.51912), X64 RyuJIT AVX2

| Method | Mean      | Error     | StdDev    | Gen0      | Gen1      | Gen2      | Allocated |
|------- |----------:|----------:|----------:|----------:|----------:|----------:|----------:|
| Async  | 636.92 ms | 12.118 ms | 15.325 ms | 2000.0000 | 1000.0000 | 1000.0000 |  30.68 MB |
| Sync   |  45.00 ms |  1.513 ms |  4.269 ms |  333.3333 |  333.3333 |  333.3333 |     20 MB |

@ErikEJ do you mean this?

| size (MB) | sync  | async-cm31 | async-cm32 |
|----------:|------:|-----------:|-----------:|
| 1         | 5.692 | 10.382     | 9.228      |
| 5         | 19.78 | 293.99     | 96.88      |
| 10        | 39.5  | 2,297.18   | 495.26     |
| 15        | 53.92 | 9,409.01   | 1,401.61   |
| 20        | 80.46 | 28,579.30  | 2,867.15   |

Yeah async will be much faster but sync is still 20x or 30x faster than async

@Will-Bill here is “Async operation takes 130x longer (for 9MB nvarchar)” https://github.com/dotnet/efcore/issues/22331

Sync and async-cm31 would be the same as vanilla for a baseline. All tests were done on net6.

@Wraith2 This flag allows you to read/write the HTTP request/response stream with the sync methods. I would not recommend enabling it. The flag matters e.g. if you want to pipe the SQL binary stream directly to the HTTP response via sqlStream.CopyTo(httpStream): that will not work without it. And using CopyToAsync will cause the performance issues if the command is not sequential.

@alser we use sequential access in our software and have no performance issues. But I don’t really know if it is blocking. But I guess it is not.

@kevbry, @darrenhollick or do what I did earlier: make an extension function so you can simply find all references, and annotate it with a link to this issue. Look for a suggestion at: https://github.com/dotnet/SqlClient/issues/593#issuecomment-1441660081

It might help to keep the methods async so that when this issue finally gets fixed, it will not cause huge refactorings back to an async style.

Something else: We actually migrated to postgresql… 🤔

Note: I took the whole unit test time, including all overhead of DB setup. I am not interested in milliseconds, I just don’t want timeouts 😄

Agreed, if only there were a workaround for this, or a fork, or whatever. It is ridiculous that this issue can exist for this long. I heard about a replacement client library that would solve all issues, but that was more than a year ago.

We actually switched to postgresql; it solved a few more issues with hosting for us, and the benefit of not having to think about what needs async and what can go without makes development a bit more reliable.

I updated my tracking so the code stays async, but the execution of problematic code will be sync.

    public static class AsyncBlobBug
    {
        public static IAsyncEnumerable<T> AsyncBug<T>(this IEnumerable<T> reader)
        {
            // https://github.com/dotnet/SqlClient/issues/593#issuecomment-1441660081
            // Materialize synchronously, then expose the buffered list as IAsyncEnumerable
            // (ToAsyncEnumerable comes from System.Linq.Async).
            var result = reader.ToList();
            return result.ToAsyncEnumerable();
        }
    }

Maybe at least this will enable you to keep your code clean, and fix it when the new SqlClient is available.

The issue is basically blocked, because there are two versions of the same files: netcore and netfx. As @Wraith2 explained, until these files have been merged into one version, any changes to them will make the merging significantly harder.

As for a workaround, I was able to configure EF Core so that it uses a custom DbDataReader, where I have overridden the async read methods to use sync under the hood. When the bug is fixed, I can simply remove the custom DbDataReader.

TdsParser and TdsParserStateObject, you know, the really complicated ones. 😢 The reason is that every Read method that can interact with the packet buffer has to be changed from returning bool to returning OperationStatus, and that change has to be made all the way up the call hierarchies. You can see the changes in https://github.com/dotnet/SqlClient/pull/1229/files

I have tried to roll that branch forwards into the current main, but I couldn’t find a way to manage the bool/OperationStatus dichotomy between netfx and netcore, so I would have had to reimplement everything in netfx.

Currently, the async code can only request a single packet at a time and then has to re-run. What if there was a method to request multiple packets?

I meant when the replay is about to happen. You could check if there are enough packets in the snapshot to actually try reading the string, otherwise you can short-circuit without copying them at all. The same should apply to varbinary. I will look into this.

it’s just limiting both sync and async

You can verify this in sync mode if you compare the performance of SqlDataReader.GetString() vs SqlDataReader.GetBytes() + Encoding.GetString (using a pre-allocated byte buffer to store the result of GetBytes). For large strings there is a significant difference in performance.

This particular problem with async strings is waiting on the netcore/netfx codebase merge

This problem has been around forever, so it won’t hurt waiting a little more I suppose. Meanwhile I can do some more research into it.

It does pretty much what you say: it repeatedly copies data (2 bytes at a time though), without materializing a string.

Yup, I see it now. It seems to be doing that remarkably fast or perhaps it’s just limiting both sync and async so I saw async coming into line with sync and was just happy with that and didn’t look further. I’ll add this to my ever growing list of things to look into. We can probably just do a block copy from bytes to chars given the known length and special case a partial trailing char if it happens. It’s worth noting that with my version each one only gets copied once but it’s still a poor way to copy bytes to chars.
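A minimal sketch of that block-copy idea, assuming the buffered bytes are UTF-16LE (as NVARCHAR data is on the wire); Encoding.Unicode.GetString would be the library equivalent, and the odd trailing byte is the “partial trailing char” special case mentioned above:

    using System;

    static class Utf16BlockCopy
    {
        // Convert all complete UTF-16 code units in one block operation instead of
        // assembling them two bytes at a time; 'consumed' reports how many bytes were
        // used so a trailing odd byte can be carried over to the next packet.
        public static string DecodeCompleteChars(byte[] bytes, int count, out int consumed)
        {
            consumed = count & ~1;
            var chars = new char[consumed / 2];
            Buffer.BlockCopy(bytes, 0, chars, 0, consumed);
            return new string(chars);
        }
    }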

Currently, the async code can only request a single packet at a time and then has to re-run. What if there was a method to request multiple packets?

It’s something I’d considered but not got around to looking at yet, because we can’t currently benefit from it. The way the protocol works, the data is sent to the client eagerly by the server, and as you’ve found you will almost always have a packet ready when you ask for one. It would also be difficult to manage efficiently with the input buffer being an array and offsets the way it is now, so there’s a work item on my list to look into wrapping the input and output buffers in an abstraction so that the packet management can be moved out of common code paths to tidy everything up.

There’s lots that can be done but only so much I can do at once and the rate is limited by what I can get reviewed and merged. This particular problem with async strings is waiting on the netcore/netfx codebase merge which is underway but currently seems to be paused while the team work on other issues. I didn’t want to try and implement the changes in https://github.com/dotnet/SqlClient/pull/1229 on netcore and netfx and then make sure any changes are kept in sync.

If your version was working without using the snapshots it might be a much easier way to fix the problem. As I said it’s not the way I would have approached it (and didn’t) but that doesn’t make it invalid as long as the effects of using it are understood. The owners will decide how they want to approach it. It’s always good to have another view on things.

There is a problem that your solution doesn’t cover, though, and it’s likely because you didn’t hit it. It’s possible under some conditions, like high speed with some latency (so I can’t repro on localhost, only with a real networked SQL Server), to end up getting a partial packet read. The network read asks for a chunk of bytes of PacketSize and just assumes that it gets that many, so that the packet will be complete. Currently, with the replay mechanism, it’s impossible to hit this issue because the replay takes so long that there’s always a full packet waiting; once we speed it up, it becomes possible.

@Wraith2 Can you check https://github.com/panoskj/SqlClient/tree/issue/593 please? I would like some feedback (e.g. is there any bug in the new logic?). It is the smallest possible change with the greatest impact regarding this issue.

I made several attempts to fix it, until I realized ReadAsync completes synchronously (EDIT: the packet is already available and the current thread doesn’t need to block at all) most of the time (that is >90% of the time), so I made it follow the sync path in this case.

So, this version runs in times comparable to the sync version, regardless of the row size. Benchmarks were done for up to 600KB rows, using the default (8KB) packet size and the SqlDataReader.ReadAsync method with default behavior (basically using EF core).

As I said, I have some more improvements I would like to share regarding this issue, but they are more involved and I would like to review them thoroughly first.

Some more information I have gathered so far:

  • SequentialAccess is irrelevant and there was no regression about it. This stackoverflow post (https://stackoverflow.com/a/28619983/2342071) simply does the wrong benchmark. It uses ReadAsync, but then retrieves column values using the index operator, so the operation is actually sync, because with sequential access only the new-row token is read by ReadAsync (see the sketch below).
  • Using a Stream is also irrelevant. The internal method for reading string columns is simply inefficient, but it is the same for both sync and async mode. For varchar(max) columns, try calling SqlDataReader.GetBytes + Encoding.GetString (instead of SqlDataReader.GetString directly) and you will see a huge performance improvement (even though it is in sync mode).

EDIT/BONUS: EF expects ReadAsync to read all columns (i.e. default behavior). If you manage to make it run with SequentialAccess, it will actually read the data synchronously (because GetFieldValue is used instead of GetFieldValueAsync; same reason as the aforementioned stackoverflow benchmark).
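A small sketch of that distinction, reusing the table from the benchmark at the top of this issue (standard ADO.NET APIs; the comment above names GetFieldValueAsync as the call that keeps the column read asynchronous under SequentialAccess):

    using System.Data;
    using System.Threading.Tasks;
    using Microsoft.Data.SqlClient;

    static class SequentialAsyncRead
    {
        public static async Task<int> ReadAsync(string connectionString)
        {
            using var conn = new SqlConnection(connectionString);
            await conn.OpenAsync();

            using var cmd = new SqlCommand("SELECT foo FROM data", conn);
            using var reader = await cmd.ExecuteReaderAsync(CommandBehavior.SequentialAccess);
            await reader.ReadAsync();                              // only the new-row token here

            // var blob = (byte[])reader[0];                       // sync column read, despite ReadAsync above
            var blob = await reader.GetFieldValueAsync<byte[]>(0); // async column read
            return blob.Length;
        }
    }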

@Wraith2 Sorry, I missed your email about the work you did (busy at work), but I would love to test this for you as we have a good automation setup we can run (this is how we spotted the slowdown; more like catastrophic failures, timeouts everywhere).

I know how to compile it all, I’m just not sure what the easiest way to deploy it will be. It would be good if it was a NuGet package: I could just remove the one we have, add this one, and send it off to CI and automation.

The only metrics we gather so far are times. I couldn’t tell you about CPU or RAM pressure … anyway, our app ecosystem already hammers all CPU and RAM on 3 servers, so it’s quite intense.

This irks me. It’s a manager’s disapproval expression but does nothing constructive. I’ve investigated and identified the problem, but that leads to another problem. There are blocking factors I can’t control or mitigate.

I understand your pain and appreciate the effort taken to try and sort this out. I do not really understand the inner problems with async causing such a serious problem on these varchar(max)… but really the fault is on our side using these columns despite being warned. Unfortunately this was inherited and the only thing I can do now is move forward with this caveat and push to get these blobs out of SQL and use SQL like a normal person 😆

Thanks for fast replies. 🍻

At the moment you’ve got a choice. You can have it fast and crash often or you can have it slow and work most of the time. Current choice is slow but stable.

As for priority. There is only a small team working on it and performance has never been made a priority. I know everyone who works on it would like to make things faster but there isn’t the time to do so. If you want to change that you’ll need to find whoever is in charge of setting goals inside MS and convince them.

Yup, that’ll work but you can only go as far as the max packet size which is just under 16K. The overall factorial behaviour is still there. I’m still working on merging the netfx and netcore codebases so that the complex solution to this problem can be implemented in a single codebase.

One of the engineers on my team tried working around this issue by increasing the packet size in the SQL connection string.

Results:

| Method    | Packet size     | Time     |
|---------- |---------------- |---------:|
| FindAsync | 512 (default)   | 84863 ms |
| FindAsync | 32767 (Maximum) | 25667 ms |
| Find      | 512 (default)   | 310 ms   |
| Find      | 32767 (Maximum) | 268 ms   |

I think he was retrieving a ~2MB VARCHAR(MAX) column.