SqlClient: System.Data.SqlClient 4.7.0 tends to deadlock on async
I am afraid I cannot provide a clean reproduction, but I want to share my experience so that you can decide whether this is an issue or not.
It turns out that with the bump to 4.7.0 (from 4.6.1) our CI build deadlocks on almost every run during unit tests. The problem occurs neither in development environments nor in production.
The setup: an xunit test suite starts a .NET Core 2.1 service host as a shared fixture to run some integration tests. Part of the application stack is ASP.NET Core Identity. The whole environment is started via docker-compose so that the dependent SQL Server is also available. A rather complete test database seeding is done during test initialization. Build agents are cheap Azure VMs of B2s size (2 vCPUs, 4 GB RAM) running Ubuntu 18.04 with docker.
Since I was unable to dump or debug the locked process on the VM inside docker, I spent some time reproducing the issue locally (i7-8850H, Arch Linux, JetBrains Rider). I finally managed it by running stress --cpu 12 --io 10 --vm 5 --hdd 1, starting the sqlserver process inside docker and limiting it to one CPU via taskset -p 0x01 nnnn, and starting the integration tests restricted to the same CPU directly via taskset 0x01 dotnet test.
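The reproduction steps above can be collected into one script. This is a minimal sketch, not the reporter's actual script: the run and run_bg helpers, the EXECUTE switch, and the repro_steps function are hypothetical names, and the PID argument stands in for the nnnn placeholder. By default it only prints the commands; with EXECUTE=1 it would run them, assuming stress, taskset, and the dotnet CLI are installed.

```shell
#!/bin/sh
# Sketch of the local reproduction described above. Dry run by default:
# commands are printed instead of executed. Set EXECUTE=1 to run them
# (assumes stress, taskset, and the dotnet CLI are on PATH).

STRESS_PID=""

# Run a command in the foreground, or print it in dry-run mode.
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# Same as run, but start the command in the background and remember its PID.
run_bg() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@" &
        STRESS_PID=$!
    else
        printf '%s &\n' "$*"
    fi
}

# $1: PID of the SQL Server process running inside docker (hypothetical
#     argument; this is the "nnnn" placeholder from the report).
repro_steps() {
    sql_pid="$1"

    # 1. Put the host under CPU, I/O, memory, and disk pressure.
    run_bg stress --cpu 12 --io 10 --vm 5 --hdd 1

    # 2. Pin SQL Server to CPU 0 (affinity mask 0x01).
    run taskset -p 0x01 "$sql_pid"

    # 3. Run the integration tests pinned to the same CPU. If the deadlock
    #    reproduces, this call never returns and a debugger can be attached.
    run taskset 0x01 dotnet test

    # 4. Stop the background load if the tests did finish.
    if [ -n "$STRESS_PID" ]; then
        kill "$STRESS_PID"
    fi
}
```

Calling repro_steps with a PID prints the three commands in the order they would run, which makes it easy to check the affinity masks before putting the machine under load.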
Attaching a debugger to the locked process reveals the following stack trace (besides various others waiting on reset events, such as the xunit runner, test diagnostics, and Application Insights):
The last frame on the stack that comes from our application code is this line:
await userManager.GetAuthenticationTokenAsync(user, TokenManager.TokenProviderName, "refreshToken");
Note that there is not a single Task.Wait() or Task.Result in the whole code base; everything is “async all the way down”. Suspecting the advanced parallelization features of xUnit as part of the problem, I disabled all test parallelization and lifted all restrictions on allowed threads, but without any effect. The SQL Server sees only one open connection, in state “await command”.
Reverting the solution to System.Data.SqlClient 4.6.1 made all problems go away. There has not been a single locking build since then, so I am pretty sure it’s related.
Note: The solution is in a private GitHub repo, but I’d be able to share an export or more details if needed.
About this issue
- State: closed
- Created 5 years ago
- Comments: 42 (29 by maintainers)
Commits related to this issue
- Revert Async changes to SNIPacket to fix deadlock issues Fixes #262 by reverting dotnet/corefx#34184 until PR #335 is ready with complete fix. — committed to cheenamalhotra/SqlClient by cheenamalhotra 5 years ago
- Revert Async changes to SNIPacket to fix deadlock issues (#349) Fixes #262 by reverting dotnet/corefx#34184 until PR #335 is ready with complete fix. — committed to dotnet/SqlClient by cheenamalhotra 5 years ago
I was experiencing random deadlocks of various EF queries in an ASP.NET Core 3.1 project when deployed to a docker container with mcr.microsoft.com/dotnet/core/aspnet:3.1 as base. Upgrading to Microsoft.Data.SqlClient 1.1.1 appears to have resolved the issue.
This gave us headaches all of last night… As soon as our ASP.NET Core 3 project came under some load, some connections got stuck… Occasionally we got
At some point Kestrel stops handling any connections at all, and the database together with the app stays locked until the whole application is restarted.
After disabling MARS everything is working fine again!
EDIT: We also had no issues after changing the connection string to target a SQL Server 2017 (Developer) instance instead of SQL Server 2016 (Standard). Maybe this is related to the SQL Server version.
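For anyone wanting to try the same workaround, MARS is controlled by the MultipleActiveResultSets keyword in the connection string. The following is a minimal sketch of flipping it off in an existing string; the disable_mars function name and the sample server and database values are made up for illustration, and the sketch assumes the keyword is written in its canonical casing. If the keyword is absent, the string is left unchanged, since MARS is off by default.

```shell
#!/bin/sh
# Rewrite a connection string so that MARS is switched off.
# $1: connection string (assumed to spell the keyword exactly as
#     MultipleActiveResultSets; other casings are not handled here).
disable_mars() {
    printf '%s\n' "$1" |
        sed -E 's/MultipleActiveResultSets[[:space:]]*=[[:space:]]*[Tt]rue/MultipleActiveResultSets=False/'
}
```

For example, disable_mars 'Server=db;Database=app;MultipleActiveResultSets=True;' prints Server=db;Database=app;MultipleActiveResultSets=False;. Code that relies on several open readers per connection will need changes before this workaround is viable.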
I downgraded from Microsoft.Data.SqlClient 1.1.0 to System.Data.SqlClient 4.6.1, and the deadlocks seem to be resolved.
I am experiencing this issue. I am using a .NET Standard 2.0 library referencing these NuGet packages:
- Microsoft.EntityFrameworkCore (3.1.24)
- Microsoft.EntityFrameworkCore.Abstractions (3.1.24)
- Microsoft.EntityFrameworkCore.Analyzers (3.1.24)
- Microsoft.EntityFrameworkCore.Design (3.1.24)
- Microsoft.EntityFrameworkCore.SqlServer (3.1.24) (which has a dependency on Microsoft.Data.SqlClient 1.1.3)
- Microsoft.EntityFrameworkCore.Tools (3.1.24)
- NETStandard.Library (2.0.3)
- Newtonsoft.Json (12.0.3)
- Serilog (2.10.0)
- Serilog.Sinks.Console (4.0.1)
- Serilog.Sinks.File (5.0.0)
- Serilog.Sinks.MSSqlServer (5.7.1) (which has a dependency on Microsoft.Data.SqlClient 1.1.3)
- System.Net.Http (4.3.4)
- System.Xml.XDocument (4.3.0)
It sounds like a fix was applied in SqlClient v2.0.0. @cheenamalhotra, are you able to confirm? SaveChanges works for me; however, SaveChangesAsync deadlocks.
Some required functionality resides in a .NET Standard 2.0 library, as it is needed by both a .NET Framework (4.7.2) application and a .NET 6.0 application. EntityFrameworkCore (3.1.24) was the last version compatible with all of these .NET targets.
Using SQL Profiler, I can see that the insert I am trying to perform leaves an open transaction on the server: it inserts the record but never returns. When I stop the web API application, the transaction and the insert are rolled back.
TIA, Andrew
If this is an issue in 4.7.0, presumably going back to 4.6.1 should work as a workaround?
https://docs.microsoft.com/en-us/dotnet/core/diagnostics/dotnet-dump, I think. I’ve not used it, and I’m not sure the output would be compatible with WinDbg, which is what I’d use to look at it.
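A rough sketch of how dotnet-dump could be used to capture the hung process is given here. It has not been tried against the setup in this thread. The run helper, the EXECUTE switch, the collect_dump function, and the default dump path are hypothetical names. By default it only prints the commands; with EXECUTE=1 it would run them, assuming the dotnet CLI is installed where the process runs.

```shell
#!/bin/sh
# Dry run by default: commands are printed instead of executed.
# Set EXECUTE=1 to run them (assumes the dotnet CLI is on PATH).
run() {
    if [ "${EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# $1: PID of the hung dotnet process.
# $2: where to write the dump (hypothetical default path).
collect_dump() {
    pid="$1"
    dump_file="${2:-./deadlock.dmp}"

    # Install the tool as a global .NET tool. This step reports an error
    # if the tool is already installed; the script carries on regardless.
    run dotnet tool install --global dotnet-dump

    # Write a dump of the running process to dump_file.
    run dotnet-dump collect -p "$pid" -o "$dump_file"

    # Open the dump in the interactive analyzer. SOS commands such as
    # clrstack or dumpasync can then be used to look at managed stacks
    # and pending async state machines.
    run dotnet-dump analyze "$dump_file"
}
```

Inside a container, the tool has to run in the same container as the target process, which may be the practical obstacle the original report ran into.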
@cheenamalhotra Thank you for your PR. I built a new image and gave it some time to run (it has been running for more than an hour already). No deadlocks so far.
@cheenamalhotra I’ve added another project with EF Core 2.2.6 and SqlClient 4.7.0. I was able to reproduce the issue almost immediately after the container started. Please pull the changes from my repository.