runtime: Possible Critical Bug/Issue with SqlConnection in .NET Core 2.0 when running on Linux or Linux Containers and having multiple concurrent users like in a Load Test

POSSIBLE ISSUE/BUG When load testing with not many concurrent users (just like 20, 50 or more concurrent users) we’ve found a Possible Critical Bug/Issue with SqlConnection in .NET Core 2.0 when running on Linux or Linux Containers. We might be missing something, but I don’t see it. This is happening only when running SqlClient in .NET Core 2.0 on Linux or Linux-containers. When running on Windows or Windows Containers, there are no issues, so, that is a bad symptom…

Error when load testing:
System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred
while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that 
the instance name is correct and that SQL Server is configured to allow remote connections. (provider: 
TCP Provider, error: 40 - Could not open a connection to SQL Server) at 
System.Data.ProviderBase.DbConnectionPool.TryGetConnection(...

REPRO: We have created a very simple project to repro the issue. It is an ASP.NET Core Web API with two methods to run:

  1. Method running a plain native SqlConnection with a query.
  2. Method running a query with Dapper based on a query.
  3. When using EF Core, it should happen the same, but for the repro we perefered to go to the simplest repro which is just to use a SqlConntection.

Basically, the issue is related to SqlConnection, not from higher level frameworks like EF Core or Dapper, as it happens whenever .NET Core 2.0 SqlConnection is used on LINUX.

You can get the repro code from here (Simple ASP.NET Core 2.0 Web API with SqlConnection):

https://github.com/CESARDELATORRE/SQLConnNetCore2IssueRepro

This is the Web API method when using simply a SqlConntection:

[HttpGet("native")]
public async Task<IActionResult> GetNative()
{
    var connectionString = "Server=tcp:eshopsql-e3h5ug5xd3rfs.database.windows.net,1433;Initial Catalog=orderingdb;Persist Security Info=False;User ID=eshop;Password=MY-PASSWORD;MultipleActiveResultSets=False;Encrypt=True;TrustServerCertificate=False;Max Pool Size=400;Connection Timeout=30;";
    using (var connection = new SqlConnection(connectionString))
    {
        try
        {
            connection.Open();
            var command = connection.CreateCommand();
            command.CommandText = @"SELECT o.[Id] as ordernumber,o.[OrderDate] as [date],os.[Name] as [status],SUM(oi.units*oi.unitprice) as total
                FROM [ordering].[Orders] o
                LEFT JOIN[ordering].[orderitems] oi ON  o.Id = oi.orderid 
                LEFT JOIN[ordering].[orderstatus] os on o.OrderStatusId = os.Id                     
                GROUP BY o.[Id], o.[OrderDate], os.[Name] 
                ORDER BY o.[Id]";
            var reader = command.ExecuteReader(System.Data.CommandBehavior.CloseConnection);
            var count = 0;
            while (reader.Read())
            {
                count += 1;
            }
            return Ok(count);
        }
        catch (Exception x)
        {
            throw new InvalidOperationException(x.ToString());
        }
    }
}

The database is actually running on an Azure SQL Database so we test it against a hypothetical “production” database. But this issue happens the same way if the SQL Database is running in a SQL Container.

If I deploy the container/solution from VS into Docker for Windows (running on the Linux VM), the Web API runs OK, as in the following screenshot: NOTE that the URL has to provide “/native” at the end.

The tables in SQL currently have no registries, hence “0” as return value. But the query is being executed:

image

If you want to try with Dapper, try the same URL but ending on “/dapper” which will execute a different code/method.

You can see the Web API container running in the local Docker Linux VM: image

Then, if we do Load Testing and simulate concurrent users, this is when we start getting a lot of errors from SqlConntection, like this:

System.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server) at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection) at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource1 retry, DbConnectionOptions userOptions) at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource1 retry, DbConnectionOptions userOptions) at System.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource1 retry) at System.Data.SqlClient.SqlConnection.Open() at `

Contact me at cesardl at microsoft.com for further info about the issue.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 39 (31 by maintainers)

Commits related to this issue

Most upvoted comments

A PR to patch 2.0.x has been merged at https://github.com/dotnet/corefx/pull/26247

I guess my point would be, why not fix it now. All that code in the constructor seems like it could be moved now to a method that is called in the ConnectAsync or ParallelConnectAsync there aren’t any blocking connects in there.

I was telling @saurabh500 offline that this part of the plan sounded very counterintuitive to me:

SqlConnection.OpenAsync() which returns a task should also using the non-async version of the APIs Dns.GetHostAddresses(serverName); and socket.Connect(targetAddr, port); so that it completes the connection establishment on a single thread and the task that SqlConnection.OpenAsync() returns can be awaited upon, instead of awaiting on the underlying networking tasks which need more threads to be spun up.

I.e. shouldn’t we be striving to make SqlConnection.OpenAsync() non-blocking throughout?

But then he explained to me that currently we end up blocking on the result of the async methods we call, in places like https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/SqlClient/SNI/SNITcpHandle.cs#L130 and https://github.com/dotnet/corefx/blob/master/src/System.Data.SqlClient/src/System/Data/SqlClient/SNI/SNITcpHandle.cs#L152

Those lines are part of the SNITCPHandle constructor so it can’t be async unless we move code around to defer the DNS resolution and the socket connection, which would require some additional work.

Without that change, I agree with @saurabh500 it makes sense to switch to the sync versions of the methods in the short term, but I think we should pursue removing blocking calls from the implementation longer term. @saurabh500 @geleems I will create an issue for this.

OK, thanks for helping on this issue and jumping on it so quick! 👍