SqlClient: Hangs in ExecuteReaderAsync

Describe the bug

Every once in a while, our app (Azure Functions v3 running on a dedicated plan, Windows host, .NET Core 3.1, Azure SQL, Dapper extensions) starts to throw many InvalidOperationException errors with the message:

Timeout expired. The timeout period elapsed prior to obtaining a connection from the pool. This may have occurred because all pooled connections were in use and max pool size was reached.

This seems to happen sporadically (a few times per day), without any clear “trigger”: the function isn’t serving an unusual amount of requests (in fact, it’s usually underutilized). The SQL server isn’t busy at that time either (<5% utilization, with no long-running queries).
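For context, the data access follows the usual Dapper pattern. Here is a minimal sketch of the call path (class name, DTO, and stored procedure parameter are illustrative, not our actual code):

using System.Collections.Generic;
using System.Data;
using System.Threading.Tasks;
using Dapper;
using Microsoft.Data.SqlClient;

public static class LookupFunction // hypothetical name
{
    private const string ConnectionString =
        "Data Source=...;Initial Catalog=...;Max Pool Size=200;Multiple Active Result Sets=False;Connect Timeout=30;Encrypt=True";

    public static async Task<IEnumerable<ResultRow>> RunAsync(int id)
    {
        // One short-lived connection per invocation; Dispose should return it to the pool.
        using var connection = new SqlConnection(ConnectionString);

        // Dapper opens the connection and calls SqlCommand.ExecuteReaderAsync internally;
        // this is the call that appears to hang for 30-60 seconds.
        return await connection.QueryAsync<ResultRow>(
            "sp_XXXXXXXXXXXX",
            new { Id = id },                        // illustrative parameter
            commandType: CommandType.StoredProcedure,
            commandTimeout: 120);
    }

    public class ResultRow { public int Id { get; set; } } // illustrative DTO
}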

We’ve enabled SqlClient’s tracing functionality and found that, when this happens, connections suddenly stop returning to the pool altogether, and after some time (sometimes 30 seconds, sometimes 60 seconds) they all return to the pool at once. The same function host also connects to two separate SQL servers, so we have two separate connection pools. Although these are separate servers that serve different traffic (and execute different stored procedures), the exact same behavior is observed in both connection pools at the same time.
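For reference, the traces in the sample log below were collected through the event source exposed by Microsoft.Data.SqlClient 2.x; a minimal listener along these lines is enough to capture them (the Console sink is only for illustration, in the function app the events go to our logging pipeline):

using System;
using System.Diagnostics.Tracing;

public sealed class SqlClientTraceListener : EventListener
{
    protected override void OnEventSourceCreated(EventSource eventSource)
    {
        // Microsoft.Data.SqlClient 2.x publishes its traces through this event source.
        if (eventSource.Name == "Microsoft.Data.SqlClient.EventSource")
        {
            // Enable everything (scope, correlation and pooler traces included).
            EnableEvents(eventSource, EventLevel.Informational, EventKeywords.All);
        }
    }

    protected override void OnEventWritten(EventWrittenEventArgs eventData)
    {
        var payload = eventData.Payload == null ? string.Empty : string.Join(" ", eventData.Payload);
        Console.WriteLine($"{DateTime.Now:o}\t{eventData.EventName} - {payload}");
    }
}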

According to the logs, the “hang” isn’t related to disposing/closing the connections. Instead, the delay happens between the call to ExecuteReaderAsync and EndExecuteReaderAsync. We’ve also checked SQL Server’s audit log for the same time frame, and the stored procedure completed immediately, so there were no long-running queries at that time.
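One way to see this gap from the application side is a timing check around the Dapper call; the following is a sketch with illustrative names, not our exact instrumentation:

using System;
using System.Diagnostics;
using System.Threading.Tasks;

public static class QueryTiming // hypothetical helper
{
    // Usage (names from the sketch above):
    // var rows = await QueryTiming.MeasureAsync("sp_XXXXXXXXXXXX", () => LookupFunction.RunAsync(42));
    public static async Task<T> MeasureAsync<T>(string label, Func<Task<T>> query)
    {
        var sw = Stopwatch.StartNew();
        var result = await query();
        sw.Stop();

        // The SQL audit log shows sub-second execution, so anything in this range
        // is time spent client-side between ExecuteReaderAsync and its completion.
        if (sw.Elapsed > TimeSpan.FromSeconds(5))
            Console.WriteLine($"{label} took {sw.Elapsed} although the server reports immediate completion");

        return result;
    }
}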

Below is a sample log of such a scenario. Note the jump in time between 4:54:44.757 and 4:55:09.787:

timestamp [Local Time]	message
2/15/2021, 4:55:09.788 PM	ScopeLeave - 67659
2/15/2021, 4:55:09.788 PM	PoolerTrace - <prov.DbConnectionPool.PutNewObject|RES|CPOOL> 1, Connection 163, Pushing to general pool.
2/15/2021, 4:55:09.788 PM	PoolerTrace - <prov.DbConnectionInternal.DeactivateConnection|RES|INFO|CPOOL> 163, Deactivating
2/15/2021, 4:55:09.788 PM	PoolerTrace - <prov.DbConnectionPool.DeactivateObject|RES|CPOOL> 1, Connection 163, Deactivating.
2/15/2021, 4:55:09.788 PM	PoolerTrace - <prov.DbConnectionInternal.PrePush|RES|CPOOL> 163, Preparing to push into pool, owning connection 0, pooledCount=0
2/15/2021, 4:55:09.788 PM	PoolerTrace - <prov.DbConnectionInternal.CloseConnection|RES|CPOOL> 163 Closing.
2/15/2021, 4:55:09.788 PM	CorrelationTrace - <sc.SqlConnection.Close|API|Correlation> ObjectID 3286, ActivityID 8eb731f2-4ffb-47f2-b5c4-d9b5bb310cdb:1
2/15/2021, 4:55:09.788 PM	ScopeEnter - <sc.SqlConnection.Close|API> 3286
2/15/2021, 4:55:09.788 PM	ScopeLeave - 67658
2/15/2021, 4:55:09.788 PM	ScopeEnter - <sc.SqlDataReader.Close|API> 1
2/15/2021, 4:55:09.788 PM	ScopeLeave - 67657
2/15/2021, 4:55:09.788 PM	ScopeEnter - <sc.SqlDataReader.Close|API> 1
2/15/2021, 4:55:09.788 PM	ScopeLeave - 67654
2/15/2021, 4:55:09.787 PM	ScopeLeave - 67655
2/15/2021, 4:55:09.787 PM	ScopeLeave - 67656
2/15/2021, 4:55:09.787 PM	ScopeEnter - <sc.SqlDataReader.Read|API> 1
2/15/2021, 4:55:09.787 PM	ScopeEnter - <sc.SqlDataReader.NextResult|API> 1
2/15/2021, 4:55:09.787 PM	ScopeEnter - <sc.SqlDataReader.NextResultAsync|API> 1
2/15/2021, 4:55:09.787 PM	ScopeLeave - 67652
2/15/2021, 4:55:09.787 PM	ScopeLeave - 67653
2/15/2021, 4:55:09.787 PM	ScopeEnter - <sc.SqlDataReader.Read|API> 1
2/15/2021, 4:55:09.787 PM	ScopeEnter - <sc.SqlDataReader.ReadAsync|API> 1
2/15/2021, 4:55:09.787 PM	CorrelationTrace - <sc.SqlCommand.EndExecuteReaderAsync|Info|Correlation> ObjectID3260, ActivityID 8eb731f2-4ffb-47f2-b5c4-d9b5bb310cdb:1
2/15/2021, 4:54:44.757 PM	Trace - <sc.SqlCommand.ExecuteReader|INFO> 3260, Command executed as RPC.
2/15/2021, 4:54:44.757 PM	CorrelationTrace - <sc.SqlCommand.ExecuteReaderAsync|API|Correlation> ObjectID 3260, behavior=16, ActivityID 38ccce37-36c4-40fe-a99e-d47e8fb99097:1
2/15/2021, 4:54:44.757 PM	Trace - <sc.SqlCommand.set_CommandType|API> 3260, 40
2/15/2021, 4:54:44.757 PM	Trace - <sc.SqlCommand.set_CommandTimeout|API> 3260, 120
2/15/2021, 4:54:44.757 PM	Trace - <sc.SqlCommand.set_CommandText|API> 3260, String Value = 'sp_XXXXXXXXXXXX'
2/15/2021, 4:54:44.757 PM	ScopeLeave - 67149
2/15/2021, 4:54:44.757 PM	Trace - <sc.SqlCommand.set_Connection|API> 3260, 3286
2/15/2021, 4:54:44.757 PM	ScopeEnter - <prov.DbConnectionHelper.CreateDbCommand|API> 3286
2/15/2021, 4:54:44.757 PM	Trace - <prov.DbConnectionHelper.ConnectionString_Get|API> 3286
2/15/2021, 4:54:44.757 PM	ScopeLeave - 67148
2/15/2021, 4:54:44.756 PM	PoolerTrace - <prov.DbConnectionInternal.ActivateConnection|RES|INFO|CPOOL> 163, Activating
2/15/2021, 4:54:44.756 PM	PoolerTrace - <prov.DbConnectionInternal.PostPop|RES|CPOOL> 163, Preparing to pop from pool,  owning connection 0, pooledCount=0
2/15/2021, 4:54:44.756 PM	PoolerTrace - <prov.DbConnectionPool.GetFromGeneralPool|RES|CPOOL> 1, Connection 163, Popped from general pool.
2/15/2021, 4:54:44.756 PM	PoolerTrace - <prov.DbConnectionPool.GetConnection|RES|CPOOL> 1, Getting connection.
2/15/2021, 4:54:44.756 PM	CorrelationTrace - <sc.SqlConnection.Open|API|Correlation> ObjectID 3286, ActivityID 38ccce37-36c4-40fe-a99e-d47e8fb99097:1
2/15/2021, 4:54:44.756 PM	ScopeEnter - <sc.SqlConnection.Open|API|Correlation> ObjectID 3286, ActivityID 38ccce37-36c4-40fe-a99e-d47e8fb99097:1
2/15/2021, 4:54:44.756 PM	Trace - <prov.DbConnectionHelper.ConnectionString_Set|API> 3286, 'Data Source=XXXXXXXXXX;Initial Catalog=XXXXXXXXXX;Persist Security Info=True;Max Pool Size=200;Multiple Active Result Sets=False;Connect Timeout=30;Encrypt=True;Trust Server Certificate=False'
2/15/2021, 4:54:44.756 PM	Trace - <prov.DbConnectionHelper.ConnectionString_Set|API> 3286, 'Data Source=XXXXXXXXXX;Initial Catalog=XXXXXXXXXX;Persist Security Info=True;Max Pool Size=200;Multiple Active Result Sets=False;Connect Timeout=30;Encrypt=True;Trust Server Certificate=False'

Any idea what could cause this behavior?

To reproduce

It seems to happen sporadically without a clear “trigger” (still, it can happen a few times per day). We’re unable to reproduce the issue reliably: even when we try to reproduce it in a test environment with 10x the traffic, there’s no guarantee it will occur.
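The synthetic load for these attempts is along the following lines (a rough sketch reusing the hypothetical RunAsync helper from the sketch above; the real test drives the deployed function over HTTP):

using System;
using System.Linq;
using System.Threading.Tasks;

public static class ReproAttempt
{
    public static async Task RunAsync()
    {
        // Roughly 10x our normal concurrency against the same stored procedure path.
        var workers = Enumerable.Range(0, 100).Select(async worker =>
        {
            for (var i = 0; i < 1_000; i++)
            {
                try
                {
                    await LookupFunction.RunAsync(i); // hypothetical helper from the sketch above
                }
                catch (InvalidOperationException ex)
                {
                    // The pool-exhaustion timeout we are trying to trigger.
                    Console.WriteLine($"{DateTime.Now:o} worker {worker}: {ex.Message}");
                }
            }
        }).ToList(); // materialize to start all workers

        await Task.WhenAll(workers);
    }
}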

Expected behavior

The function shouldn’t hang and the connection should return to the pool.

Further technical details

Microsoft.Data.SqlClient version: 2.0.1
.NET target: .NET Core 3.1
SQL Server version: Azure SQL
Operating system: Azure Functions v3, dedicated plan (Windows host)
Using Dapper extensions

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

@msallin yes, both 2.1 and 3.0 work with EF Core

I don’t have a repro; we aren’t able to reproduce it locally. What’s interesting, however, is that we have some slow queries on the SQL server (meaning they complete in 700ms instead of 70ms). There is a lot of noise all around, but in such a situation we seem to see more queries that time out (the timeout is about 10s).

We can tell that it isn’t related to thread pool starvation, based on the counters we monitor, the absence of Kestrel warnings, and mainly the thread pool traces we captured from the runtime.
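Telemetry along these lines is what lets us rule starvation out (a sketch using the built-in ThreadPool counters available on .NET Core 3.0+, not our exact monitoring setup):

using System;
using System.Threading;
using System.Threading.Tasks;

public static class ThreadPoolMonitor // hypothetical helper
{
    public static void Start(TimeSpan interval)
    {
        _ = Task.Run(async () =>
        {
            while (true)
            {
                ThreadPool.GetAvailableThreads(out var workerThreads, out var ioThreads);

                // A starved pool shows a flat ThreadCount with a growing PendingWorkItemCount;
                // during the timeout bursts neither of these moves.
                Console.WriteLine(
                    $"{DateTime.Now:o} threads={ThreadPool.ThreadCount} " +
                    $"pending={ThreadPool.PendingWorkItemCount} " +
                    $"availableWorkers={workerThreads} availableIo={ioThreads}");

                await Task.Delay(interval);
            }
        });
    }
}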

Also interesting: it happens in bursts. It isn’t a single SQL command that times out, but multiple in a row.

EDIT: We use EF Core and the async methods. We have now rewritten one application to use the sync methods, and I will let you know whether this helps with the timeouts. If it really is a problem with EF Core / SqlClient and async, the result should be that some queries take longer but no longer time out.
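To illustrate what the rewrite amounts to (entity and context names are made up), the only change is switching from the async to the sync EF Core query operators:

using System.Collections.Generic;
using System.Linq;
using Microsoft.EntityFrameworkCore;

public class Order { public int Id { get; set; } public int CustomerId { get; set; } } // illustrative entity

public class AppDbContext : DbContext // illustrative context
{
    public DbSet<Order> Orders { get; set; }
}

public static class OrderQueries
{
    public static List<Order> LoadOrders(AppDbContext db, int customerId)
    {
        // Before the rewrite (async operator, goes through SqlCommand.ExecuteReaderAsync):
        // return await db.Orders.Where(o => o.CustomerId == customerId).ToListAsync();

        // After the rewrite (sync operator, goes through SqlCommand.ExecuteReader):
        return db.Orders.Where(o => o.CustomerId == customerId).ToList();
    }
}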