SqlClient: Execution Timeout Expired Error (258, ReadSniSyncOverAsync)

Describe the bug

When executing SQL such as SELECT FieldA, FieldB FROM A INNER JOIN C ON A.FieldId = C.FieldId UNION SELECT FieldA, FieldD FROM A INNER JOIN D ON A.FieldId = D.FieldId, throw the error like below, not every time, just a little part of queries have this issue.

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\n
---> System.ComponentModel.Win32Exception (258): Unknown error 258\n
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction)\n 
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Span1 buff, Int32 len, Int32& totalRead)\n
at Microsoft.Data.SqlClient.TdsParser.TrySkipValue(SqlMetaDataPriv md, Int32 columnOrdinal, TdsParserStateObject stateObj)\n
at Microsoft.Data.SqlClient.TdsParser.TrySkipRow(SqlMetaDataSet columns, Int32 startCol, TdsParserStateObject stateObj)\n
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)\n
at Microsoft.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)\n
at Microsoft.Data.SqlClient.SqlDataReader.ReadAsync(CancellationToken cancellationToken)\n
--- End of stack trace from previous location where exception was thrown ---\n
at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in //Dapper/SqlMapper.Async.cs:line 437\n

To reproduce

Sorry, currently can’t reproduce in the local environment, so can’t provide more detail to reproduce.

Expected behavior

SQL should execute successfully every time.

Further technical details

Microsoft.Data.SqlClient version: 1.1.3 .NET target: Core 3.1 Operating system: Docker container

What I found/tried

https://stackoverflow.com/questions/57270245/sql-server-dbcommand-timeout-with-net-core-container-under-load

https://github.com/StackExchange/Dapper/issues/1435

About this issue

Original URL
State: open
Created 4 years ago
Reactions: 14
Comments: 145 (31 by maintainers)

Most upvoted comments

I have found reason for my case. I’m sorry, but that were number of bad performing queries - doing either index or table scan with several dozen millions rows. It was really hard to catch, as those happened 1-2 times per hour and were not affecting anyhow used memory (30% free) or CPU (max 20% usage). Those scans were overloading disk system and harming SQL server caches. So even after query run there were problems in next 5-10 minutes - due to missing data in cache, even for usually good performing queries SQL server had to read too much things from disk and therefore queries were timing-out.

Guys, check your queries. This SQL script gives you those, which had to read lot of data SELECT SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1, ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) as [Text], qs.execution_count, qs.total_logical_reads, qs.last_logical_reads, qs.total_logical_writes, qs.last_logical_writes, qs.total_worker_time, qs.last_worker_time, qs.total_elapsed_time/1000000 total_elapsed_time_in_S, qs.last_elapsed_time/1000000 last_elapsed_time_in_S, qs.last_execution_time, qp.query_plan FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp ORDER BY qs.total_logical_reads DESC

brjasha on Mar 17, 2021

In my last post I guessed the 258 error occurs because of full tcp buffers and wrong tcp socket handling on linux (see my previous post above). Further investigation made me more confident that this is the case.

The additional queue I added (last post) helped a bit, but after adding more features (service implementations) to our host process the error came back quite frequently.

Because most 258 errors occured in DbContext.SaveChanges() I synchronized calls in my DbContext base implementation like this:

public class MyBaseDbContext : DbContext
{
    private static readonly object syncObj = new object();

    public override int SaveChanges()
    {
        lock (syncObj)
        {
            return base.SaveChanges();
        }
    }
}

This dirty code slowed down my execution flow and helped reducing the 258 errors. But like others reported here about disabling MARS it did not solve the problem completly.

I examined that the problem occurs often during extensive external web API rest calls. So I synchronized them as well:

static SemaphoreSlim semaphoreSlimGeocodeLock = new SemaphoreSlim(1, 1);

public void MyRestCallMethod() 
{
    try
    {
        await semaphoreSlimGeocodeLock.WaitAsync();

        HttpClient httpClient = new HttpClient();
        string result = await httpClient.GetStringAsync(url);
        ...

    }
    finally
    {
        httpClient.Dispose();
        semaphoreSlimGeocodeLock.Release();
    }
}

No parallel web calls helped a lot and made me sure the error is related to the current tcp buffer usage within the application.

I guess the error originates in some way like this:

SqlClient reads/writes to the sql server tcp socket connection.
Because of the buffer handling bug the read/write loop returns indicating the remotehost has terminated the connection.
SqlClient internal reconnect feature reconnects immediately to the SQL Server.
On the next tcp read/write the same error occures because the application tcp buffer is still full.
This is repeated many times and there might be a threading issue playing a role as well.
The application is freezed and timeout “Unknwon error 258” occurs

Slowing down the execution flow and hoping not reaching tcp buffer limits is not a production ready workaround 😃 I wonder if this is a common problem for all high tcp i/o apps running on Linux and connecting to Sql Server?

This error occured some time after upgrading ef core (not sure which .net core/ef core version). My application runs currently on:

Bare metal server: AMD Ryzen 7 2700X
Ubuntu 18.04.2 LTS
.NET 5.0
Microsoft.EntityFrameworkCore.SqlServer 5.0.1

Used SQL Server: Latest SQL Server 2019 Docker container

Side notes:

Upgrading from a two years old SQL Server 2017 Container to the latest 2019 version did not change anything.
I did try collecting traces with dotnet-trace collect --process-id 27455 --profile database but the results did not include the expected events.

bennil on Dec 6, 2021

This sounds like a thread starvation issue with the system thread pool. This is known issue when running asynchronously with Microsoft.Data.SqlClient on .NET Core and it’s in our backlog to make it fully asynchronous which will require a big refactor. You might be able to work around this by setting the ThreadPool.SetMinThreads to a high enough value to accommodate, but with many parallel tasks, there’ll be more overhead to allow for continuation tasks inside of SqlClient to run before the timeouts are hit.

lcheunglci on Oct 25, 2022

We are also experiencing this issue with Linux App Services against Azure SQL. We only began seeing this after migrating our App Services from Windows to Linux. Also on 5.1.1 of SqlClient.

jbogard on Aug 16, 2023

@Malgefor Convert all asynchronous SQL calls into synchronous ones, and everything will be sorted 😃

svap-roshan on Aug 10, 2023

A few days ago we’ve found the solution for our case. If you are using Azure Load Balancer + SQL Database (or SQL Pool) in a VNET, check if your configuration for outbound connections is set to use the Azure Private Network, otherwise, you could be running into SNAT Port Exhaustion.

To check if that’s the case, go to the Load Balancer resource page, Metrics, and show SNAT Connection Count, filter for failed Statuses.

Another solution is to use more Outbound IPs.

arielmoraes on Feb 1, 2022

最近观察到业务高峰期会有大量的Unknown error 258，在数据库服务器上观测到cpu 100%，是由 sql server windows nt -64bit 占用 98%+ 没有死锁使用 select * from master.dbo.sysprocesses 查询到连接数在300左右当前系统有 170左右当前系统数据库连接串： Data Source=xxxxxxx,15726;Initial Catalog=xxxxx;Persist Security Info=True;User ID=xxxx;Password=xxxxxx;Integrated Security=false;Connect Timeout=30;MultipleActiveResultSets=false

有两个大并发业务连接到这台数据库因为一些sql语句较慢，经过优化后，今天再次使用 select * from master.dbo.sysprocesses 查询当前连接在 60左右，已经没有Unknown error 258 的记录了

同时：数据库的最大连接数设置为0

niuniu007 on Dec 24, 2021

Is this happening on Azure platform for everyone? Getting same issue with and without MARS, Linux App Service (docker) + Azure SQL.

Will-Bill on Nov 1, 2021

This is happening randomly across out systems. You do typically see it on applications that cause higher loads or on SQL Server instances that are under higher load.

There is nothing special in particular about the applications that experience this. It’s happening on both System.Data.SqlClient and Microsoft.Data.SqlClient. AFAICT it is not happening on .NET Framework/Windows (we do not run any .NET Core applications of note on Windows).

I am currently migrating one application that experiences this regularly to .NET 5. Will let you know if that changes anything.

mrmartan on Dec 2, 2020

My platform Infrastructure: Docker in Redhat Enterprise Language & Framework: .Net 6 with Entity Framework core Database: SQL Server

After optimizing my API and implementing ‘async’ and ‘await’ with ‘Task’, all the errors have disappeared. Also i set maximum and minimum pooling and connection life cycle

sajidur on May 29, 2023

Experienced a flavor of this issue. When inserting singular records in intervals of a few seconds in concurrently running applications. Each application operating on their own instance of DbContext and no concurrent use inside their scope.

Packages Microsoft.Data.SqlClient 5.0.1 Microsoft.EntityFrameworkCore.SqlServer 6.0.10

Exception

An exception occurred in the database while saving changes for context type '"XXX.XXXX.XXXX.XXXContext"'.
Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while saving the entity changes. See the inner exception for details.
 ---> 
 Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. 
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
  at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__207_0(Task`1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)---
    End of stack trace from previous location --- at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
    --- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)ClientConnectionId:554eb334-af3c-400d-92df-1c1c1dd92d2bError Number:-2,State:0,Class:11 --- End of inner exception stack trace --- at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IList`1 entriesToSave, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(StateManager stateManager, Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext()--- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)

Client .NET 6.0 application implemented as BackgroundService.

Error occured on

 _XContext.X.Add(x);
await _XContext.SaveChangesAsync(cancellationToken);

Last EF generated SQL command before this exception was thrown

SET NOCOUNT ON; 
INSERT INTO [X] ([A], [B], [C], [D], [E], [F], [G]) 
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6); 

SELECT [Z], [K] FROM [X] 
WHERE @@ROWCOUNT = 1 AND [Z] = scope_identity()

Database SQL Server 2016

Resolution for now Optimizing expensive read queries as much as possible that affected the table. After optimizing heavier read queries, I haven’t seen this error.

I had several applications concurrently using the database for one expensive read query each on startup, and then making inserts on same tables (not same records). Each had their own scoped instance of DbContext. The issue seemed to come when inserts were occurring to a table that was under a heavy read query by another process. Small percentage of inserts resulted in this error. Enabling retries didn’t seem to help.

jkauppinen on Oct 21, 2022

@ErikEJ Upgrading Microsoft.Data.SqlClient to 4.1.0 and Microsoft.Data.SqlClient.SNI to 4.0.0 solved the problem… Thanks. They were buried among a couple other packages that I can’t upgrade, that’s why I didn’t see that they were so outdated…

MarvinNorway on Jul 8, 2022

In my last post I guessed the 258 error occurs because of full tcp buffers and wrong tcp socket handling on linux (see my previous post above). Further investigation made me more confident that this is the case.

The additional queue I added (last post) helped a bit, but after adding more features (service implementations) to our host process the error came back quite frequently.

Because most 258 errors occured in DbContext.SaveChanges() I synchronized calls in my DbContext base implementation like this:
public class MyBaseDbContext : DbContext
{
    private static readonly object syncObj = new object();

    public override int SaveChanges()
    {
        lock (syncObj)
        {
            return base.SaveChanges();
        }
    }
}
This dirty code slowed down my execution flow and helped reducing the 258 errors. But like others reported here about disabling MARS it did not solve the problem completly.

I examined that the problem occurs often during extensive external web API rest calls. So I synchronized them as well:
static SemaphoreSlim semaphoreSlimGeocodeLock = new SemaphoreSlim(1, 1);

public void MyRestCallMethod() 
{
    try
    {
        await semaphoreSlimGeocodeLock.WaitAsync();

        HttpClient httpClient = new HttpClient();
        string result = await httpClient.GetStringAsync(url);
        ...

    }
    finally
    {
        httpClient.Dispose();
        semaphoreSlimGeocodeLock.Release();
    }
}
No parallel web calls helped a lot and made me sure the error is related to the current tcp buffer usage within the application.

I guess the error originates in some way like this:

SqlClient reads/writes to the sql server tcp socket connection.

Because of the buffer handling bug the read/write loop returns indicating the remotehost has terminated the connection.

SqlClient internal reconnect feature reconnects immediately to the SQL Server.

On the next tcp read/write the same error occures because the application tcp buffer is still full.

This is repeated many times and there might be a threading issue playing a role as well.

The application is freezed and timeout “Unknwon error 258” occurs

Slowing down the execution flow and hoping not reaching tcp buffer limits is not a production ready workaround 😃 I wonder if this is a common problem for all high tcp i/o apps running on Linux and connecting to Sql Server?

This error occured some time after upgrading ef core (not sure which .net core/ef core version). My application runs currently on:

Bare metal server: AMD Ryzen 7 2700X

Ubuntu 18.04.2 LTS

.NET 5.0

Microsoft.EntityFrameworkCore.SqlServer 5.0.1

Used SQL Server: Latest SQL Server 2019 Docker container

Side notes:

Upgrading from a two years old SQL Server 2017 Container to the latest 2019 version did not change anything.

I did try collecting traces with dotnet-trace collect --process-id 27455 --profile database but the results did not include the expected events.

My suggestion is create a singleton service serving only one instance of HttpClient.

geomorillo on Feb 15, 2022

Without wanting to reveal too much about our infrastructure setup on a public domain, We see this error pretty consistently when load testing a web api, that inserts a large volume of data into an azure sql database.

Details that might hopefully enable a repo:

Web api is running inside linux docker container (asp.net core 5 image) hosted on ubuntu 20.04 vm (azure) with 2 cpus (general purpose)
Azure SQL (serverless, with scale settings: min 0.5 cpu cores, max 2 cpu cores)
The api receives a POST request with bulk data (5000 records) and does an insert using EF Core 5.0.5 sql server package, single call to SaveChanges().
Our load generator, makes 30 concurrent http requests to this api, each request delivering 5000 records for an insert.
We see that roughly 1 or 2 requests consistently fail with an error each time, and this error shows up in the stack trace:

At the time of the error, for example, below at 3:18pm, the cpu and io usage of the sql servelerless resource are both between 70-80% and the CPU usage on the node running the docker container is 100%

So I am assuming @bennil may be on to something given this is happening in high load, high concurrency scenarios only for us, where CPU usage is pushed to the limit.

dazinator on Feb 1, 2022

@jslaybaugh / @cheenamalhotra: Quick Update:

The issue is environmental.

If you ssh onto the AKS node and have a look at the /var/log/syslog (or syslog.1) you should see “eth0: Lost carrier” messages. It appears to be related to this Azure issue. A fix is being rolled out at the moment.

andrehaupt on Mar 25, 2021

Hello,

I experience same problems without MARS and with no DbContext pooling. Here is my connection string: Data Source=server_IP;Initial Catalog=my_db;User ID=my_user;Password=my_password;Max Pool Size=200

DbContext: services.AddDbContext<MyDbContext>(o => { o.UseSqlServer(Configuration.GetConnectionString("MyDbContext")); o.EnableSensitiveDataLogging(); }, ServiceLifetime.Transient);

brjasha on Feb 11, 2021

我将db connection 中改了MultipleActiveResultSets=false 后，此问题彻底消失了

niuniu007 on Feb 6, 2021

@Mewriick , @mcanerim are you using System.Data.SqlClient? The stack trace you have provided is from that library not Microsoft.Data.SqlClient which the original issue was opened for.

JRahnama on Jan 19, 2021

So I have migrated one of the applications to .NET 5 and deployed it to our production environment. It is running there only for half a day now but so far the issue is gone. I have even been able to decrease connection timeout from 2 minutes to 5 seconds.

So I would say give .NET 5 a try.

mrmartan on Dec 8, 2020

@JRahnama , I found one command that wasn’t wrapped in using block. I will deploy the fix and try to monitor if it helped or not.

lukas-navratil on Oct 21, 2020

Thanks, @ericstj, No, I didn’t try increasing command timeout. Currently, the command timeout is 3s. If a query executes timeout, I got stack like below, seems it is different than the above stack, do you mean the above stack is also a SQL execution timeout? By the way, from New relic, the select query’s duration is 83.4ms, I’m not sure if it is the execution time.

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\n
---> System.ComponentModel.Win32Exception (258): Unknown error 258\n
at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__164_0(Task`1 result)\n
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()\n
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\n
--- End of stack trace from previous location where exception was thrown ---\n
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\n
--- End of stack trace from previous location where exception was thrown ---\n
at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 419\n

frankyuan on Jul 13, 2020

In our case the problem has now as mysteriously disappeared as it has mysteriously appeared. The only change we made was migrate to a different virtual machine than the previous SQL Server was running on.

By taking some packet captures by logging in with ssh on the Linux Web App, and typing ip a to get the interface name and then tcpdump -i vnet0g6qkt5hl -w ~/network_traces/0001.pcap we noticed something strange while the problem was still occurring.

A packet containing a SQL query that in the tcpdump was registered with size 2130 bytes and data length 2064 was being “partially acked”. After sending the large packet out, the ACK came back only for the first 1398 bytes, causing a retransmission of the last 666 bytes. However, in response to this retransmission only came another ACK for only the first 1398 bytes, causing another retransmission of the last 666 bytes, and so on and so forth, until eventually the connection timed out. Rather than a threading issue this potentially seems like some kind of network layer problem. However, we can no longer reproduce it, so it’s difficult to investigate more deeply.

As @dazinator already suggested our particular problem was probably something entirely different than what this ticket is about. The 258 timeout error is fairly general - it just means that some kind of wait operation timed out and it can have multiple different underlying causes. Sorry for the noise.

bailsman on Dec 11, 2023

For us, increasing the threadpool threads eliminated this issue. We’re still on linux and are not seeing this (anymore)

MichelZ on Nov 17, 2023

When will this major years old linux issue be fixed? Using MS-SQL Docker on Linux fells like MySQL in the nineties.

bennil on May 8, 2023

@lcheunglci Apologies, the error I am seeing is a 258, but it’s not the “ReadSniSyncOverAsync” variety. It’s actually this one here: https://github.com/dotnet/SqlClient/issues/1530 will move my comments there

dazinator on Dec 13, 2022

I am having this issue intermittently with a .net 6.0 app running in containers on Azure Kubernetes Service accessing an on prem SQL Server,

I am using Micrsooft.Data.SqlClient.dll v5.0.0.

The Microsoft.Data.SqlClient.SNI.dll is only in the directory runtimes\win-*\native so I assume it isn’t going to get loaded

I have checked the SNAT Connection Count where the State is Failed and there are none.

We are genuinely considering either changing the hosting off linux hosts or more likely, swapping to a different DB tech as this is such a problem.

gamblen on Oct 20, 2022

@MarvinNorway which Microsoft.Data.SqlClient version - and have you tried 4.1?

ErikEJ on Jul 8, 2022

[Translated text]

No, it’s because I used update to lock the entire table that I’m 100% of the cpu

Technically it’s the same issue, as the use case we have seen recently also requires a locked table. And yes we have seen this issue also with SELECT and DML statements. When the table is locked and Bulk copy/SELECT is performed, we see that server side memory consumption increases and connections are not released by server. Bulk Copy issue is due to a server side bug and it is being investigated, while SELECT/DML is yet to be confirmed.

As a workaround, you can bump up server’s max worker threads that will allow other pending tasks to occur and close connections that are to be closed. 抱歉，我的猜测错误了，这跟cpu100%没有关系。我在预生产复现了这个错误。复现过程分享：一张表中有70w条记录，每个请求都取更新整个表（其中的一个字段），使用1000个user进行压测，很轻松的复现了这个错误

查看图片：

niuniu007 on Nov 10, 2021

@niuniu007

[Translated text]

This error was reported in large numbers after I observed that the server cpu was 100% used at a time

We have noticed the server side behavior in case of Bulk Copy, could you please confirm using Bulk Copy operations triggers this?

并不是，是因为我使用update 锁了整个表才导致cpu 100%的

niuniu007 on Nov 10, 2021

Just want to add that we significantly reduced the chance of running into this problem by not using MARS. Previously we had same issue with or without MARS, but it turned out that we are using KeyVault and the secrets are cached for 24h. Only adding a dummy change to appSettings and saving it would clear the cache.

quanmaiepi on Nov 4, 2021

I’m going to try the last prerelease that includes #920 fix that resolves similar issue #672. I’ll update you on the result.

luarvic on Apr 8, 2021

@cheenamalhotra Will monitor 1.1.4 in prod tomorrow and revert with more detail. Loads are low at the moment, so not experiencing any issues. Yes, I can share a private repo.

andrehaupt on Mar 22, 2021

We’re having similar issues. It happened out of the blue last week and then again this morning. Suddenly these timeouts start piling up and our DTU’s spike to 100% and stay there.

.net core 2.2
Linux instances inside AKS
Azure SQL
Our timeouts are set at 90 seconds
We use Dapper.
We were on Microsoft.Data.SqlClient 2.1.1 this morning but while it was happening we tried upgrading to 2.1.2, but nothing changed.
Last week we tried disabling MARS but that actually broke everything, so we rolled that back immediately

Unfortunately, the only thing that seems to fix this for us is manually scaling up our tier in SQL Azure. We generally run at a P4. So we simply bump it to the next one (P6) and then its fine, even when we immediately bump it back down to the P4. It is like something just gets “stuck” and there is no way to remedy it without scaling the SQL server. Which sucks because who knows when it is going to happen. Here’s a sample stack trace:

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at Microsoft.Data.SqlClient.SqlCommand.<>c.b__188_0(Task`1 result)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
at Dapper.SqlMapper.QueryMultipleAsync(IDbConnection cnn, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 1013
at Data.BaseDB.QueryMultipleAsync(String sql, Object param, Boolean readReplica, Nullable`1 commandTimeout) in /src/src/Data/BaseDB.cs:line 77
at Data.ReportsDB.PaymentMethodBreakdownAsync(String clientKey, DateTime startDate, DateTime endDate, List`1 events, List`1 paymentMethods) in /src/src/Data/ReportsDB.cs:line 728

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Unknown error 258

Here’s an example of the DTU’s… everything is trucking along normal and then it starts to go nuts and then around the time in the morning that business starts picking up, we’re at 100% till we finally bounce the tier and then it more or less goes back to normal 😖 Screen Shot 2021-03-16 at 10 55 31 AM

Also opened a ticket with Azure support to see if they can help diagnose from their end.

jslaybaugh on Mar 16, 2021

那么问题应该怎样解决？我的程序是在插入的时候报错。在高负载的情况下 System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (258): Unknown error 258 at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior) at Dapper.SqlMapper.ExecuteReaderWithFlagsFallback(IDbCommand cmd, Boolean wasClosed, CommandBehavior behavior) in C:\projects\dapper\Dapper\SqlMapper.cs:line 1060 at Dapper.SqlMapper.ExecuteReaderImpl(IDbConnection cnn, CommandDefinition& command, CommandBehavior commandBehavior, IDbCommand& cmd) in C:\projects\dapper\Dapper\SqlMapper.cs:line 2856 `

niuniu007 on Feb 1, 2021

Any progress?

WeihanLi on Dec 16, 2020

@JRahnama , If I didn’t miss something, all SqlCommands in my code are wrapped in using block.

frankyuan on Oct 19, 2020