SqlClient: Execution Timeout Expired Error (258, ReadSniSyncOverAsync)

Describe the bug

When executing SQL such as SELECT FieldA, FieldB FROM A INNER JOIN C ON A.FieldId = C.FieldId UNION SELECT FieldA, FieldD FROM A INNER JOIN D ON A.FieldId = D.FieldId, throw the error like below, not every time, just a little part of queries have this issue.

Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\n
---> System.ComponentModel.Win32Exception (258): Unknown error 258\n
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction)\n 
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error)\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync()\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket()\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer()\n 
at Microsoft.Data.SqlClient.TdsParserStateObject.TryReadByteArray(Span1 buff, Int32 len, Int32& totalRead)\n
at Microsoft.Data.SqlClient.TdsParser.TrySkipValue(SqlMetaDataPriv md, Int32 columnOrdinal, TdsParserStateObject stateObj)\n
at Microsoft.Data.SqlClient.TdsParser.TrySkipRow(SqlMetaDataSet columns, Int32 startCol, TdsParserStateObject stateObj)\n
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)\n
at Microsoft.Data.SqlClient.SqlDataReader.TryReadInternal(Boolean setTimeout, Boolean& more)\n
at Microsoft.Data.SqlClient.SqlDataReader.ReadAsync(CancellationToken cancellationToken)\n
--- End of stack trace from previous location where exception was thrown ---\n
at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in //Dapper/SqlMapper.Async.cs:line 437\n

To reproduce

Sorry, currently can’t reproduce in the local environment, so can’t provide more detail to reproduce.

Expected behavior

SQL should execute successfully every time.

Further technical details

Microsoft.Data.SqlClient version: 1.1.3 .NET target: Core 3.1 Operating system: Docker container

What I found/tried

https://stackoverflow.com/questions/57270245/sql-server-dbcommand-timeout-with-net-core-container-under-load

https://github.com/StackExchange/Dapper/issues/1435

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 14
  • Comments: 145 (31 by maintainers)

Most upvoted comments

I have found reason for my case. I’m sorry, but that were number of bad performing queries - doing either index or table scan with several dozen millions rows. It was really hard to catch, as those happened 1-2 times per hour and were not affecting anyhow used memory (30% free) or CPU (max 20% usage). Those scans were overloading disk system and harming SQL server caches. So even after query run there were problems in next 5-10 minutes - due to missing data in cache, even for usually good performing queries SQL server had to read too much things from disk and therefore queries were timing-out.

Guys, check your queries. This SQL script gives you those, which had to read lot of data SELECT SUBSTRING(qt.TEXT, (qs.statement_start_offset/2)+1, ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(qt.TEXT) ELSE qs.statement_end_offset END - qs.statement_start_offset)/2)+1) as [Text], qs.execution_count, qs.total_logical_reads, qs.last_logical_reads, qs.total_logical_writes, qs.last_logical_writes, qs.total_worker_time, qs.last_worker_time, qs.total_elapsed_time/1000000 total_elapsed_time_in_S, qs.last_elapsed_time/1000000 last_elapsed_time_in_S, qs.last_execution_time, qp.query_plan FROM sys.dm_exec_query_stats qs CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) qt CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) qp ORDER BY qs.total_logical_reads DESC

In my last post I guessed the 258 error occurs because of full tcp buffers and wrong tcp socket handling on linux (see my previous post above). Further investigation made me more confident that this is the case.

The additional queue I added (last post) helped a bit, but after adding more features (service implementations) to our host process the error came back quite frequently.

Because most 258 errors occured in DbContext.SaveChanges() I synchronized calls in my DbContext base implementation like this:

public class MyBaseDbContext : DbContext
{
    private static readonly object syncObj = new object();

    public override int SaveChanges()
    {
        lock (syncObj)
        {
            return base.SaveChanges();
        }
    }
}

This dirty code slowed down my execution flow and helped reducing the 258 errors. But like others reported here about disabling MARS it did not solve the problem completly.

I examined that the problem occurs often during extensive external web API rest calls. So I synchronized them as well:

static SemaphoreSlim semaphoreSlimGeocodeLock = new SemaphoreSlim(1, 1);

public void MyRestCallMethod() 
{
    try
    {
        await semaphoreSlimGeocodeLock.WaitAsync();

        HttpClient httpClient = new HttpClient();
        string result = await httpClient.GetStringAsync(url);
        ...

    }
    finally
    {
        httpClient.Dispose();
        semaphoreSlimGeocodeLock.Release();
    }
}

No parallel web calls helped a lot and made me sure the error is related to the current tcp buffer usage within the application.

I guess the error originates in some way like this:

  1. SqlClient reads/writes to the sql server tcp socket connection.
  2. Because of the buffer handling bug the read/write loop returns indicating the remotehost has terminated the connection.
  3. SqlClient internal reconnect feature reconnects immediately to the SQL Server.
  4. On the next tcp read/write the same error occures because the application tcp buffer is still full.
  5. This is repeated many times and there might be a threading issue playing a role as well.
  6. The application is freezed and timeout “Unknwon error 258” occurs

Slowing down the execution flow and hoping not reaching tcp buffer limits is not a production ready workaround 😃 I wonder if this is a common problem for all high tcp i/o apps running on Linux and connecting to Sql Server?

This error occured some time after upgrading ef core (not sure which .net core/ef core version). My application runs currently on:

  • Bare metal server: AMD Ryzen 7 2700X
  • Ubuntu 18.04.2 LTS
  • .NET 5.0
  • Microsoft.EntityFrameworkCore.SqlServer 5.0.1

Used SQL Server: Latest SQL Server 2019 Docker container

Side notes:

  • Upgrading from a two years old SQL Server 2017 Container to the latest 2019 version did not change anything.
  • I did try collecting traces with dotnet-trace collect --process-id 27455 --profile database but the results did not include the expected events.

This sounds like a thread starvation issue with the system thread pool. This is known issue when running asynchronously with Microsoft.Data.SqlClient on .NET Core and it’s in our backlog to make it fully asynchronous which will require a big refactor. You might be able to work around this by setting the ThreadPool.SetMinThreads to a high enough value to accommodate, but with many parallel tasks, there’ll be more overhead to allow for continuation tasks inside of SqlClient to run before the timeouts are hit.

We are also experiencing this issue with Linux App Services against Azure SQL. We only began seeing this after migrating our App Services from Windows to Linux. Also on 5.1.1 of SqlClient.

@Malgefor Convert all asynchronous SQL calls into synchronous ones, and everything will be sorted 😃

A few days ago we’ve found the solution for our case. If you are using Azure Load Balancer + SQL Database (or SQL Pool) in a VNET, check if your configuration for outbound connections is set to use the Azure Private Network, otherwise, you could be running into SNAT Port Exhaustion.

To check if that’s the case, go to the Load Balancer resource page, Metrics, and show SNAT Connection Count, filter for failed Statuses.

Another solution is to use more Outbound IPs.

最近观察到业务高峰期会有大量的Unknown error 258,在数据库服务器上观测到cpu 100%,是由 sql server windows nt -64bit 占用 98%+ 没有死锁 使用 select * from master.dbo.sysprocesses 查询到连接数在300左右 当前系统有 170左右 当前系统数据库连接串: Data Source=xxxxxxx,15726;Initial Catalog=xxxxx;Persist Security Info=True;User ID=xxxx;Password=xxxxxx;Integrated Security=false;Connect Timeout=30;MultipleActiveResultSets=false

有两个大并发业务连接到这台数据库 因为一些sql语句 较慢,经过优化后,今天再次使用 select * from master.dbo.sysprocesses 查询当前连接在 60左右,已经没有Unknown error 258 的记录了

同时:数据库的最大连接数设置为0

Is this happening on Azure platform for everyone? Getting same issue with and without MARS, Linux App Service (docker) + Azure SQL.

Related: https://github.com/Azure/aks-engine/issues/4341 ?

This is happening randomly across out systems. You do typically see it on applications that cause higher loads or on SQL Server instances that are under higher load.

There is nothing special in particular about the applications that experience this. It’s happening on both System.Data.SqlClient and Microsoft.Data.SqlClient. AFAICT it is not happening on .NET Framework/Windows (we do not run any .NET Core applications of note on Windows).

I am currently migrating one application that experiences this regularly to .NET 5. Will let you know if that changes anything.

My platform Infrastructure: Docker in Redhat Enterprise Language & Framework: .Net 6 with Entity Framework core Database: SQL Server

After optimizing my API and implementing ‘async’ and ‘await’ with ‘Task’, all the errors have disappeared. Also i set maximum and minimum pooling and connection life cycle

Experienced a flavor of this issue. When inserting singular records in intervals of a few seconds in concurrently running applications. Each application operating on their own instance of DbContext and no concurrent use inside their scope.

Packages Microsoft.Data.SqlClient 5.0.1 Microsoft.EntityFrameworkCore.SqlServer 6.0.10

Exception

An exception occurred in the database while saving changes for context type '"XXX.XXXX.XXXX.XXXContext"'.
Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while saving the entity changes. See the inner exception for details.
 ---> 
 Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding. 
 ---> System.ComponentModel.Win32Exception (258): Unknown error 258
  at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__207_0(Task`1 result) at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
   at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)---
    End of stack trace from previous location --- at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
    --- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReaderAsync(RelationalCommandParameterObject parameterObject, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken)ClientConnectionId:554eb334-af3c-400d-92df-1c1c1dd92d2bError Number:-2,State:0,Class:11 --- End of inner exception stack trace --- at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.ExecuteAsync(IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.ExecuteAsync(IEnumerable`1 commandBatches, IRelationalConnection connection, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(IList`1 entriesToSave, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChangesAsync(StateManager stateManager, Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.<>c__DisplayClass33_0`2.<<ExecuteAsync>b__0>d.MoveNext()--- End of stack trace from previous location --- at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteImplementationAsync[TState,TResult](Func`4 operation, Func`4 verifySucceeded, TState state, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.Storage.ExecutionStrategy.ExecuteAsync[TState,TResult](TState state, Func`4 operation, Func`4 verifySucceeded, CancellationToken cancellationToken) at Microsoft.EntityFrameworkCore.DbContext.SaveChangesAsync(Boolean acceptAllChangesOnSuccess, CancellationToken cancellationToken)

Client .NET 6.0 application implemented as BackgroundService.

Error occured on

 _XContext.X.Add(x);
await _XContext.SaveChangesAsync(cancellationToken);

Last EF generated SQL command before this exception was thrown

SET NOCOUNT ON; 
INSERT INTO [X] ([A], [B], [C], [D], [E], [F], [G]) 
VALUES (@p0, @p1, @p2, @p3, @p4, @p5, @p6); 

SELECT [Z], [K] FROM [X] 
WHERE @@ROWCOUNT = 1 AND [Z] = scope_identity()

Database SQL Server 2016

Resolution for now Optimizing expensive read queries as much as possible that affected the table. After optimizing heavier read queries, I haven’t seen this error.

I had several applications concurrently using the database for one expensive read query each on startup, and then making inserts on same tables (not same records). Each had their own scoped instance of DbContext. The issue seemed to come when inserts were occurring to a table that was under a heavy read query by another process. Small percentage of inserts resulted in this error. Enabling retries didn’t seem to help.

@ErikEJ Upgrading Microsoft.Data.SqlClient to 4.1.0 and Microsoft.Data.SqlClient.SNI to 4.0.0 solved the problem… Thanks. They were buried among a couple other packages that I can’t upgrade, that’s why I didn’t see that they were so outdated…

In my last post I guessed the 258 error occurs because of full tcp buffers and wrong tcp socket handling on linux (see my previous post above). Further investigation made me more confident that this is the case.

The additional queue I added (last post) helped a bit, but after adding more features (service implementations) to our host process the error came back quite frequently.

Because most 258 errors occured in DbContext.SaveChanges() I synchronized calls in my DbContext base implementation like this:

public class MyBaseDbContext : DbContext
{
    private static readonly object syncObj = new object();

    public override int SaveChanges()
    {
        lock (syncObj)
        {
            return base.SaveChanges();
        }
    }
}

This dirty code slowed down my execution flow and helped reducing the 258 errors. But like others reported here about disabling MARS it did not solve the problem completly.

I examined that the problem occurs often during extensive external web API rest calls. So I synchronized them as well:

static SemaphoreSlim semaphoreSlimGeocodeLock = new SemaphoreSlim(1, 1);

public void MyRestCallMethod() 
{
    try
    {
        await semaphoreSlimGeocodeLock.WaitAsync();

        HttpClient httpClient = new HttpClient();
        string result = await httpClient.GetStringAsync(url);
        ...

    }
    finally
    {
        httpClient.Dispose();
        semaphoreSlimGeocodeLock.Release();
    }
}

No parallel web calls helped a lot and made me sure the error is related to the current tcp buffer usage within the application.

I guess the error originates in some way like this:

  1. SqlClient reads/writes to the sql server tcp socket connection.
  2. Because of the buffer handling bug the read/write loop returns indicating the remotehost has terminated the connection.
  3. SqlClient internal reconnect feature reconnects immediately to the SQL Server.
  4. On the next tcp read/write the same error occures because the application tcp buffer is still full.
  5. This is repeated many times and there might be a threading issue playing a role as well.
  6. The application is freezed and timeout “Unknwon error 258” occurs

Slowing down the execution flow and hoping not reaching tcp buffer limits is not a production ready workaround 😃 I wonder if this is a common problem for all high tcp i/o apps running on Linux and connecting to Sql Server?

This error occured some time after upgrading ef core (not sure which .net core/ef core version). My application runs currently on:

  • Bare metal server: AMD Ryzen 7 2700X
  • Ubuntu 18.04.2 LTS
  • .NET 5.0
  • Microsoft.EntityFrameworkCore.SqlServer 5.0.1

Used SQL Server: Latest SQL Server 2019 Docker container

Side notes:

  • Upgrading from a two years old SQL Server 2017 Container to the latest 2019 version did not change anything.
  • I did try collecting traces with dotnet-trace collect --process-id 27455 --profile database but the results did not include the expected events.

My suggestion is create a singleton service serving only one instance of HttpClient.

Without wanting to reveal too much about our infrastructure setup on a public domain, We see this error pretty consistently when load testing a web api, that inserts a large volume of data into an azure sql database.

Details that might hopefully enable a repo:

  1. Web api is running inside linux docker container (asp.net core 5 image) hosted on ubuntu 20.04 vm (azure) with 2 cpus (general purpose)
  2. Azure SQL (serverless, with scale settings: min 0.5 cpu cores, max 2 cpu cores)
  3. The api receives a POST request with bulk data (5000 records) and does an insert using EF Core 5.0.5 sql server package, single call to SaveChanges().
  4. Our load generator, makes 30 concurrent http requests to this api, each request delivering 5000 records for an insert.
  5. We see that roughly 1 or 2 requests consistently fail with an error each time, and this error shows up in the stack trace:

image

At the time of the error, for example, below at 3:18pm, the cpu and io usage of the sql servelerless resource are both between 70-80% and the CPU usage on the node running the docker container is 100%

image

So I am assuming @bennil may be on to something given this is happening in high load, high concurrency scenarios only for us, where CPU usage is pushed to the limit.

@jslaybaugh / @cheenamalhotra: Quick Update:

The issue is environmental.

If you ssh onto the AKS node and have a look at the /var/log/syslog (or syslog.1) you should see “eth0: Lost carrier” messages. It appears to be related to this Azure issue. A fix is being rolled out at the moment.

Hello,

I experience same problems without MARS and with no DbContext pooling. Here is my connection string: Data Source=server_IP;Initial Catalog=my_db;User ID=my_user;Password=my_password;Max Pool Size=200

DbContext: services.AddDbContext<MyDbContext>(o => { o.UseSqlServer(Configuration.GetConnectionString("MyDbContext")); o.EnableSensitiveDataLogging(); }, ServiceLifetime.Transient);

我将db connection 中改了MultipleActiveResultSets=false 后,此问题彻底消失了

@Mewriick , @mcanerim are you using System.Data.SqlClient? The stack trace you have provided is from that library not Microsoft.Data.SqlClient which the original issue was opened for.

So I have migrated one of the applications to .NET 5 and deployed it to our production environment. It is running there only for half a day now but so far the issue is gone. I have even been able to decrease connection timeout from 2 minutes to 5 seconds.

So I would say give .NET 5 a try.

@JRahnama , I found one command that wasn’t wrapped in using block. I will deploy the fix and try to monitor if it helped or not.

Thanks, @ericstj, No, I didn’t try increasing command timeout. Currently, the command timeout is 3s. If a query executes timeout, I got stack like below, seems it is different than the above stack, do you mean the above stack is also a SQL execution timeout? By the way, from New relic, the select query’s duration is 83.4ms, I’m not sure if it is the execution time.

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding
Microsoft.Data.SqlClient.SqlException (0x80131904): Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.\n
---> System.ComponentModel.Win32Exception (258): Unknown error 258\n
at Microsoft.Data.SqlClient.SqlCommand.<>c.<ExecuteDbDataReaderAsync>b__164_0(Task`1 result)\n
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()\n
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)\n
--- End of stack trace from previous location where exception was thrown ---\n
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)\n
--- End of stack trace from previous location where exception was thrown ---\n
at Dapper.SqlMapper.QueryAsync[T](IDbConnection cnn, Type effectiveType, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 419\n

In our case the problem has now as mysteriously disappeared as it has mysteriously appeared. The only change we made was migrate to a different virtual machine than the previous SQL Server was running on.

By taking some packet captures by logging in with ssh on the Linux Web App, and typing ip a to get the interface name and then tcpdump -i vnet0g6qkt5hl -w ~/network_traces/0001.pcap we noticed something strange while the problem was still occurring.

A packet containing a SQL query that in the tcpdump was registered with size 2130 bytes and data length 2064 was being “partially acked”. After sending the large packet out, the ACK came back only for the first 1398 bytes, causing a retransmission of the last 666 bytes. However, in response to this retransmission only came another ACK for only the first 1398 bytes, causing another retransmission of the last 666 bytes, and so on and so forth, until eventually the connection timed out. Rather than a threading issue this potentially seems like some kind of network layer problem. However, we can no longer reproduce it, so it’s difficult to investigate more deeply.

As @dazinator already suggested our particular problem was probably something entirely different than what this ticket is about. The 258 timeout error is fairly general - it just means that some kind of wait operation timed out and it can have multiple different underlying causes. Sorry for the noise.

For us, increasing the threadpool threads eliminated this issue. We’re still on linux and are not seeing this (anymore)

When will this major years old linux issue be fixed? Using MS-SQL Docker on Linux fells like MySQL in the nineties.

@lcheunglci Apologies, the error I am seeing is a 258, but it’s not the “ReadSniSyncOverAsync” variety. It’s actually this one here: https://github.com/dotnet/SqlClient/issues/1530 will move my comments there

I am having this issue intermittently with a .net 6.0 app running in containers on Azure Kubernetes Service accessing an on prem SQL Server,

I am using Micrsooft.Data.SqlClient.dll v5.0.0.

The Microsoft.Data.SqlClient.SNI.dll is only in the directory runtimes\win-*\native so I assume it isn’t going to get loaded

I have checked the SNAT Connection Count where the State is Failed and there are none.

We are genuinely considering either changing the hosting off linux hosts or more likely, swapping to a different DB tech as this is such a problem.

@MarvinNorway which Microsoft.Data.SqlClient version - and have you tried 4.1?

[Translated text]

No, it’s because I used update to lock the entire table that I’m 100% of the cpu

Technically it’s the same issue, as the use case we have seen recently also requires a locked table. And yes we have seen this issue also with SELECT and DML statements. When the table is locked and Bulk copy/SELECT is performed, we see that server side memory consumption increases and connections are not released by server. Bulk Copy issue is due to a server side bug and it is being investigated, while SELECT/DML is yet to be confirmed.

As a workaround, you can bump up server’s max worker threads that will allow other pending tasks to occur and close connections that are to be closed. 抱歉,我的猜测错误了,这跟cpu100%没有关系。 我在预生产复现了这个错误。 复现过程分享: 一张表中有70w条记录,每个请求都取更新整个表(其中的一个字段),使用1000个user进行压测,很轻松的复现了这个错误

查看图片:

@niuniu007

[Translated text]

This error was reported in large numbers after I observed that the server cpu was 100% used at a time

We have noticed the server side behavior in case of Bulk Copy, could you please confirm using Bulk Copy operations triggers this?

并不是,是因为我使用update 锁了整个表才导致cpu 100%的

Just want to add that we significantly reduced the chance of running into this problem by not using MARS. Previously we had same issue with or without MARS, but it turned out that we are using KeyVault and the secrets are cached for 24h. Only adding a dummy change to appSettings and saving it would clear the cache.

I’m going to try the last prerelease that includes #920 fix that resolves similar issue #672. I’ll update you on the result.

@cheenamalhotra Will monitor 1.1.4 in prod tomorrow and revert with more detail. Loads are low at the moment, so not experiencing any issues. Yes, I can share a private repo.

We’re having similar issues. It happened out of the blue last week and then again this morning. Suddenly these timeouts start piling up and our DTU’s spike to 100% and stay there.

  • .net core 2.2
  • Linux instances inside AKS
  • Azure SQL
  • Our timeouts are set at 90 seconds
  • We use Dapper.
  • We were on Microsoft.Data.SqlClient 2.1.1 this morning but while it was happening we tried upgrading to 2.1.2, but nothing changed.
  • Last week we tried disabling MARS but that actually broke everything, so we rolled that back immediately

Unfortunately, the only thing that seems to fix this for us is manually scaling up our tier in SQL Azure. We generally run at a P4. So we simply bump it to the next one (P6) and then its fine, even when we immediately bump it back down to the P4. It is like something just gets “stuck” and there is no way to remedy it without scaling the SQL server. Which sucks because who knows when it is going to happen. Here’s a sample stack trace:

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
at Microsoft.Data.SqlClient.SqlCommand.<>c.b__188_0(Task`1 result)
at System.Threading.Tasks.ContinuationResultTaskFromResultTask`2.InnerInvoke()
at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location where exception was thrown ---
at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot)
--- End of stack trace from previous location where exception was thrown ---
at Dapper.SqlMapper.QueryMultipleAsync(IDbConnection cnn, CommandDefinition command) in /_/Dapper/SqlMapper.Async.cs:line 1013
at Data.BaseDB.QueryMultipleAsync(String sql, Object param, Boolean readReplica, Nullable`1 commandTimeout) in /src/src/Data/BaseDB.cs:line 77
at Data.ReportsDB.PaymentMethodBreakdownAsync(String clientKey, DateTime startDate, DateTime endDate, List`1 events, List`1 paymentMethods) in /src/src/Data/ReportsDB.cs:line 728

Execution Timeout Expired. The timeout period elapsed prior to completion of the operation or the server is not responding.
Unknown error 258

Here’s an example of the DTU’s… everything is trucking along normal and then it starts to go nuts and then around the time in the morning that business starts picking up, we’re at 100% till we finally bounce the tier and then it more or less goes back to normal 😖 Screen Shot 2021-03-16 at 10 55 31 AM

Also opened a ticket with Azure support to see if they can help diagnose from their end.

那么问题应该怎样解决?我的程序是在插入的时候报错。在高负载的情况下 System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. ---> System.ComponentModel.Win32Exception (258): Unknown error 258 at System.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action1 wrapCloseInAction) at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) at System.Data.SqlClient.TdsParserStateObject.ReadSniError(TdsParserStateObject stateObj, UInt32 error) at System.Data.SqlClient.TdsParserStateObject.ReadSniSyncOverAsync() at System.Data.SqlClient.TdsParserStateObject.TryReadNetworkPacket() at System.Data.SqlClient.TdsParserStateObject.TryPrepareBuffer() at System.Data.SqlClient.TdsParserStateObject.TryReadByte(Byte& value) at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) at System.Data.SqlClient.SqlDataReader.TryConsumeMetaData() at System.Data.SqlClient.SqlDataReader.get_MetaData() at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString) at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds) at System.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior) at Dapper.SqlMapper.ExecuteReaderWithFlagsFallback(IDbCommand cmd, Boolean wasClosed, CommandBehavior behavior) in C:\projects\dapper\Dapper\SqlMapper.cs:line 1060 at Dapper.SqlMapper.ExecuteReaderImpl(IDbConnection cnn, CommandDefinition& command, CommandBehavior commandBehavior, IDbCommand& cmd) in C:\projects\dapper\Dapper\SqlMapper.cs:line 2856 `

Any progress?

@JRahnama , If I didn’t miss something, all SqlCommands in my code are wrapped in using block.