SqlClient: SqlClient. Inappropriate behavior after a commit timeout
I’ve initially created this bug in the CoreFX repository: https://github.com/dotnet/corefx/issues/38900, but it does not look like anyone is looking there. So I’m repeating it here. A small addition: I’ve now tested this using both System.Data.SqlClient (4.6.1) and Microsoft.Data.SqlClient (1.0.19128.1-Preview), compiled for both .NET Core 2.2 & .NET Framework 4.8. In each of these four cases I was able to reproduce this bug.
=============================================================
I have a SQL Azure database and I use System.Data.SqlClient (4.6.1) to talk to it. From time to time (sometimes several times a day, sometimes no incidents for weeks), it would produce a SqlException with Number = -2 (Timeout) during a SqlTransaction.Commit call. I assume that’s happening because SQL Azure databases have mirroring set up and there is some transient network condition happening around Commit() which prevents that from finishing. That’s fine with me - i know those are transient exceptions and would retry them after some interval. The problem is that the connection which caused that exception is returned to the pool. The next time it’s got picked from the pool, it seems to be in some invalid state. Then, when i do “SELECT * FROM MyTable WHERE Id = @Id” on that connection, it would return an empty set, even though that record is 100% there in the table. That is one really disturbing bug.
I’ve tried to build a small reproduction for this issue. The hardest thing was to have SqlTransation.Commit() to throw a timeout exception. I invented a trick to do that (see below, ForceCommitTimeout() function). It’s a bit clumsy one and does not always reproduce. If it does not repoduce in your environment, try increasing RowNumber and running the code several times. From time to time it would produce an exception. Note, that in the connection string I’ve set “Connection Timeout=1”. That timeout value is also used as commit timeout by SqlTransaction. In any case, if someone knows a better way to force a timeout from SqlTransation.Commit() - please share it with me.
The second important point to build a stable reproduction was to set “Max Pool Size=1”. This is to make sure that the same connection is returned from the pool when you open a new connection after a commit timeout has happened.
Before you run the code, you need to create a database named “TestBed” on your DB server and then create a test table in it:
CREATE TABLE TestTable
(
Id INTEGER NOT NULL IDENTITY(1, 1) PRIMARY KEY,
Name NVARCHAR(MAX)
)
Now, when you are all set up, here’s the code to reproduce the problem:
using System;
using System.Data.SqlClient;
using System.Diagnostics;
namespace SqlClientTimeoutIssue
{
internal class Program
{
private const string ConnectionString = "Server=(local); Initial Catalog=TestBed; Integrated Security=True; Connection Timeout=1; Max Pool Size=1";
private const int RowNumber = 10000000;
private static void Main()
{
try
{
SelectOne();
ForceCommitTimeout();
SelectOne();
}
catch (Exception ex)
{
Console.WriteLine("Unexpected exception: {0}", ex);
}
}
private static void SelectOne()
{
using (var conn = new SqlConnection(ConnectionString))
{
conn.Open();
Console.WriteLine("SelectOne. ConnectionId={0}", conn.ClientConnectionId);
var result = 0;
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "SELECT 1";
using (var reader = cmd.ExecuteReader())
{
while (reader.Read())
result = reader.GetInt32(0);
}
}
Console.WriteLine("SelectOne. Result: {0}", result);
}
}
private static void ForceCommitTimeout()
{
using (var conn = new SqlConnection(ConnectionString))
{
conn.Open();
Console.WriteLine("ForceCommitTimeout. ConnectionId={0}", conn.ClientConnectionId);
using (var tran = conn.BeginTransaction())
{
var timer = Stopwatch.StartNew();
for (var i = 0; i < RowNumber; ++i)
{
using (var cmd = conn.CreateCommand())
{
cmd.CommandText = "INSERT TestTable (Name) VALUES(N'Some value')";
cmd.Transaction = tran;
cmd.ExecuteNonQuery();
}
}
Console.WriteLine("ForceCommitTimeout. Insert duration: {0} sec", timer.Elapsed.TotalSeconds);
try
{
timer.Restart();
tran.Commit();
Console.WriteLine("ForceCommitTimeout. Commit duration: {0} sec", timer.Elapsed.TotalSeconds);
}
catch (SqlException ex) when (ex.Number == -2)
{
Console.WriteLine("ForceCommitTimeout. Commit timeout has occurred");
}
}
}
}
}
}
When i run it and i’m lucky to hit a timeout, i get output like this:
SelectOne. ConnectionId=21106a9d-08db-43b5-b2e5-d6f9b751bebb
SelectOne. Result: 1
ForceCommitTimeout. ConnectionId=21106a9d-08db-43b5-b2e5-d6f9b751bebb
ForceCommitTimeout. Insert duration: 1799.231 sec
ForceCommitTimeout. Commit timeout has occurred
SelectOne. ConnectionId=21106a9d-08db-43b5-b2e5-d6f9b751bebb
SelectOne. Result: 0
Notice, that the second call to SelectOne() outputs “0”. Even though it does “SELECT 1”! And that is the essence of this (really weird) bug. I’ve been able to reproduce it on both SQL Azure and my local SQL Server 2017 Dev Edition.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 18 (9 by maintainers)
You might be able to trigger a commit timeout by suspending the
sqlservr.exe
process using Process Explorer. Open 100 transactions concurrently all under write load. Then pause the process. Unpause. You should get lucky on a few of those 100 transactions.You also can set up mirroring or AG and pause the network.
Or, put the transaction log on storage that you can pause. You can pause an SMB file share by interrupting the network.
@cheenamalhotra Thanks a lot! I’ve started using this release in our production environment. If I see the problem happening again, i will open a new issue.
Hi @dmytro-gokun
We added a change to mark active connection as not poolable and doomed when an exception is captured during transaction commit, ensuring corrupted connection is dropped on Dispose.
The fix will be available as part of GA release of 1.0. Keeping the issue open for you to test and confirm in your scenarios.