SqlClient: TransactionAbortedException when performing queries in parallel inside a transaction scope

Describe the bug

When performing multiple queries in parallel, and each pair of 2 queries are inside a transaction scope, an unexpected transaction error is thrown (see below).

Exception message: System.Transactions.TransactionAbortedException: The transaction has aborted.
Stack trace:
Unhandled exception. System.Transactions.TransactionAbortedException: The transaction has aborted.
 ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction.
 ---> System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.EnlistableStates.Promote(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToDistributedTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetExportCookie(Transaction transaction, Byte[] whereabouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.GetTransactionCookie(Transaction transaction, Byte[] whereAbouts)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInterna
l& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)
--- End of stack trace from previous location ---

To reproduce

Below code reproduces the error. It runs 100K times 2 queries in parallel where each 2 queries are inside a single transaction scope. Note that after each query the database connection is disposed (and thus closed). This is important to trigger the error. When we change the code, and use 1 connection for both queries and only open 1 connection the error does not occur (see also code comments below).

using System.Transactions;
using Microsoft.Data.SqlClient;

static async Task PerformTransactionWithQuery(int num)
{
    try
    {
        using (new TransactionScope(TransactionScopeAsyncFlowOption.Enabled))
        {
            // Adding "max pool size=1000;" to the connection string seems to trigger the problem less often
            string connStr = @"Server=.\SQLEXPRESS;Database=master;Trusted_Connection=True;Encrypt=False";

            var query = "SELECT COUNT(*) FROM sys.dm_tran_active_transactions";

            await using (var dbConn = new SqlConnection(connStr))
            {
                await dbConn.OpenAsync();

                await using (var command1 = new SqlCommand(query, dbConn))
                {
                    await command1.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)

            await using (var dbConn = new SqlConnection(connStr)) 
            {
                // Reopening the connection triggers the following error:
                // System.Transactions.TransactionAbortedException: The transaction has aborted.
                //
                // NB. Using a single connection and opening it once does NOT trigger the error
                await dbConn.OpenAsync();
                
                await using (var command2 = new SqlCommand(query, dbConn))
                {
                    await command2.ExecuteScalarAsync();
                }
            } // Connection is disposed (and thus closed)
            
            //Do not complete transaction
        }
    }
    catch (Exception e)
    {
        Console.WriteLine($"Failed {num}");
        throw;
    }
} 

var tasks = Enumerable.Range(0, 100000).ToList().Select(PerformTransactionWithQuery);
await Task.WhenAll(tasks);

Expected behavior

Being able to use multiple connections (with the same connection string) in sequence within the same transaction without errors.

Further technical details

Microsoft.Data.SqlClient version: 4.1.0 .NET target: .NET 6 SQL Server version: SQL Server 2019 Operating system: Windows 11

About this issue

Original URL
State: open
Created 2 years ago
Reactions: 5
Comments: 48 (30 by maintainers)

Most upvoted comments

I’ve spent of time and I think I’ve understood what’s going on.

Consider the following code:

await Task.WhenAll(Enumerable.Range(0, 10).Select(_ => Task.Run(() =>
{
    for (var i = 0; i < 1_000_000; i++)
    {
        using var scope = new TransactionScope(TransactionScopeAsyncFlowOption.Enabled);
        using var dbConn = new SqlConnection(ConnectionString);
        dbConn.Open();

        using var command = new SqlCommand("SELECT COUNT(*) FROM sys.dm_tran_active_transactions", dbConn);
        command.ExecuteScalar();
    }
})));

Some transaction scope is started, a connection is opened and enlisted to it, some operation is performed, and the connection is disposed.
When the scope is disposed, Sys.Tx first calls SqlDelegatedTransaction.Rollback.
Rollback cleans everything up, and then calls connection.CleanupConnectionOnTransactionCompletion(_atomicTransaction). This calls DbConnectionInternal.DetachTransaction, which calls DelegatedTransactionEnded, which puts the connection back into the pool for general usage.
- This makes sense: the transaction has completed, and the connection was closed before the TransactionScope was disposed, so it was in “stasis”, waiting for the transaction to complete before becoming available again.
Since the connection was made available to everyone, some other thread concurrently takes the connection out of the (general) pool, and starts another unrelated transaction scope on it.
Meantime, Sys.Tx isn’t done: after calling Rollback, it also fires the Transaction.TransactionCompleted event, where we’re registered DbConnectionInternal.TransactionCompletedEvent. Note that at this point the connection is in concurrent use by the 2nd user.
The event handler path gets us to TransactionEnded on the new SqlDelegatedTransaction. It sees that connection isn’t null, and then dooms the connection; this code path was meant for when a distributed transaction ends before rollback/commit (see comment on TransactionEnded), but it gets mistakenly invoked here.

As a quick test, I’ve commented out the event handler registration (this line); this means that SqlClient no longer listens to Transaction.TransactionCompleted. Before this change, the repro in this issue consistently failed, whereas after the change it consistently completes successfully. Since the bug apparently isn’t present in the .NET Framework version, I’d look at how it’s working and what changes were made to the .NET Core one which led to this.

I’m not yet quite sure why this bug would lead to promotion to a distributed transaction as we’re seeing above, but I’m guessing that with the dooming of arbitrary connections, the state could easily get screwed up and lead to undefined results.

roji on Nov 27, 2022

@jetizz This occurs when pooling is enabled, and the event source logs show it’s between different threads. So far, a workaround with pooling would be directing connection requests to different connection pools based on the thread in use to avoid conflict between different threads in presence of a transaction. The ApplicationName property in the connection string can be used for this purpose.

Pool creation and assignment

@DavoudEshtehari - Yes that’s what I noted earlier. However what I didn’t notice at the time was that this causes the transaction to be distributed. Running under .NET 6 on linux this will then fail as distributed transactions are not allowed.

ByteMe-web on Jan 31, 2023

Any viable workaround for this issue? We’re getting occasional errors (no exact metrics, about one in 10k connections), and have no idea how to bypass the issue for now. We tried increasing pool size, but it did not help. Is the only way to avoid this - to disable connection pooling? Performance hit will be considerable, and I’d like to avoid it if possible.

Exception we’re getting is the same as in original post. Environment is asp.net, with each request opening up to 10-20 short lived connections.

jetizz on Jan 31, 2023

But I think the more fundamental questions here are whether the connection should be put back into the pool before the TransactionCompleted event is fired (letting other threads start using it);

I can’t think of any good reason for that to happen. It simply doesn’t make sense.

Wraith2 on Dec 15, 2022

@DavoudEshtehari ah OK, I missed that.

In general, it seems that the connection cannot be returned to the (general) pool until the transaction has fully completed; since SqlClient registers for the TransactionCompleted event, that’s when this should happen - not earlier. Alternatively, it’s interesting to ask why SqlClient registers for the event in the first place - the transaction enlistment (SqlDelegatedTransaction) should already receive all the necessary events. I’m guessing there’s a reason why the event registration is done, but removing that would certainly simplify a lot of things (and prevent bugs such as this and #1800).

Anyway, I’d be happy to help with any further thinking on this.

roji on Dec 1, 2022

@roji thank you for the update on this issue. Unfortunately, this bug exists against the .NET Framework as well: https://github.com/dotnet/SqlClient/issues/1675#issuecomment-1289817442.

DavoudEshtehari on Dec 1, 2022

@cheenamalhotra I was also confronted with unexpected escalation to a distributed transaction recently (https://github.com/dotnet/efcore/issues/29054), and was assuming it was related to the .NET 7.0 distributed transaction changes… It’s interesting that this occurs on .NET 6.0.

I’m not sure how this could be happening. I do know that SqlInternalConnection.EnlistNonNull would only call SqlInternalConnection.GetTransactionCookie (which causes the escalation) if the call to Transaction.EnlistPromotableSinglePhase return false (barring bugs in Sys.Tx of course). This would mean that there are somehow two distinct connections at play; some sort of bug which messes up the TransactedConnectionPool could conceivably cause this, where e.g. the second open incorrectly returns a different connection. However, that doesn’t explain why that connection is broken, as indicated by exception…

I suspect the exception we’re seeing is the visible, end result of some earlier problem that we’re not seeing… It’s probably a good idea to concentrate on why a connection was broken in the first place, and ideally trying to step through the SqlClient code to see exactly what’s happening (not trivial as this is non-deterministic). Since this happens in .NET 6.0, I wouldn’t assume a Sys.Tx issue unless there’s a stronger indication.

I unfortunately am too taken up by 7.0 stabilization work at the moment to dive into this myself, but would be happy to follow along and try to help out…

roji on Nov 17, 2022

If you replace await dbConn.OpenAsync(); with dbConn.Open(); (ie. merely switch from async to sync opening of a connection), this code works just fine (ie. even with ts.Complete() commented out).

This is interesting and I’d agree I didn’t look into that to be the possibility, so it certainly points to something missing out with OpenAsync(), which is very strange since both APIs ‘should’ follow the same path. 🤔

There are no distributed transactions

I possibly was imagining .NET 7, but never mind. It’s still a case of delegated transaction as we’re looking at System.Transactions.TransactionPromotionException

Let me dive into it and will get back.

cheenamalhotra on Nov 16, 2022

@cheenamalhotra There are no distributed transactions in my LinqPad example. MSDTC escalation does not happen because only one connection-string is used. If you replace await dbConn.OpenAsync(); with dbConn.Open(); (ie. merely switch from async to sync opening of a connection), this code works just fine (ie. even with ts.Complete() commented out).

When the 1st connection-open request occurs, it should enlist that connection into the Transaction. Then 1st-connection-close occurs (but the inner-connection is not really closed - just returns to the pool). Then the 2nd connection-open request should receive the exact same already-open inner-connection back from the pool (ie. the same one durable resource is enlisted). This is exactly what happens when dbConn.Open() is used. There should be no observable behavior differences between dbConn.Open(); and await dbConn.OpenAsync(); - yet the async version crashes.

sdrapkin on Nov 16, 2022

@joostmeijles @sdrapkin

What you’re trying to do is unsafe and I don’t agree this is bug. Old behavior before v1.1.4 was a bug where accounting was not done for externally aborted transactions. I would request a read on https://github.com/dotnet/SqlClient/issues/237#issuecomment-569856610 and https://github.com/dotnet/SqlClient/issues/237#issuecomment-569965788 to understand the root cause.

How distributed transactions work is that they are decoupled from your application and are running within the MS DTC service. Which means if your app tries to be super-fast and expect MS DTC to keep coordinating with SQL Server for the MISSED commit/rollbacks, it needs time to do so, as it needs to abort the transaction on the server.

From driver end, if you closed a POOLED connection, server-side connection is kept open. So now if you happen to re-open the connection BEFORE MSDTC could terminate your previous transaction, you find that AFTER opening the connection, transaction gets aborted. It should never happen that the same OLD transaction continues to be used, and I hope you’d agree with me on that! So, someone must keep things in order. Either it’s your app, or the driver.

If you want driver to consistently terminate your connection alongwith Distributed transaction, DON’T use connection pool! If you want to use connection pool for performance needs, you MUST manage your transaction lifecycle and cannot leave them to be aborted by server.

Likewise, in the above repro, if you do call ts.Complete() things do just fine!

I would certainly recommend documentation on this somewhere so it’s not a surprise to end users as the default behavior of driver is with pooling enabled, which is clearly not intuitive to users. cc @David-Engel

cheenamalhotra on Nov 16, 2022

@DavoudEshtehari thanks for the update! Sounds as a good change, but unfortunately I am still getting an error, although its a different one:

Unhandled exception. System.InvalidOperationException: The requested operation cannot be completed because the connection has been broken.
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransaction(TransactionRequest transactionRequest, String name, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Initialize()
   at System.Transactions.TransactionStatePSPEOperation.PSPEInitialize(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType)
   at System.Transactions.TransactionStateActive.EnlistPromotableSinglePhase(InternalTransaction tx, IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Transaction atomicTransaction, Guid promoterType)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification, Guid promoterType)
   at System.Transactions.Transaction.EnlistPromotableSinglePhase(IPromotableSinglePhaseNotification promotableSinglePhaseNotification)
   at Microsoft.Data.SqlClient.SqlInternalConnection.EnlistNonNull(Transaction tx)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.ActivateConnection(Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.PrepareConnection(DbConnection owningObject, DbConnectionInternal obj, Transaction transaction)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.InternalOpenAsync(CancellationToken cancellationToken)

joostmeijles on Oct 25, 2022

@joostmeijles I confirm it’s reproducible with MDS and we’ll return to it after the next release. I’m afraid we can’t provide any estimation for a fix while it’s in the investigation stage. We’ll keep updating here with any findings, and we welcome any contribution as this is an open-source driver.

DavoudEshtehari on Aug 4, 2022