SqlClient: SqlException: The PROMOTE TRANSACTION request failed because there is no local transaction active

Under certain circumstances, when a combination of SqlCommands and transactional WCF calls runs in a massively parallel manner, the current TransactionScope is occasionally “left over”: SqlCommands do not get attached to it, so their updates are committed even though the TransactionScope is rolled back. This leads to severe data corruption in a line-of-business application of one of our corporate customers.

A stable repro is available in this test project: https://github.com/scale-tone/InconsistencyDemo. More detailed instructions on how to run it are in that repository. To reveal the issue more quickly, start and stop the LoopHammer console app several times.

The LoopHammer console app emulates message processing, which consists of the following steps:

  • Creating a TransactionScope.
  • As part of that TransactionScope, executing a couple of SQL statements against a local DB.
  • As part of that TransactionScope, making a transactional WCF call, which fails with an exception. The call forces the transaction to be promoted to a distributed one. The exception causes the entire TransactionScope to be rolled back (including the above SQL statements).

This algorithm is executed multiple times in parallel (the same way the real message processing service operates). Note that the repro code creates only one TransactionScope per message and doesn’t explicitly mix it with SqlTransactions or any other nested transactions.
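The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the repro project: the connection string, the `dbo.Messages` table, the `ReproSketch` class and the use of `Parallel.For` are assumptions, and the WCF call is passed in as a plain delegate that stands for a transactional proxy call that throws.

```csharp
using System;
using System.Threading.Tasks;
using System.Transactions;
using Microsoft.Data.SqlClient;

public static class ReproSketch
{
    // Hypothetical connection string and table name; the real repro defines its own.
    public const string ConnectionString =
        "Server=.;Database=DemoDb;Integrated Security=true";

    // Step 2: a SQL statement executed on a connection opened inside the scope.
    public static void ExecuteSqlStatement()
    {
        using (var connection = new SqlConnection(ConnectionString))
        {
            connection.Open(); // enlists in Transaction.Current by default
            using (var command = new SqlCommand(
                "INSERT INTO dbo.Messages (Id) VALUES (@id)", connection))
            {
                command.Parameters.AddWithValue("@id", Guid.NewGuid());
                command.ExecuteNonQuery();
            }
        }
    }

    // Processes one message. Returns the exception that caused the rollback,
    // or null if the scope completed.
    public static Exception HandleMessage(Action executeSql, Action transactionalWcfCall)
    {
        try
        {
            using (var scope = new TransactionScope()) // step 1
            {
                executeSql();           // step 2
                transactionalWcfCall(); // step 3: expected to throw
                scope.Complete();       // not reached when step 3 throws
            }
            return null;
        }
        catch (Exception ex)
        {
            return ex; // the scope was disposed without Complete(), i.e. rolled back
        }
    }

    // Runs the same handler for many messages in parallel.
    public static void Run(int messageCount, Action transactionalWcfCall)
    {
        var options = new ParallelOptions { MaxDegreeOfParallelism = 4 };
        Parallel.For(0, messageCount, options, i =>
        {
            HandleMessage(ExecuteSqlStatement, transactionalWcfCall);
        });
    }
}
```

With a correctly working driver, no row inserted by `ExecuteSqlStatement` should survive, because `scope.Complete()` is never called.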

Expected behavior

The distributed transaction is rolled back entirely, nothing is written to DB.

Actual behavior

After seconds to minutes of normal execution, the WCF call fails with the following (unexpected) exception:

System.ServiceModel.ProtocolException: The transaction has aborted. ---> System.Transactions.TransactionAbortedException: The transaction has aborted. ---> System.Transactions.TransactionPromotionException: Failure while attempting to promote transaction. ---> Microsoft.Data.SqlClient.SqlException: The PROMOTE TRANSACTION request failed because there is no local transaction active.
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.TdsParser.Run(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj)
   at Microsoft.Data.SqlClient.TdsParser.TdsExecuteTransactionManagerRequest(Byte[] buffer, TransactionManagerRequestType request, String transactionName, TransactionManagerIsolationLevel isoLevel, Int32 timeout, SqlInternalTransaction transaction, TdsParserStateObject stateObj, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.ExecuteTransactionYukon(TransactionRequest transactionRequest, String transactionName, IsolationLevel iso, SqlInternalTransaction internalTransaction, Boolean isDelegateControlRequest)
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   --- End of inner exception stack trace ---
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Promote()
   at System.Transactions.TransactionStatePSPEOperation.PSPEPromote(InternalTransaction tx)
   at System.Transactions.TransactionStateDelegatedBase.EnterState(InternalTransaction tx)
   --- End of inner exception stack trace ---
   at System.Transactions.TransactionStateAborted.CheckForFinishedTransaction(InternalTransaction tx)
   at System.Transactions.Transaction.Promote()
   at System.Transactions.TransactionInterop.ConvertToOletxTransaction(Transaction transaction)
   at System.Transactions.TransactionInterop.GetTransmitterPropagationToken(Transaction transaction)
   at System.ServiceModel.Transactions.WsatTransactionFormatter.WriteTransaction(Transaction transaction, Message message)
   at System.ServiceModel.Channels.TransactionChannel`1.WriteTransactionToMessage(Message message, TransactionFlowOption txFlowOption)
   --- End of inner exception stack trace ---
Server stack trace:
   at System.ServiceModel.Channels.TransactionChannel`1.WriteTransactionToMessage(Message message, TransactionFlowOption txFlowOption)
   at System.ServiceModel.Channels.TransactionChannel`1.WriteTransactionDataToMessage(Message message, MessageDirection direction)
   at System.ServiceModel.Channels.TransactionRequestChannelGeneric`1.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Dispatcher.RequestChannelBinder.Request(Message message, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannel.Call(String action, Boolean oneway, ProxyOperationRuntime operation, Object[] ins, Object[] outs, TimeSpan timeout)
   at System.ServiceModel.Channels.ServiceChannelProxy.InvokeService(IMethodCallMessage methodCall, ProxyOperationRuntime operation)
   at System.ServiceModel.Channels.ServiceChannelProxy.Invoke(IMessage message)
Exception rethrown at [0]:
   at System.Runtime.Remoting.Proxies.RealProxy.HandleReturnMessage(IMessage reqMsg, IMessage retMsg)
   at System.Runtime.Remoting.Proxies.RealProxy.PrivateInvoke(MessageData& msgData, Int32 type)
   at DemoService.WcfAgents.IService1.DoWork(String correlationId)
   at DemoService.WcfAgents.Service1Client.DoWork(String correlationId)
   at LoopHammer.DostuffHandler.MakeTransactionalWcfCallThatThrows()
   at LoopHammer.DostuffHandler.Handle()

This presumably happens because, at the moment of the promotion attempt, the TransactionScope (or rather some underlying internal transaction object) already appears to be in an aborted state. Because of that state, the above SQL statements do not get attached to it and are committed immediately. So the TransactionPromotionException is just the visible part of the problem, while the actual issue occurs somewhere between the creation of the TransactionScope and the execution of the SqlCommands. We’re able to detect that faulty state of the TransactionScope beforehand by executing “select @@trancount” against the DB: in a failure case it returns 0 (normally it should be 1).
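The @@trancount check described above could look something like this. It is only a sketch: the `TransactionGuard` class and its method names are hypothetical and are not taken from the repro project.

```csharp
using System;
using System.Transactions;
using Microsoft.Data.SqlClient;

public static class TransactionGuard
{
    // Asks the server how many transactions are open on this session.
    public static int GetServerTranCount(SqlConnection connection)
    {
        using (var command = new SqlCommand("SELECT @@TRANCOUNT", connection))
        {
            return Convert.ToInt32(command.ExecuteScalar());
        }
    }

    // The faulty state: an ambient transaction exists on the client,
    // but the server session reports no open transaction.
    public static bool IsDetached(bool hasAmbientTransaction, int serverTranCount)
    {
        return hasAmbientTransaction && serverTranCount == 0;
    }

    // Throws before any data is modified if the connection is not actually
    // attached to the ambient transaction.
    public static void EnsureEnlisted(SqlConnection connection)
    {
        bool hasAmbient = Transaction.Current != null;
        if (IsDetached(hasAmbient, GetServerTranCount(connection)))
        {
            throw new InvalidOperationException(
                "Ambient transaction exists, but @@TRANCOUNT is 0 on this connection.");
        }
    }
}
```

Calling `EnsureEnlisted` right after opening the connection inside the scope turns silent auto-committed writes into an explicit failure. It only detects the faulty state; it does not fix the underlying bug.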

WORKAROUND: if we disable SqlConnection pooling (add “Pooling=false” to the connection string), the problem disappears. This makes us think that the bug is somewhere in the connection sharing mechanism. Probably a connection is sometimes not fully detached from one thread, which occasionally allows that thread to influence a TransactionScope belonging to another thread. If that is the case, it can not only lead to data corruption but also introduce serious security issues.
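As a small illustration of the workaround, pooling can also be switched off in code rather than by editing the connection string by hand. The `PoolingWorkaround` class is a hypothetical helper; `SqlConnectionStringBuilder.Pooling` is the property that corresponds to the “Pooling” keyword.

```csharp
using Microsoft.Data.SqlClient;

public static class PoolingWorkaround
{
    // Returns a copy of the given connection string with pooling disabled.
    public static string DisablePooling(string connectionString)
    {
        var builder = new SqlConnectionStringBuilder(connectionString)
        {
            Pooling = false
        };
        return builder.ConnectionString;
    }
}
```

Disabling pooling means every Open() creates a new physical connection, so this trades throughput for correctness and is a stopgap rather than a fix.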

The problem is reproducible with the latest Microsoft.Data.SqlClient v1.0.19269.1. It was originally detected in System.Data.SqlClient from .NET Framework; see this variant of the same repro project: https://github.com/samegutt/InconsistencyDemo. The .NET Framework version doesn’t matter (we tried all versions from 4.5.1 to 4.8).

The problem is also reproducible when compiled against sources from this repo. If we compile and start LoopHammer.exe in Debug mode, we immediately hit this debug assert. If we disable it, further execution leads either to the same TransactionPromotionException or to other exceptions:

System.ObjectDisposedException: Cannot access a disposed object. Object name: 'SqlDelegatedTransaction'.
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.GetValidConnection()
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Rollback(SinglePhaseEnlistment enlistment)
   at System.Transactions.TransactionStateDelegatedAborting.EnterState(InternalTransaction tx)
   at System.Transactions.Transaction.Rollback()
   at System.Transactions.TransactionScope.InternalDispose()
   at System.Transactions.TransactionScope.Dispose()
   at LoopHammer.DostuffHandler.Handle()

or even to:

Microsoft.Data.SqlClient.SqlException (0x80131904): The Microsoft Distributed Transaction Coordinator (MS DTC) has cancelled the distributed transaction.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlConnection.cs:line 2103
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlInternalConnection.cs:line 814
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\TdsParser.cs:line 1572
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\TdsParser.cs:line 2792
   at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData() in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlDataReader.cs:line 1310
   at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData() in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlDataReader.cs:line 275
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 5550
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 5312
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 4907
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 4762
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior, String method) in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 2401
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader() in C:\projects\CSA\temp\SqlClient\src\Microsoft.Data.SqlClient\netfx\src\Microsoft\Data\SqlClient\SqlCommand.cs:line 2322
   at LoopHammer.DostuffHandler.GetTransactionDetails()
   at LoopHammer.DostuffHandler.Handle()
ClientConnectionId:48c113c4-6fcd-4082-85bf-0f39023375bf
Error Number:1206,State:137,Class:18

A transactional WCF call is an essential part of the scenario. Surprisingly, forcing a transaction promotion by other means (e.g. by accessing another DB) does not lead to the same problem.

For some reason, setting MaxDegreeOfParallelism to 4 maximizes the chances of the issue appearing. Experiments with the connection pool size didn’t show any difference.

As mentioned above, the issue causes serious damage to the customer’s large LoB system. Furthermore, there are concerns that it might indicate a severe security issue (multiple threads being able to influence each other’s transaction flow).

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 7
  • Comments: 44 (19 by maintainers)

Most upvoted comments

Two months have passed. Is there any progress?

@ramonsmits, I don’t see any data losses happening anymore, yes. But instead I’m occasionally observing this beautiful NRE:

673705f1-2f3b-44af-a9bd-798d70aaed9d System.NullReferenceException: Object reference not set to an instance of an object.
   at Microsoft.Data.SqlClient.SqlDelegatedTransaction.Rollback(SinglePhaseEnlistment enlistment)
   at System.Transactions.TransactionStateDelegatedAborting.EnterState(InternalTransaction tx)
   at System.Transactions.Transaction.Rollback()
   at System.Transactions.TransactionScope.InternalDispose()
   at System.Transactions.TransactionScope.Dispose()
   at LoopHammer.DostuffHandler.Handle()

@cheenamalhotra, as usual, you can easily see this yourself if you compile the repro against SqlClient 2.0.0-preview3 and run/stop/run it several times.

So the bug still isn’t fully fixed.

Hi @scale-tone @ramonsmits

I have been debugging the issue for a while, and I think I’ve found the error.

It’s just a one-character typo on this line: https://github.com/dotnet/SqlClient/blob/d0672d2c4dfd0f09ba49f09d605b827366c61b1b/src/Microsoft.Data.SqlClient/netfx/src/Microsoft/Data/SqlClient/SqlDelegatedTransaction.cs#L106 The condition should be if (!connection.IsEnlistedInTransaction).

After this change, I no longer get Debug errors in TdsParser.cs, nor can I reproduce the issue. But because this issue is reproducible only on busy machines, I’d ask you to try it out locally as well, in a debug session, and let me know if you can still reproduce the problem.

I’ll be amazed if this single character change fixes the problem 🙈 Otherwise I’ll continue investigating further!

@ramonsmits The repro uses package reference 1.0.19269.1, which reproduces the issue quickly. Please test with “1.1.0-preview1.19275.1”.