efcore: Deadlock with SQL Server when inserting a large number of related rows

Description of problem

I’ve created a server-side system that is updated very heavily and runs on SQL Server on Azure. I am seeing deadlocks all over the place when inserting completely unrelated data even in the most simple application. Even when READ_COMMITTED_SNAPSHOT is enabled.

If this cannot be fixed we will have to switch to another persistence framework, which will cost us a lot of time and money and possibly cause our project to overrun (it is imperative this does not happen as there is a hard deadline to enter the market).

I am currently seeing this on locally run Azure functions updating a local SQL Server 2019 developer edition database.

Steps to reproduce

Create the following .NET Core Console app and run it

//Program.cs
	class Program
	{
		const int Tasks = 5;
		static async Task Main(string[] args)
		{
			var trigger = new ManualResetEvent(false);
			var readySignals = new List<ManualResetEvent>();
			var processingTasks = new List<Task>();
			foreach(int index in Enumerable.Range(1, Tasks))
			{
				var readySignal = new ManualResetEvent(false);
				readySignals.Add(readySignal);
				var task = CreateDataAsync(trigger, readySignal);
				processingTasks.Add(task);
				
			}
			WaitHandle.WaitAll(readySignals.ToArray());
			trigger.Set();
			await Task.WhenAll(processingTasks.ToArray());
			Console.WriteLine("Finished");
		}

		private static async Task CreateDataAsync(ManualResetEvent trigger, ManualResetEvent signalReady)
		{
			await Task.Yield();
			using (var context = new AppDbContext())
			{
				var incomingFile = new IncomingFile();
				for(int i = 1; i <= 1000; i++)
				{
					new IncomingFileEvent(incomingFile);
				}
				context.IncomingFile.Add(incomingFile);
				signalReady.Set();
				trigger.WaitOne();
				await context.SaveChangesAsync().ConfigureAwait(false);
			}
		}
	}
	public class AppDbContext : DbContext
	{
		public DbSet<IncomingFile> IncomingFile { get; set; }

		private static readonly DbContextOptions<AppDbContext> Options = CreateOptions();

		public AppDbContext() : base(Options)
		{
		}

		public static DbContextOptions<AppDbContext> CreateOptions()
		{
			var builder = new DbContextOptionsBuilder<AppDbContext>();
			builder.UseSqlServer("Server=DESKTOP-G05BF1U;Database=EFCoreConcurrencyTest;Trusted_Connection=True;");
			return builder.Options;
		}

	}
	public abstract class EntityBase
	{
		[Key]
		[DatabaseGenerated(DatabaseGeneratedOption.None)]
		public Guid Id { get; set; } = Guid.NewGuid();
	}
	public class IncomingFile : EntityBase
	{
		public virtual ICollection<IncomingFileEvent> Events { get; private set; } = new List<IncomingFileEvent>();
	}
	public class IncomingFileEvent : EntityBase
	{
		public Guid IncomingFileId { get; private set; }
		public virtual IncomingFile IncomingFile { get; private set; }

		[Obsolete("Serialization only")]
		public IncomingFileEvent() { }

		public IncomingFileEvent(IncomingFile incomingFile)
		{
			if (incomingFile is null)
				throw new ArgumentNullException(nameof(incomingFile));

			IncomingFile = incomingFile;
			IncomingFileId = incomingFile.Id;
			IncomingFile.Events.Add(this);
		}
	}
CREATE TABLE [dbo].[IncomingFile]
(
	[Id] UNIQUEIDENTIFIER NOT NULL, 
	[ConcurrencyVersion] RowVersion NOT NULL,
	CONSTRAINT [PK_IncomingFile] PRIMARY KEY CLUSTERED([Id])
)
GO

CREATE TABLE [dbo].[IncomingFileEvent]
(
	[Id] UNIQUEIDENTIFIER NOT NULL, 
	[ConcurrencyVersion] RowVersion NOT NULL,
	[IncomingFileId] UNIQUEIDENTIFIER NOT NULL,
	CONSTRAINT [PK_IncomingFileEvent] PRIMARY KEY CLUSTERED([Id]),
	CONSTRAINT [FK_IncomingFileEvent_IncomingFileId]
		FOREIGN KEY ([IncomingFileId])
		REFERENCES [dbo].[IncomingFile] ([Id])
)
GO

Having turned on READ_COMMITTED_SNAPSHOT I also tried every IsolationType available (except Chaos) and none solved the problem.

		public override async Task<int> SaveChangesAsync(bool acceptAllChangesOnSuccess, CancellationToken cancellationToken = default)
		{
			var trans = await Database.BeginTransactionAsync(System.Data.IsolationLevel.Snapshot).ConfigureAwait(false);
			int result = await base.SaveChangesAsync(acceptAllChangesOnSuccess, cancellationToken).ConfigureAwait(false);
			await trans.CommitAsync().ConfigureAwait(false);
			return result;
		}

Exception

System.InvalidOperationException HResult=0x80131509 Message=An exception has been raised that is likely due to a transient failure. Consider enabling transient error resiliency by adding ‘EnableRetryOnFailure()’ to the ‘UseSqlServer’ call. Source=Microsoft.EntityFrameworkCore.SqlServer StackTrace: at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.<ExecuteAsync>d__72.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter1.GetResult() at Microsoft.EntityFrameworkCore.DbContext.<SaveChangesAsync>d__54.MoveNext() at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.ConfiguredTaskAwaitable`1.ConfiguredTaskAwaiter.GetResult() at EFCoreConcurrencyTest.Program.<CreateDataAsync>d__2.MoveNext() in C:\Users\x\source\repos\EFCoreConcurrencyTest\EFCoreConcurrencyTest\Program.cs:line 45 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.GetResult() at EFCoreConcurrencyTest.Program.<Main>d__1.MoveNext() in C:\Users\x\source\repos\EFCoreConcurrencyTest\EFCoreConcurrencyTest\Program.cs:line 28 at System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() at System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(Task task) at System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(Task task) at System.Runtime.CompilerServices.TaskAwaiter.GetResult() at EFCoreConcurrencyTest.Program.<Main>(String[] args)

This exception was originally thrown at this call stack: Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReaderAsync.AnonymousMethod__164_0(System.Threading.Tasks.Task<Microsoft.Data.SqlClient.SqlDataReader>) System.Threading.Tasks.ContinuationResultTaskFromResultTask<TAntecedentResult, TResult>.InnerInvoke() System.Threading.Tasks.Task…cctor.AnonymousMethod__274_0(object) System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, object) System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, object) System.Threading.Tasks.Task.ExecuteWithThreadLocal(ref System.Threading.Tasks.Task, System.Threading.Thread) System.Runtime.ExceptionServices.ExceptionDispatchInfo.Throw() System.Runtime.CompilerServices.TaskAwaiter.ThrowForNonSuccess(System.Threading.Tasks.Task) System.Runtime.CompilerServices.TaskAwaiter.HandleNonSuccessAndDebuggerNotification(System.Threading.Tasks.Task) … [Call Stack Truncated]

Inner Exception 1: DbUpdateException: An error occurred while updating the entries. See the inner exception for details.

Inner Exception 2: SqlException: Transaction (Process ID 52) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.

Further technical details

EF Core version: 3.1.6 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET Core 3.1 Operating system: Windows 10 IDE: Visual Studio 2019 16.6.5

DB settings

USE [master]
GO

/****** Object:  Database [EFCoreConcurrencyTest]    Script Date: 02/08/2020 15:44:34 ******/
CREATE DATABASE [EFCoreConcurrencyTest]
 CONTAINMENT = NONE
 ON  PRIMARY 
( NAME = N'EFCoreConcurrencyTest', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\EFCoreConcurrencyTest.mdf' , SIZE = 73728KB , MAXSIZE = UNLIMITED, FILEGROWTH = 65536KB )
 LOG ON 
( NAME = N'EFCoreConcurrencyTest_log', FILENAME = N'C:\Program Files\Microsoft SQL Server\MSSQL15.MSSQLSERVER\MSSQL\DATA\EFCoreConcurrencyTest_log.ldf' , SIZE = 139264KB , MAXSIZE = 2048GB , FILEGROWTH = 65536KB )
 WITH CATALOG_COLLATION = DATABASE_DEFAULT
GO

IF (1 = FULLTEXTSERVICEPROPERTY('IsFullTextInstalled'))
begin
EXEC [EFCoreConcurrencyTest].[dbo].[sp_fulltext_database] @action = 'enable'
end
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_NULL_DEFAULT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_NULLS OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_PADDING OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ANSI_WARNINGS OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ARITHABORT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_CLOSE OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_SHRINK OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_UPDATE_STATISTICS ON 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET CURSOR_CLOSE_ON_COMMIT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET CURSOR_DEFAULT  GLOBAL 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET CONCAT_NULL_YIELDS_NULL OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET NUMERIC_ROUNDABORT OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET QUOTED_IDENTIFIER OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET RECURSIVE_TRIGGERS OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET  DISABLE_BROKER 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET AUTO_UPDATE_STATISTICS_ASYNC OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET DATE_CORRELATION_OPTIMIZATION OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET TRUSTWORTHY OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET ALLOW_SNAPSHOT_ISOLATION ON 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET PARAMETERIZATION SIMPLE 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET READ_COMMITTED_SNAPSHOT ON 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET HONOR_BROKER_PRIORITY OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET RECOVERY FULL 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET  MULTI_USER 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET PAGE_VERIFY CHECKSUM  
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET DB_CHAINING OFF 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET FILESTREAM( NON_TRANSACTED_ACCESS = OFF ) 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET TARGET_RECOVERY_TIME = 60 SECONDS 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET DELAYED_DURABILITY = DISABLED 
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET QUERY_STORE = OFF
GO

ALTER DATABASE [EFCoreConcurrencyTest] SET  READ_WRITE 
GO

About this issue

  • Original URL
  • State: open
  • Created 4 years ago
  • Reactions: 2
  • Comments: 63 (39 by maintainers)

Most upvoted comments

Reply from product group was the following

The deadlocks thrown by the code are due to locks being acquired during PK lookup operation to ensure referential integrity. As the code inserts thousands of records in a single batch instead of using Nested Loop join + Seek operator the optimizer decides to use Merge Join and Index Scan. The merge join plan is much cheaper from the cost perspective.

The solutions the customer may consider are:

  1. Eliminating the PK-FK relationship and ensure the integrity is maintained at the application level. Entire call is wrapped into a transaction and the GUIDs (PK for “IncomingFile” row as well as for “IncomingFileEvent”) are generated at the app level so it should do the trick.
  2. Keep the transaction as short as possible. It means they can consider splitting current transaction into 2-transactions (i.e., one inserting the data into “IncomingFile” and the next one inserting into “IncomingFileEvent”). This will ensure the locks on the “IncomingFile” are quickly released
  3. As mentioned in the thread the query hint OPTION (LOOP JOIN) can be used alternatively to enforce singleton lookups.
  4. In-Memory tables (SchemaAndData) as lock and latch free structures are excellent candidates for high transactional processing. With In-Memory tables they can continue using PK-FK relationship, no locks will be acquired but the table schema must be changed to eliminate RowVersion column.

My thoughts

  1. If we remove the R from RDBMS then we may as well go with MongoDB which is proving to be at least 4 times faster than SQL Server at inserting data directly (although no EFCore support yet).
  2. We cannot remove the A from ACID.
  3. This is what we are currently doing. Despite seeing warnings that it is slower I cannot actually see any degradation in performance. Maybe it is only slower when there is lots of data? I don’t know.
  4. We cannot abandon the RowVersion column. It’s an important part of our solution to ensure users do not overwrite each other’s changes.

Thank you for being so involved in this, it’s rare (and very valuable) to get this kind of very in-depth analysis!

@ErikEJ I added an index to IncomingFileEvent.IncomingFileId and it still happened.

I also made the PKs non clustered, no change.

I also have the child table a composite PK and no change.

I’m in touch with MS support at the moment. I’ll provide an update once we know what is happening.

@mrpmorris you can inherit from DbCommandInterceptor instead of implementing IDbCommandInterceptor to avoid needing to provide all the empty method implementations.

Reopening to consider this for what EF Core does when updating (in the backlog).

I definitely think there’s a point here to be explored/understood, but we can also postpone the investigation and come back to it once the new batching API is actually implemented… Of course, if you have the spare time and are interested 😃

I am interested, and will make time 😃

PS: You have all been extremely helpful. Thank you very much!

@roji You are correct, but I can’t help but be frightened 😃

I really hope it works, good luck!

If I am reading this correctly, it should be in the November release. Is that correct?

Thanks for posting these details… We’ll take another look at this, especially once the better batching API is implemented in SqlClient.

@mrpmorris thanks for this additional info, it’s definitely interesting!

Yes, EF Core does allow you to set up interceptors, through which you can do arbitrary manipulations of SQL. Unfortunately this feature isn’t documented yet, but there are various resources around the Internet: see this one which should probably help. The main tricky part here is to identify which command should have its text altered - for queries you can add tags, but for updates there’s no similar thing.

A more clean/complete solution would be for you to replace the EF Core service responsible for doing updates in SQL Server. This is certainly not as simple as a simple interceptor, but may not be as complex as it seems. A good starting point should be SqlServerUpdateSqlGenerator - if you need guidance on how to replace this, I can help.

(but if you’re looking for a quick/easy way to tack on the LOOP JOIN option, interception is probably the way to go).

If you add the following then there is no conflict, and it inserts 64 tasks * 1000 records each in a total of 4.8 seconds.

I don’t know what the implications of adding this are. Is there a way for me to intercept the EFCore insert statements and stick this at the end of each one?

commandTextBulder.AppendLine("OPTION (LOOP JOIN)");
fileEventCommand.CommandText = commandTextBulder.ToString();

@mrpmorris I’ll take a look in the next few days.

@AndriySvyryd How is that tuning done? Simply changing it and running lots of scenarios?