efcore: EF8 - Regression on Contains query when column collation is not default DB collation

There seems to be a regression bug related to how EF8 now translates Contains with OPENJSON instead of using IN in previous versions. This applies to SQL server and SQL database and it happens when the string column uses a collation that is not the default DB collation.

I could reproduce this issue on Azure SQL Database.

I could also reproduce on a local SQL server instance, but only if the DB containment type is set to Partial.

image

If I set the containment type to ‘None’, then the query runs fine.

using EFRepro;
using Microsoft.EntityFrameworkCore;

var ids = new[] { "a", "b", "c" };

try
{
    using (var db = new BloggingContext())
    {
        db.Blogs.Add(new Blog { Id = Guid.NewGuid().ToString() });
        db.SaveChanges();
        var query = db.Blogs.Where(b => ids.Contains(b.Id));
        string sql = query.ToQueryString();
        var blogs = query.ToList();
    }
}
catch (Exception ex)
{
    Console.WriteLine(ex.ToString());
}


namespace EFRepro
{
    public class BloggingContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            optionsBuilder.UseSqlServer(@"Server=(local)\sql2022;Database=Blogs;Trusted_Connection=True;TrustServerCertificate=true");
        }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            modelBuilder.Entity<Blog>()
                .Property(i => i.Id)
                .UseCollation("Latin1_General_100_BIN2_UTF8");
        }
    }

    public class Blog
    {
        public string Id { get; set; }
    }
}

Microsoft.Data.SqlClient.SqlException (0x80131904): Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_BIN2_UTF8" in the equal to operation.
   at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at Microsoft.Data.SqlClient.SqlDataReader.TryConsumeMetaData()
   at Microsoft.Data.SqlClient.SqlDataReader.get_MetaData()
   at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean isAsync, Int32 timeout, Task& task, Boolean asyncWrite, Boolean inRetry, SqlDataReader ds, Boolean describeParameterEncryptionRequest)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String method)
   at Microsoft.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.Data.SqlClient.SqlCommand.ExecuteDbDataReader(CommandBehavior behavior)
   at Microsoft.EntityFrameworkCore.Storage.RelationalCommand.ExecuteReader(RelationalCommandParameterObject parameterObject)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.InitializeReader(Enumerator enumerator)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.<>c.<MoveNext>b__21_0(DbContext _, Enumerator enumerator)
   at Microsoft.EntityFrameworkCore.SqlServer.Storage.Internal.SqlServerExecutionStrategy.Execute[TState,TResult](TState state, Func`3 operation, Func`3 verifySucceeded)
   at Microsoft.EntityFrameworkCore.Query.Internal.SingleQueryingEnumerable`1.Enumerator.MoveNext()
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at System.Linq.Enumerable.ToList[TSource](IEnumerable`1 source)
   at Program.<Main>$(String[] args) in C:\Dev\EFRepro\EFRepro\Program.cs:line 14
ClientConnectionId:5b0ee62c-b3a5-4aa9-b1fd-3f3a904e5689
Error Number:468,State:9,Class:16

Generated sql

DECLARE @__ids_0 nvarchar(4000) = N'["a","b","c"]';

SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)
Msg 468, Level 16, State 9, Line 3
Cannot resolve the collation conflict between "SQL_Latin1_General_CP1_CI_AS" and "Latin1_General_100_BIN2_UTF8" in the equal to operation.

In order to fix it, EF would have add a COLLATE clause with the name of the collation that is configured in the model. Either on the Id or Value column.

SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] COLLATE Latin1_General_100_BIN2_UTF8 IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)


SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] IN (
    SELECT [i].[value] COLLATE Latin1_General_100_BIN2_UTF8
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)

EF Core version: 8.0.0-rc.2.23480.1 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 8.0 Operating system: Windows 11 IDE: VS 2022 17.8 Preview 4

About this issue

  • Original URL
  • State: open
  • Created 8 months ago
  • Comments: 32 (19 by maintainers)

Most upvoted comments

Here’s a summary of the situation here:

As a workaround, #32434 has been approved for 8.0.2; this means you’ll be able to write the following:

 var query = db.Blogs
    .Where(b => EF.Constant(ids).Contains(b.Id))
    .ToList();

This will make EF produce the old SQL, where the elements of ids are included as constants in the SQL, and the collation problem does not occur. This also does not have the index performance problems that the current workaround has (with EF.Functions.Collate, see above).

Going forward, we’re going to work on the following to fix this properly:

  1. Make collation part of the type mapping (#29620).
  2. Improve our type mapping inference to properly bubble up collations, along with other type mapping facets (#32333). This would allow the collation to bubble up the expression tree, like type mappings already do. For example, for b.SomeColumnWithACollation + 'Some Constant', the type mapping of the resulting concatenation node would be the column’s collation (which bubbled up).
  3. When applying the inferred type mapping on the OPENJSON expression node (code), if the inferred type mapping has a non-null collation, and the JSON argument to OPENJSON does not (e.g. it’s a parameter), then wrap the parameter in a COLLATE node to apply the inferred collation.

A targeted workaround here would be to tell EF to add a COLLATE clause to the column passed to IN, using EF.Functions.Collate():

 var query = db.Blogs
    .Where(b => ids.Contains(EF.Functions.Collate(b.Id, "Latin1_General_100_BIN2_UTF8")))
    .ToList();

This results in the following SQL, which disambiguates the collation:

SELECT [b].[Id]
FROM [Blogs] AS [b]
WHERE [b].[Id] COLLATE Latin1_General_100_BIN2_UTF8 IN (
    SELECT [i].[value]
    FROM OPENJSON(@__ids_0) WITH ([value] nvarchar(450) '$') AS [i]
)

Applying the EF.Functions.Collate function will result in the query not being able to leverage any index on the column.

It definitely makes sense for this to happen. If this seriously affects query performance, it can indeed make sense to reduce the SQL Server compatibility level to opt out of the OPENJSON translation entirely, and makes EF revert to the previous, pre-8.0 behavior.

Another possible per-query workaround here is to force the array to be treated as a constant rather than a parameter, therefore causing its elements to appear as constants in the SQL (exactly as before). This currently involves using the Expression builder API; it’s certainly not pretty, but it’s a possible workaround in case you need it; see https://github.com/dotnet/efcore/issues/32394#issuecomment-1825513906 for an example.

Regardless, this is something I plan to investigate ASAP (next week), and we’ll indeed update the release notes as necessary.

Great, that’s good to hear. I’ll work on properly fixing this for 9.0, and if we see lots of users run into it and the fix is low-risk, we can consider patching 8.0 at that point.

Thanks for the workaround with EF.Functions.Collate @roji This definitely makes this bug less severe for us.

I’m not sure why it happens only on contained databases. My hunch is that it has something to do with how the collation is inherited from the server level collation, but I’m not really clear on that.

I found that there is a big problem with the work around suggested by @roji . Applying the EF.Functions.Collate function will result in the query not being able to leverage any index on the column. That is, it will prevent SARGability and result in an index scan instead of an index seek. So there is a massive performance downside.

For that reason, maybe using UseCompatibilityLevel is more appropriate. However, is there any documentation explaining exactly what the impact of using it is?

@masterworgen Then you must use:

options.UseSqlServer(configuration.GetConnectionString("MyDBConnect"),
                sqlServerOptionsBuilder => sqlServerOptionsBuilder.UseCompatibilityLevel(120)));

Removing servicing-consider as we won’t be patching a fix for now - a fix is likely to be large/risky, and there’s a pretty good workaround with EF.Functions.Collate. We can revisit this if more user feedback is provided that this is a problem.

Would a workaround here be to set the sql compatibility level lower so that it reverts to the old Contains translation? https://learn.microsoft.com/en-us/ef/core/what-is-new/ef-core-8.0/whatsnew#queries-with-primitive-collections

@clement911 Thanks for the tip. We are always adding to our tests, but the scope is large. We do run over 120,000 tests, the majority of which are functional tests that use the database, on every build.

@clement911 The team has no control over this. We can only release when management says we can.

@ErikEJ I’m not sure whether the 8.0.1 will come in Feb or later at this point, there may be an earlier release.