efcore: System.InvalidOperationException: Unable to translate a collection subquery in a projection... using Union or Concat

Hello team

I have this piece of code that is working fine on EF Core 5, and previously on v3:

This is a property defined on the DbContext:

public IQueryable<Candidate> MyCandidates
{
    get
    {
        if (this.usersService.UserHasPermission("all:candidates"))
        {
            return this.Candidates;
        }

        var userId = this.usersService.GetUserId();

        var candidates = this.Candidates
            .Where(x =>
                x.CreatedByUserId == userId || x.OwnedByUserId == userId);

        var asignedCandidates =
            from cs in this.Candidates
            from an in cs.Assignations
            where an.AssignedUserId == userId &&
            (an.Status == AssignationStatus.Approved ||
            an.Status == AssignationStatus.Direct)
            select cs;

        return asignedCandidates
            .Union(candidates);
    }
}

Basically, if the current user has the “all:candidates” permission, he can query using the Candidates DbSet, if not, he can query the union of candidates created by him, plus the candidates assigned to him.

Then when I try to run the following query, if the user doesn’t have the “all:candidates” permission, and uses the code with the Union() I have the following exception:

var userId = this.usersService.GetUserId();

var query = dbContext.MyCandidates
    .AsSplitQuery()
    .Include(x => x.Pins
        .Where(x => x.UserId == userId))
    .Include(x => x.MainRole)
    .Include(x => x.Flags)
        .ThenInclude(x => x.Flag)
    .Include(x => x.ManagerFlags)
        .ThenInclude(x => x.Flag)
    .Include(x => x.Technologies
        .Where(x => !x.NotInterested))
        .ThenInclude(x => x.Technology)
    .Include(x => x.Activities
        .Where(x => x.Visibility == ActivityVisibility.Public))
    .Include(x => x.Age)
    .AsNoTracking()
    .ToList();

The same exception occurs with Concat() instead of union, and is not happening when I remove the Union and return one query or another

2021-11-15 22:19:32.1843|ERROR|ExceptionHandlerMiddleware|Exception: System.InvalidOperationException: Unable to translate a collection subquery in a projection since either parent or the subquery doesn't project necessary information required to uniquely identify it and correctly generate results on the client side. This can happen when trying to correlate on keyless entity type. This can also happen for some cases of projection before 'Distinct' or some shapes of grouping key in case of 'GroupBy'. These should either contain all key properties of the entity that the operation is applied on, or only contain simple property access expressions.
   at Microsoft.EntityFrameworkCore.Query.SqlExpressions.SelectExpression.ApplyProjection(Expression shaperExpression, ResultCardinality resultCardinality, QuerySplittingBehavior querySplittingBehavior)
   at Microsoft.EntityFrameworkCore.Query.Internal.SelectExpressionProjectionApplyingExpressionVisitor.VisitExtension(Expression extensionExpression)
   at System.Linq.Expressions.Expression.Accept(ExpressionVisitor visitor)
   at System.Linq.Expressions.ExpressionVisitor.Visit(Expression node)
   at Microsoft.EntityFrameworkCore.Query.RelationalQueryTranslationPostprocessor.Process(Expression query)
   at Microsoft.EntityFrameworkCore.Query.QueryCompilationContext.CreateQueryExecutor[TResult](Expression query)
   at Microsoft.EntityFrameworkCore.Storage.Database.CompileQuery[TResult](Expression query, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.CompileQueryCore[TResult](IDatabase database, Expression query, IModel model, Boolean async)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.<>c__DisplayClass12_0`1.<ExecuteAsync>b__0()
   at Microsoft.EntityFrameworkCore.Query.Internal.CompiledQueryCache.GetOrAddQuery[TResult](Object cacheKey, Func`1 compiler)
   at Microsoft.EntityFrameworkCore.Query.Internal.QueryCompiler.ExecuteAsync[TResult](Expression query, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryProvider.ExecuteAsync[TResult](Expression expression, CancellationToken cancellationToken)
   at Microsoft.EntityFrameworkCore.Query.Internal.EntityQueryable`1.GetAsyncEnumerator(CancellationToken cancellationToken)
   at System.Runtime.CompilerServices.ConfiguredCancelableAsyncEnumerable`1.GetAsyncEnumerator()
   at Microsoft.EntityFrameworkCore.EntityFrameworkQueryableExtensions.ToListAsync[TSource](IQueryable`1 source, CancellationToken cancellationToken)
   at x.HR.API.Controllers.CandidatesController.List(String filter, String sortBy, String sortDirection, Int32 page, Int32 pageSize) in /Users/x/src/x-hr-backend/x.HR.API/Controllers/CandidatesController.cs:line 102
   at lambda_method94(Closure , Object )
   at Microsoft.AspNetCore.Mvc.Infrastructure.ActionMethodExecutor.AwaitableObjectResultExecutor.Execute(IActionResultTypeMapper mapper, ObjectMethodExecutor executor, Object controller, Object[] arguments)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeActionMethodAsync>g__Awaited|12_0(ControllerActionInvoker invoker, ValueTask`1 actionResultValueTask)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.<InvokeNextActionFilterAsync>g__Awaited|10_0(ControllerActionInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Rethrow(ActionExecutedContextSealed context)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.Next(State& next, Scope& scope, Object& state, Boolean& isCompleted)
   at Microsoft.AspNetCore.Mvc.Infrastructure.ControllerActionInvoker.InvokeInnerFilterAsync()
--- End of stack trace from previous location ---
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeFilterPipelineAsync>g__Awaited|20_0(ResourceInvoker invoker, Task lastTask, State next, Scope scope, Object state, Boolean isCompleted)
2
   at Microsoft.AspNetCore.Mvc.Infrastructure.ResourceInvoker.<InvokeAsync>g__Awaited|17_0(ResourceInvoker invoker, Task task, IDisposable scope)
   at Microsoft.AspNetCore.Routing.EndpointMiddleware.<Invoke>g__AwaitRequestTask|6_0(Endpoint endpoint, Task requestTask, ILogger logger)
   at Microsoft.AspNetCore.Localization.RequestLocalizationMiddleware.Invoke(HttpContext context)
   at x.HR.API.Infrastructure.Middlewares.Cache.NoCacheMiddleware.Invoke(HttpContext httpContext) in /Users/x/src/x-hr-backend/x.HR.API/Infrastructure/Middlewares/Cache/NoCacheMiddleware.cs:line 25
   at Microsoft.AspNetCore.Authorization.Policy.AuthorizationMiddlewareResultHandler.HandleAsync(RequestDelegate next, HttpContext context, AuthorizationPolicy policy, PolicyAuthorizationResult authorizeResult)
   at Microsoft.AspNetCore.Authorization.AuthorizationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Authentication.AuthenticationMiddleware.Invoke(HttpContext context)
   at Microsoft.AspNetCore.Diagnostics.ExceptionHandlerMiddleware.<Invoke>g__Awaited|6_0(ExceptionHandlerMiddleware middleware, HttpContext context, Task task)
Url: /api/candidates
Query String: ?page=0&pageSize=30
Method: GET

EF Core version: 6.0.0 Database provider: Microsoft.EntityFrameworkCore.SqlServer Target framework: .NET 6.0 Operating system: MacOs IDE: VS Code

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 21 (7 by maintainers)

Most upvoted comments

@StevenRasmussen It depends how the queries are created. For example, this is fine:

var filteredBlogs = context.Blogs.Where(e => e.Id == 1);
var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

But this is not:

var filteredBlogs = context.Blogs.Where(e => e.Id == 1).Select(e => new Blog { Id = e.Id });
var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

Full repro code:

public class Blog
{
    public int Id { get; set; }
    public int SomeOtherId { get; set; }
    public string Name { get; set; }

    public List<Post> Posts { get; } = new();
}

public class Post
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Content { get; set; }

    public Blog Blog { get; set; }
}

public class SomeDbContext : DbContext
{
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        => optionsBuilder
            .UseSqlServer(Your.ConnectionString)
            .LogTo(Console.WriteLine, LogLevel.Information)
            .EnableSensitiveDataLogging();

    public DbSet<Blog> Blogs { get; set; }
    public DbSet<Post> Posts { get; set; }
}

public class Program
{
    public static void Main()
    {
        using (var context = new SomeDbContext())
        {
            context.Database.EnsureDeleted();
            context.Database.EnsureCreated();

            context.Add(new Blog {Posts = {new Post(), new Post()}});
            context.Add(new Blog {Posts = {new Post(), new Post(), new Post()}});
            context.SaveChanges();
        }

        using(var context = new SomeDbContext())
        {
            // var filteredBlogs = context.Blogs.Where(e => e.Id == 1).Select(e => new Blog { Id = e.Id });
            // var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

            var filteredBlogs = context.Blogs.Where(e => e.Id == 1);
            var blogs = filteredBlogs.Union(context.Blogs).Include(e => e.Posts).ToList();

            foreach (var blog in context.Blogs)
            {
                Console.WriteLine($"Blog {blog.Id} with {blog.Posts.Count} posts.");
            }
        }
    }
}

In EF Core, in order to do collection include, it requires each record of the parent (on which the collection will be populated) uniquely identifiable. Since then we can fetch related records and fix up the navigations properly. This is by design and cannot be changed else it can cause duplicated records in the collection/improperly filled collection/collection not marked as loaded. The concept of uniquely identifiable is at row level which means we do track origin, and even if the parent entity is duplicated, if the row have unique identifier we still work with above principal. An example would be Orders.Select(o => o.Customer), since each customer can multiple orders, the result set will have duplicated customer entities but together with PK of Orders, each row in the result is still unique (OrderId, CustomerId together creates a composite unique key).

With above things in mind

  • Concat will never work since it doesn’t remove duplicate records. Union works since it removes duplicates between both result sets.
  • When you do join it can give rise to duplicate results when the cardinality is more than one.
from c in Customers
join o in Orders on c.CustomerId equals o.CustomerId
select c

Above query will have duplicated customers in the result. By tracking origin, EF Core will also include OrderId in unique identifier.

But once you apply set operation with above query to something else which has different origin, there is no way to uniquely identify rows anymore. So we cannot do collection include. This is something which cannot be changed unless we come up with altogether different idea in how to do collection include.

  • If the set operation is not doing any paging afterwards then the only difference to server side is one round trip vs two. Split the query to do set operation on client side.
  • If set operation is required to be performed on server then don’t apply collection include to them. Rather, load them explicitly after fetching the results of set operation from server.
  • Use FromSql* methods to construct the result of set operation (this put set operation on server side) and then do collection include. We assume that results coming out of FromSql* is going to be unique on the entity type it returns. You can do this only if the set operation you want to perform is not giving rise to duplicate entity results.

@smitpatel Simplified query. Does not fail if Include is removed:

var asignedCandidates =
    from cs in context.Candidates
    join an in context.CandidateAssignations on cs.Id equals an.AssignedId
    select cs;

var result = asignedCandidates
    .Union(context.Candidates)
    .Include(x => x.Technologies)
    .ToList();

I have a similar but more complex query and it works only if “include” are just before executing (ToList()).

@smitpatel Simplified query. Does not fail if Include is removed:

var asignedCandidates =
    from cs in context.Candidates
    join an in context.CandidateAssignations on cs.Id equals an.AssignedId
    select cs;

var result = asignedCandidates
    .Union(context.Candidates)
    .Include(x => x.Technologies)
    .ToList();