efcore: Includes are unpredictably ignored

@halosghost and I have been working on a project together and have found it increasingly difficult to guarantee some of the navigational properties in our queries. When I first read the documentation on ignoring includes (https://docs.microsoft.com/en-us/ef/core/querying/related-data#ignored-includes), I thought that only it was applicable when lazy loading was being used. I was quite surprised to learn that it wasn’t.

It’s unreasonable to say that .include() can be ignored just because it isn’t referenced before it is ToListed.

I searched past issues but couldn’t find the reasoning behind why .include()s can be ignored. Can you either shed some light on this or make a way to explicitly force the includes to be used? Without being sure that a navigational property is eagerly loaded, it is very hard to keep our project running correctly.

Steps to reproduce

Exception / Warning message:
> The Include operation for navigation '[x].Distributor.LevelTypeLookup' is unnecessary and was ignored because the navigation is not reachable in the final query results. See https://go.microsoft.com/fwlink/?linkid=850303 for more information.

By running the code below, the exception/warning above is logged during runtime, and the LevelTypeLookup navigational property on the Distributor is null.

var tree = _services.GetService<AdHocService>().GetTreeNodes("123456")
    .Select(x => x.Distributor)
    .ToList();

var level = tree.FirstOrDefault()?.LevelTypeLookup?.Name;

Here’s the resulting query:

SELECT [x.Distributor].[Id], [x.Distributor].[CompanyName], [x.Distributor].[ContractTypeLookupId], [x.Distributor].[CreatedBy], [x.Distributor].[CreatedOn], [x.Distributor].[Cycle], [x.Distributor].[EnrollmentTypeLookupId], [x.Distributor].[FamilyTypeLookupId], [x.Distributor].[IsBusiness], [x.Distributor].[LevelTypeLookupId], [x.Distributor].[ModifiedBy], [x.Distributor].[ModifiedOn], [x.Distributor].[SponsorId], [x.Distributor].[SpousalAgreement], [x.Distributor].[StartDate], [x.Distributor].[StatusTypeLookupId]
FROM (
    select * from dbo.GetTreeNodes(@p0, @p1)
) AS [x]
LEFT JOIN [Distributor] AS [x.Distributor] ON [x].[DistributorId] = [x.Distributor].[Id]

I was surprised to find that when the .ToList() is moved before the .Select(), then the LevelTypeLookup property is retrieved and accessible.

 var tree = _services.GetService<AdHocService>().GetTreeNodes("123456")
+     .ToList()
+     .Select(x => x.Distributor);
-     .Select(x => x.Distributor)
-     .ToList();

Method in my repository:

public IQueryable<TreeNode> GetTreeNodes(string distributorId, DateTime? asOf = null) =>
    _repo.TreeNodes
        .Include(x => x.Distributor)
        .ThenInclude(x => x.LevelTypeLookup)
        .FromSql("select * from dbo.GetTreeNodes({0}, {1})", distributorId, asOf ?? DateTime.Now);

The models:

    public class TreeNode
    {
        public string DistributorId { get; set; }
        public string ParentId { get; set; }
        public int NumberOfChildren { get; set; }

        public virtual Distributor Distributor { get; set; }
    }

    public class Distributor
    {
        public string Id { get; set; }
        public int LevelTypeLookupId { get; set; }
        public string CompanyName { get; set; }

        [ForeignKey("LevelTypeLookupId")]
        public virtual LevelTypeLookup LevelTypeLookup { get; set; }
    }

    public class LevelTypeLookup
    {
        public int Id { get; set; }
        public string Abbreviation { get; set; }
        public string Name { get; set; }
    }

within the DbContext:

        ...
        public virtual DbQuery<TreeNode> TreeNodes { get; set; }

        protected override void OnModelCreating(ModelBuilder modelBuilder)
        {
            base.OnModelCreating(modelBuilder);
            new TreeNodeConfig(modelBuilder);
        }

TreeNodeConfig.cs:

    public class TreeNodeConfig
    {
        public TreeNodeConfig(ModelBuilder modelBuilder)
        {
            modelBuilder.Query<TreeNode>()
                .Property(e => e.DistributorId)
                .HasColumnName("DistributorId");

            modelBuilder.Query<TreeNode>()
                .Property(e => e.ParentId)
                .HasColumnName("ManagerId");

            modelBuilder.Query<TreeNode>()
                .Property(e => e.NumberOfChildren)
                .HasColumnName("NumberOfChildren");
        }
    }

Further technical details

EF Core version: 2.1.0 Database Provider: Microsoft.EntityFrameworkCore.SqlServer Operating system: Windows 10 IDE: Visual Studio 2017 15.7.4

About this issue

Original URL
State: closed
Created 6 years ago
Comments: 19 (9 by maintainers)

Most upvoted comments

@HalosGhost and others. I’m going to try to explain this based on our discussion. Consider this model:

public class Blog
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }

    public virtual ICollection<Post> OldPosts { get; set; }
    public virtual ICollection<Post> FeaturedPosts { get; set; }
}

public class Post
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }

    public Author Author { get; set; }
}

public class Author
{
    [DatabaseGenerated(DatabaseGeneratedOption.None)]
    public int Id { get; set; }
}

Notice that there are two ways to get from a blog to a post–the OldPosts relationship and the FeaturedPosts relationship. Let’s populate this with model with some data:

using (var context = new BloggingContext())
{
    context.Database.EnsureDeleted();
    context.Database.EnsureCreated();

    var post1 = new Post { Id = 1, Author = new Author { Id = 1 } };
    var post2 = new Post { Id = 2, Author = new Author { Id = 2 } };
    var post3 = new Post { Id = 3, Author = new Author { Id = 3 } };
    var post4 = new Post { Id = 4, Author = new Author { Id = 4 } };

    context.Add(
        new Blog
        {
            Id = 1,
            OldPosts = new List<Post> { post1, post2 },
            FeaturedPosts = new List<Post> { post3, post4 }
        });
        
    context.SaveChanges();
}

Notice that some posts are related to blogs through one relationship, some through the other. Now lets do a simple query with Include:

using (var context = new BloggingContext())
{
    var blog = context.Blogs
        .Include(e => e.FeaturedPosts).ThenInclude(e => e.Author)
        .Include(e => e.OldPosts)
        .First();

    foreach (var post in blog.OldPosts)
    {
        Console.WriteLine($"Old post: {post.Id} Author: {post.Author?.Id}");
    }

    foreach (var post in blog.FeaturedPosts)
    {
        Console.WriteLine($"Featured post: {post.Id} Author: {post.Author?.Id}");
    }
}

The output from this on my machine is:

Old post: 1 Author:
Old post: 2 Author:
Featured post: 3 Author: 3
Featured post: 4 Author: 4

Notice that both posts accessed through the FeaturedPosts relationship have the author included. This is because the Include for FeaturedPosts also has a ThenInclude for Author: .Include(e => e.FeaturedPosts).ThenInclude(e => e.Author)

However, neither post accessed only through the OldPosts relationship have the author included, since the Include for this relationship does not have an associated ThenInclude for Author: .Include(e => e.OldPosts)

Hopefully all that is quite clear. 😄

So what happens if we change the shape of the query with a projection? Consider for example:

using (var context = new BloggingContext())
{
    var oldPosts = context.Blogs
        .Include(e => e.FeaturedPosts).ThenInclude(e => e.Author)
        .Include(e => e.OldPosts)
        .SelectMany(e => e.OldPosts)
        .ToList();

    var featuredPosts = context.Blogs
        .Include(e => e.FeaturedPosts).ThenInclude(e => e.Author)
        .Include(e => e.OldPosts)
        .SelectMany(e => e.FeaturedPosts)
        .ToList();

    foreach (var post in oldPosts)
    {
        Console.WriteLine($"Post: {post.Id} Author: {post.Author?.Id}");
    }

    foreach (var post in featuredPosts)
    {
        Console.WriteLine($"Featured post: {post.Id} Author: {post.Author?.Id}");
    }
}

As described above, the Include is dropped, so the output on my machine is:

Old post: 1 Author:
Old post: 2 Author:
Featured post: 3 Author: 
Featured post: 4 Author:

That is, no authors have been loaded because the Include was ignored.

Now, a simplistic suggestion for not ignoring Include is to say something like, “If an entity type in the final projection is also contained in some Include path, then use the remainder of that Include path.” Doing so in this case would mean that since Post is in the final projection, and Post is in one of the Include paths, then Author should be included in the results because it exists as a ThenInclude on Post.

However, there are two issues with this. First, if the type is used in multiple Include paths, then which one should be used? Ignoring that for the moment, and looking at the output if we did this:

Old post: 1 Author: 1
Old post: 2 Author: 2
Featured post: 3 Author: 3
Featured post: 4 Author: 4

Why, in this case, are authors being Included for old posts? This is wrong–there was never a ThenInclude for the OldPosts relationship, so they should not have Author included. This is the second and more significant issue.

But what if we use information about the projection to drive this? For example, looking again at this:

var oldPosts = context.Blogs
    .Include(e => e.FeaturedPosts).ThenInclude(e => e.Author)
    .Include(e => e.OldPosts)
    .SelectMany(e => e.OldPosts)
    .ToList();

we are selecting OldPosts, in the projection, so we could use this to determine which Include path to use–in this case, the one that does not also include Author. Likewise, for this query:

var featuredPosts = context.Blogs
    .Include(e => e.FeaturedPosts).ThenInclude(e => e.Author)
    .Include(e => e.OldPosts)
    .SelectMany(e => e.FeaturedPosts)
    .ToList();

we would use the other Include path, which would include Author.

This would likely work for all cases except for entirely uncorrelated types in the projection. However, the implementation is very complicated, and as the length of this post shows, the mental model for figuring out what is going on is also complicated.

This is why, rather than attempting to implement something like this, we believe a better approach is as described in #2953. That is, if a relationship should always be included when an entity of a given type is loaded, regardless of where it came from, then that should be specified in a different API that allows such rules to be expressed independently of the shape of the query.

ajcvickers on Jun 27, 2018