efcore: Slow compilation when using many .ThenInclude()'s

In EFCore3 Preview 7 onwards, there is a significant slow down when I have a query with many .ThenInclude()s, e.g.

new WhyYouSoSlowDb(options)
        .Blog
        .Include(x => x.Posts)
          .ThenInclude(x => x.PostInstances)
          .ThenInclude(x => x.Comments)
          .ThenInclude(x => x.AcquiredComments)
          .ThenInclude(x => x.Tag_AcquiredComments)
          .ThenInclude(x => x.Tag)
        .FirstOrDefault();

The above takes 20 seconds to run on an empty database. On EFCore3 Preview 5, it takes 1.5 seconds to run. (Preview 6 throws an exception.)

Steps to reproduce

Repo available here. Note that some amount of complexity is required to reproduce this issue - if you delete the 5 unreferenced properties from AcquiredCommentEntity, it will take 10 seconds to run.

Further technical details

EF Core version: EFCore3 Preview 7 and 8 Database Provider: Microsoft.EntityFrameworkCore.SqlServer Operating system: Windows 10 IDE: Visual Studio 2019 Preview

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 30 (19 by maintainers)

Commits related to this issue

Most upvoted comments

@eliasbobadilla take a look at this docs page to understand single vs. split queries. You may want to make that query use split queries, depending on how many Tools there typically are for each Toolkit (and also how many Toolkits for each Application).

thanks a lot @roji , I solved my problem and now I know how to better approach queries with multiple join in sql with Ef

@roji Thanks for the clarification. I noticed the compilation is faster (my integration tests and app looked hung before the query actually fired prior to your fix). I had a feeling in the back of my mind this was related to the significant change you mentioned, but I wasn’t entirely sure yet if this issue’s resolution would help.

I think you are correct though, I noticed the SQL being emitted by EF with 2.2.6 is being broken up into small queries versus 3.0 where it’s one gargantuan query of joins.

Will this be fixed for 3.0 release? Because I run projects that do 20-30 includes and if that is NOT fixed then basically the performance regression makes this update a total no go. I can not move queries from significantly sub 1 second EFCore handling to MINUTES.

Also re SQL: not meeting the bar is run - up to the point people actually have to DEBUG complex expression trees and the SQL looks like a nightmare compared to the clean SQL that was usus earlier.

When both of them points to same instance, is the case when SelectExpression did not re-generate. Sure, it is not that bug and it would still cause performance issue. But the only way you can keep same instance after re-generation is not visit the same SelectExpression multiple times, (no matter where it appears so it can expand more than projection too). And that solves the perf issue already. #17337 already fixes perf issue and I don’t see a reason to track it separately.

Few observations:

  • Query compilations takes about 0.5 secs in preview5, 14 secs in preview8 & 20 secs in daily. Above mentioned figures are for cold query and not for cached entry.
  • Runtimes are 0s, 4s, 4s respectively. Generated SQLs: Preview5
SELECT TOP(1) [x].[Id], [x].[Name], [x].[UserId]
FROM [Blog] AS [x]
ORDER BY [x].[Id]

Preview8+

SELECT [t].[Id], [t].[Name], [t].[UserId], [t6].[Id], [t6].[BlogId], [t6].[Description], [t6].[UserId], [t6].[Id0], [t6].[Created], [t6].[Hash], [t6].[IsDeleted], [t6].[Modified], [t6].[PostId], [t6].[Id00], [t6].[Index], [t6].[PostInstanceId], [t6].[TemplateId], [t6].[UserId0], [t6].[CommentId], [t6].[CommentState], [t6].[Created0], [t6].[IsDeleted0], [t6].[Meta], [t6].[Meta2], [t6].[MetaId], [t6].[TagId], [t6].[UserId00], [t6].[CommentId0], [t6].[Id000], [t6].[Name], [t6].[UserId000]
FROM (
    SELECT TOP(1) [b].[Id], [b].[Name], [b].[UserId]
    FROM [Blog] AS [b]
) AS [t]
LEFT JOIN (
    SELECT [p].[Id], [p].[BlogId], [p].[Description], [p].[UserId], [t5].[Id] AS [Id0], [t5].[Created], [t5].[Hash], [t5].[IsDeleted], [t5].[Modified], [t5].[PostId], [t5].[Id0] AS [Id00], [t5].[Index], [t5].[PostInstanceId], [t5].[TemplateId], [t5].[UserId] AS [UserId0], [t5].[CommentId], [t5].[CommentState], [t5].[Created0], [t5].[IsDeleted0], [t5].[Meta], [t5].[Meta2], [t5].[MetaId], [t5].[TagId], [t5].[UserId0] AS [UserId00], [t5].[CommentId0], [t5].[Id00] AS [Id000], [t5].[Name], [t5].[UserId00] AS [UserId000]
    FROM [Post] AS [p]
    LEFT JOIN (
        SELECT [p0].[Id], [p0].[Created], [p0].[Hash], [p0].[IsDeleted], [p0].[Modified], [p0].[PostId], [t4].[Id] AS [Id0], [t4].[Index], [t4].[PostInstanceId], [t4].[TemplateId], [t4].[UserId], [t4].[CommentId], [t4].[CommentState], [t4].[Created] AS [Created0], [t4].[IsDeleted] AS [IsDeleted0], [t4].[Meta], [t4].[Meta2], [t4].[MetaId], [t4].[TagId], [t4].[UserId0], [t4].[CommentId0], [t4].[Id0] AS [Id00], [t4].[Name], [t4].[UserId00]
        FROM [PostInstance] AS [p0]
        LEFT JOIN (
            SELECT [c].[Id], [c].[Index], [c].[PostInstanceId], [c].[TemplateId], [t3].[UserId], [t3].[CommentId], [t3].[CommentState], [t3].[Created], [t3].[IsDeleted], [t3].[Meta], [t3].[Meta2], [t3].[MetaId], [t3].[TagId], [t3].[UserId0], [t3].[CommentId0], [t3].[Id] AS [Id0], [t3].[Name], [t3].[UserId00]
            FROM [Comment] AS [c]
            LEFT JOIN (
                SELECT [a].[UserId], [a].[CommentId], [a].[CommentState], [a].[Created], [a].[IsDeleted], [a].[Meta], [a].[Meta2], [a].[MetaId], [t2].[TagId], [t2].[UserId] AS [UserId0], [t2].[CommentId] AS [CommentId0], [t2].[Id], [t2].[Name], [t2].[UserId0] AS [UserId00]
                FROM [AcquiredComment] AS [a]
                LEFT JOIN (
                    SELECT [t0].[TagId], [t0].[UserId], [t0].[CommentId], [t1].[Id], [t1].[Name], [t1].[UserId] AS [UserId0]
                    FROM [Tag_AcquiredComment] AS [t0]
                    INNER JOIN [Tag] AS [t1] ON [t0].[TagId] = [t1].[Id]
                ) AS [t2] ON ([a].[UserId] = [t2].[UserId]) AND ([a].[CommentId] = [t2].[CommentId])
            ) AS [t3] ON [c].[Id] = [t3].[CommentId]
        ) AS [t4] ON [p0].[Id] = [t4].[PostInstanceId]
    ) AS [t5] ON [p].[Id] = [t5].[PostId]
) AS [t6] ON [t].[Id] = [t6].[BlogId]
ORDER BY [t].[Id], [t6].[Id], [t6].[Id0], [t6].[Id00], [t6].[UserId0], [t6].[CommentId], [t6].[TagId], [t6].[UserId00], [t6].[CommentId0], [t6].[Id000]

The 2nd query is 7x time slower in SqlServer. (according to analyze plan).

  • We do not have caching of generated SelectExpression yet. Given size of SQL query, it could easily contribute to time spent. See #15892
  • The difference in generated SQL comes from single query design.
  • Looking at code, if data reader is empty, we return immediately so nothing specific to shaper is affecting it.

My belief is #15892 will get us as close as we can get. Apart from that, there is no extra work we are doing which is not by design.