efcore: 20x slowdown in gigantic query after updating to EF Core 3 compared to 2.2
Hi, it seems as though this query is taking 20x longer now. I can’t figure it out.
I opened the sql server profiler and it looks as though in 2.2 it splits it in to multiple chunks, e.g get subjects as one query, get certificates as another.
However, in 3.0 it is executing it on sql server as just one query.
The purpose of this query is to load all basic information for the websites dashboard. It’s ugly, but gets the job done.
It goes from about 1 second to 20 seconds.
Can anyone let me know how to structure it so EF knows to split it in to sub queries as it did prior to updating to 3.0?
Thanks
Client client = await _db.Users.Where(x => x.Id == userId).Select(x => x.Client).FirstAsync();
return await _db.Clients
.Include(x => x.Subjects).ThenInclude(x => x.Subject)
.Include(x => x.Certificates).ThenInclude(x => x.Certificate)
.Include(x => x.Locations).ThenInclude(x => x.LocationContacts).ThenInclude(x => x.Contact).ThenInclude(x => x.Certificates).ThenInclude(x => x.Certificate)
.Include(x => x.Locations).ThenInclude(x => x.LocationContacts).ThenInclude(x => x.Contact).ThenInclude(x => x.Subjects).ThenInclude(x => x.Subject)
.Include(x => x.Locations).ThenInclude(x => x.Bookings).ThenInclude(x => x.Covering)
.Include(x => x.Locations).ThenInclude(x => x.Bookings).ThenInclude(x => x.Invitations)
.Include(x => x.Locations).ThenInclude(x => x.Bookings).ThenInclude(x => x.Certificates)
.Include(x => x.Locations).ThenInclude(x => x.Bookings).ThenInclude(x => x.Subjects)
.Include(x => x.Locations).ThenInclude(x => x.Bookings).ThenInclude(x => x.BookingAgencies).ThenInclude(x => x.Agency)
.Include(x => x.Agencies).ThenInclude(x => x.Agency).ThenInclude(x => x.Webhooks)
.Where(x => x.Id == client.Id)
.FirstAsync();
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 7
- Comments: 24 (3 by maintainers)
This is exactly why generating a single SQL statement is bad. This is why I’ve had to fight and fix so many performance issues over the years in EF6. This new design of generating a single SQL Statement is a big regression from performance perspective by EF Team.
In EF Core 3 you’ll need to manually split the load into reasonable parts. The best rule of the thumb is to load all collections 1:N separately and include only 1:1, N:1 relations in a load.
For EF6 I’ve written some utilities for graph loading (That I’ve previously detailed here: https://github.com/aspnet/EntityFrameworkCore/issues/1833#issuecomment-262741637 ). Looks like the helper lib will go into fashion again, Time to put it in a library and push it to NuGet.
Can you force EF to return the relevant results over multiple queries, and rely on EF to stitch the objects together? Here is an example:
And if you are using mapping tables for many-to-many joins (rather than letting EF create the mapping table), this code might work even better: (assuming EF patches up the relational properties automatically)
And if that works, this might work even better: (eliminating an inner join on clients)
I have not yet tested it, but I believe it should work.
@Shane32’s example is perfect way to rewrite the query. He said is right, split collection navigation include. If you believe the cardinality of 1:N relationalship is really low, you can include it in single query but otherwise splitting it out could be useful.
At the end of the day, there is cost of cartesian product by fetching more data from server & there is cost of running multiple roundtrips to database. Best thing to do is profile your application, see where splitting the query helps you and rewrite the query where it does.
I am closing this issue as duplicate of #18022 since it has more details.
My query went to 1.5 mins. After rewriting to include the separate queries as per Shane32’s post the query went down to 30 seconds which was still way too slow.
My object graph included owned collections, and I realised those were the cause of part of the slowdown as additional objects in the collection would exponentially slow down the query. After redesigning my graph such that the collections were no longer owned, the query went further down to just 300ms for even the largest graphs. I am not 100% sure this was caused by the EF core 3.0 rewrite which is the topic of this thread (and I don’t have time anymore to investigate further) however if someone is troubleshooting performance, owned collections could be suspect.
Ok, so it seems EF likely does a round trip per query. In that case, one query per linq seems sensible, however my only concern is that when our DTO maps exactly to what EF outputs, we don’t need to do mapping. In this case, when we do the multiple queries ourselves, we would need to stitch the results together in to our DTO, thus requiring manual mapping and removing a useful part of having an ORM.
This is unresponsible to make that kind of change without giving heads up with a big red warning. It took me a few hours to upgrade to 3.1 and eventually I got sql time out exception because of the use of 5 “Include”. I had to revert to 2.2 because I won’t change all the queries in my app to small queries.
This might work:
@KevinMallinson - That was old behavior. Some code snippet in the middle of a discussion is not guaranteed to work in final version of a product. Work-around would be to write multiple linq queries which generates corresponding SQL similar to how previous version did. How much granular you need to go depends on your data. The “cartesian product” has good performance for certain dataset and bad for other. #12098 contains detailed discussion about it and why we decided that single query was better default.
@offirpeer I commented on your post on Stack Overflow and I just wanted to say that I agree with you 100%. I, too, spent all of yesterday upgrading to .NET Core 3.1, only to find out that the EF Core team introduced this massive change and hid it under the same
.Include(...)API. In my opinion, it’s certainly a breaking change in disguise, as it has stopped me from going forward with my migration to 3.1.@offirpeer That’s pretty much why I do most development in EF6 now that it’s available on .NET Core 3.1. It’s a lot more stable and the fact Microsoft is putting less effort into it means there is less likelihood of breaking changes that can cost you time to rewrite. I am hoping EFCore 5.0 is a game changer (and based on Andew’s weekly updates tracker, it may be), but I suspect 5.0+x LTS release will be the one that will make me stop using EF6. I think it might then take me about 2-3 years to move everything from EF6 to EFCore 5.0+x LTS release.