efcore: Query with GroupBy or GroupJoin throws exception
If you run into issue with GroupBy/GroupJoin and arrive here, then please take time to read https://github.com/dotnet/efcore/issues/17068#issuecomment-586464350 fully to direct your feedback in specific issue for your specific scenario.
Read our documentation about this here
Group By
Linq GroupBy
operator allows to create IGrouping<TKey, TElement>
where TKey
& TElement
could be arbitrary types. Further, IGrouping
implements IEnumerable<TElement>
which means you can compose over using any queryable operator after grouping.
In contrast SQL GROUP BY
keyword is very restrictive. In SQL, you can only use scalar values in GROUP BY
. Most database also allows only referencing grouping key columns or aggregate applied over any other column. Hence selecting a column which is not in key or does not have aggregate applied is not possible.
Due to this mismatch, not all linq GroupBy
can be translated to SQL GROUP BY
. In EF Core 3.0 we have support for SQL GROUP BY
but not for all linq GroupBy
. There are few bugs/limitation in this translation for which we have various issues filed, mainly for scenarios where Distinct or a predicate is applied before aggregate operation.
Another GroupBy scenario which can be translated to server is selecting first element of each group which is being tracked by #13805
Most other linq GroupBy
cannot be evaluated on server and must be evaluated on client side. This issue tracks about allowing EF Core to do this client side grouping implicitly. Absence of this feature means users would have to manually call AsEnumerable
before GroupBy to explicitly falling into client side grouping.
GroupJoin
Linq GroupJoin
is a queryable operator which does not have equivalent SQL keyword. The biggest use of GroupJoin for an ORM is to generate Left join on server side as mentioned here. Customers.GroupJoin(Orders,...)
would create IEnumerable<Order>
for each Customer
. As with GroupBy, you can compose over this in Linq but there is no translation. Again, GroupJoin requires client side grouping to generate correct result. This issue also tracks adding implicit support for it.
In recent discussion it came up that, allowing this may be pit of failure (with escape hatch at the bottom :trollface: ).
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 22
- Comments: 42 (24 by maintainers)
We discussed this issue in a team meeting in detailed and I have read all above comments thoroughly. Following are our conclusions GroupBy operator As listed in this issue earlier following are missing translations in GroupBy operator which we will add in future and we will not client eval any of those patterns.
GroupJoin operator GroupJoin is mainly used to generate LeftJoin and there is no SQL equivalent. Hence we will not translate any composition over GroupJoin.
GroupJoin <-> GroupBy equivalence Due to high cost of pattern matching GroupJoin operator, we are not going to add any translation in GroupJoin even if equivalent GroupBy query would be translated. Customers would be required use GroupBy syntax as that is closest representation of what is the SQL.
AsAsyncEnumerable usage As our exception message says, when client evaluating use AsAsyncEnumerable operator. We understand that after getting IAsyncEnumerable, composing over is not easy. Since Enumerable operators which works on IEnumerable are defined in .NET runtime, we believe that most beneficial thing for all the customers of .NET would be to have equivalent operators which works on IAsyncEnumerable in .NET runtime only. If you need async enumerable operators in your query, please file an issue in https://github.com/dotnet/runtime In the meantime, you can System.Interactive.Async - an external library, which allows enumerable like operators over IAsyncEnumerable for composition. Since this issue is much broader than EF Core and there is a reasonable open source work-around, we are not making any changes to EF Core regarding this.
WithClientEvaluation operator
With all of above, I believe that it addresses all the points raised in discussion here. If you run into issue with GroupBy/GroupJoin and arrive here, then please take time to read this comment fully to direct your feedback in specific issue for your specific scenario.
Closing this as by-design because there is no reasonable or valuable translation to the server we can do in these cases.
@smitpatel
Regarding
GroupJoin
:I’m quite surprised. Never thought the
GroupJoin
key selectors would be the reason for not supporting it. But you do supportJoin
which should have exactly the same issues you’ve mentioned with nulls in keys. You do even support the “official” LINQ left outer join pattern which containsGroupJoin
.To recap, from the 3 possible join based LINQ patterns
(1) JOIN
(2) LEFT JOIN
(3)
for some reason you don’t want to support the third one.
All they have the same potential key comparison issue. And all they can be rewritten to use correlated subquery rather than join. I guess you see the point.
But please consider supporting the last one rather than removing the first two 😃
Regarding
GroupBy
:You definitely need to add support or provide alternative solution for the common “N items per group” problem. You’ve added a very nice
row_number over
support which solves it for correlated subqueries, but people often need that afterGroupBy
, especially the top 1 (last) item of group. See for instance the following SO post Problem with EF OrderBy after migration to .net core 3.1Regarding client evaluation:
Switching to explicit client evaluation of code which used 2.x Async API is not easy - see the following SO post Converting EF Core queries from 2.2 to 3.0 - async await
I think EF Core should probably provide its own way to switch to client evaluation (other than
AsEnumerable()
orAsAsyncEnumerable()
) which keeps the resultIQueryable>T>
, but somehow “switches” the context from there. Just a thought.AsEnumerable()
is good and was what we are suggesting EF6 people who use synchronous LINQ, but nowadays many people use “modern” frameworks and Async APIs. You gave them async APIs forQueryable
and you can’t simple take it back without providing a good alternative (I mean, you can of course, but that’s not good).Currently I’m the biggest EF Core supporter on SO and trust me, some 3.x decisions like this really created a lot of problems/frustration for people coming from EF6 or EF Core 2.x, and for sure are not good for EF Core future. People are start seeking/promoting alternative frameworks, which at the end won’t be good for anyone who invested time/efforts there.
@smitpatel (and @roji with thumbs up):
The value is exactly the same as for
Any(predicate)
vsWhere(predicate).Any()
Sum(expr)
vsSelect(expr).Sum()
i.e.
vs
which you already DO support, so I guess you saw a value in them, even though they fall into the same category as the one in discussion.
@powermetal63 - Is there any additional value to support different LINQ query which translates to same thing at the expense of translating lesser things in general? My point is, when you can write query in form 1, why write in second form. Plus confusion about 2nd form, is when it works and when it does not. (Form 1 does not face that issue)
Hitting lots of errors with this after upgrading from 2.2 to 3.1. Reverting the changes until this is sorted out. In some of our scenarios we have collections filtered and having the ToDictionary run is not an issue at all. Results from some of these are also cached so impact would be very minimal. We constantly use application insights dependency data to view the results of our queries and determine impact.
Really wish the feedback from this issue was considered more to help people in my situation have a migration path: https://github.com/dotnet/efcore/issues/14935
Would have been able to upgrade to 3.1, toggle to allow client evaluation, then over time we could toggle this and address some of the breaking unit tests, toggle back to release, rinse repeat until we have all instances addressed. We have a a fairly large project with multiple exceptions that the team could have worked on over time.
Is the only option forward to wait until we have bandwidth to fix several queries through the entire application? Really hope that isn’t the case 😦
@smitpatel
Regarding
GroupJoin
:I’m confused - sometimes you say key matching is the reason for not supporting, another time the client evaluation.
How exactly you expand the navigations is irrelevant, I used them just as example. Lets forget it and concentrate on
GroupJoin
and “client evaluation”.So what exactly is “client evaluation” in this concrete example:
and the two queries:
And here are my points:
(1) The final projection (hence the shape of the result set) is one and the same. (2) Both have no direct SQL equivalent producing the desired result set shape (3) Hence there must be some sort of “client evaluation” involved, no? (4) The first one translates to
executes successfully and produces the desired materialized shape.
The second fails with “client evaluation” exception. Why?
I would expected both to produce one and the same translation and one and the same materialized result set shape, or both fail because of “client evaluation”. The
null
comparison logic is irrelevant, bothGroupJoin
andJoin
LINQ methods are shortcuts for correlated subqueries limited to equality checks. So, again, why is the second LINQ pattern not supported, and more importantly, why you insist of never supporting it?It’s important, because even it’s just a shortcut, it’s much easier to use in LINQ query syntax, and also it would allow us (customers) to write custom extension methods for providing
GroupBy
support of a things you don’t - like the aforementioned top N items per group, if we could generically replace the unsupportedwith
We all agree that removing the client evaluation is a good thing in general.
The problem though is that there is no easy way to switch the query context.
AsEnumerable()
changes the type toIEnumerable<T>
, hence the final materializationAsync
calls are no more available.ToList()
/await ToListAsync()
immediately executes the query at that point. AndAsAsyncEnumerable()
requires installing additional package for further LINQ support.But let me get back to my previous comment. Let forget about
GroupBy
for now and discuss theGroupJoin
. While it has no full server side support, the translation and cooperative client/server evaluation is supported for queries using collection navigation properties. e.g. query likeis supported, but the equivalent
is not.
This makes no sense to me. While in StackOverflow we always tell people to use navigation properties, many times they just don’t want to add navigation properties to their models (for some unknown reason), or just can’t if there is no FK relationship between the joining tables. The fact that it is supported for navigation properties means that it should be supported for
GroupJoin
s.If this scenario is not targeted by this issue and the current inability to translate such queries is just a current limitation that will be addressed in the future, that’s fine. Just wanted to make sure it is not considered “by design” and “won’t fix”.
@NetMage - Feel free to use LINQ to SQL for scenarios where you find it useful.
https://github.com/aspnet/EntityFrameworkCore/issues/12560#issuecomment-525084333 is a good example of why it’s dangerous to implement implicit client evaluation of GroupBy. Requiring users to do explicit client evaluation with AsEnumerable forces them to think and understand the distinctions and perf impact, etc.
Reiterating previous thoughts, the more I think of it, the more I think we should aim to avoid implicit client evaluation for any scenario where client evaluation is (significantly) more expensive than server evaluation, when possible.
In retrospect it actually seems like quite a mistake for LINQ to have GroupBy returning groupings rather than a reduced/aggregated scalar, given that it tends to follow SQL in operator naming and the general prevalence of SQL.
@jsgoupil your question is a duplicate of #16730. You need to express the aggregate operation in a Select:
See the docs on GroupBy for more information and #16730. This is a good example of why implicit client evaluation for GroupBy would be a big pit of failure.
What’s the difference between collection navigation property and
GroupJoin
? Why using the former is translated and the latter is not (“by design”?!)@smitpatel Assuming that I (as not native English speaker) am not explaining it correctly.
All I wanted to say regarding
GroupJoin
is that in the remaining part of the LINQ query,bars
should be treated equally regardless of whether it is a result ofor
Same for
bars
here:and
Yes, that reason is client eval as @roji has already mentioned. The way GroupJoin is different from Join/LeftJoin is cardinality. That is the reason there is no SQL equivalent of GroupJoin. It is true that Join/LeftJoin can re-written as correlated subquery, but does that help in SQL translation? If EF Core did not have navigations then we wouldn’t have this issue of how to expand item. The reason we chose to correlated subquery for collection navigation rather than a GroupJoin is it’s equivalency to SQL in many scenarios. e.g.
If above query gets re-written into correlated subquery then it becomes,
Which translates to SQL as
Now if we converted the collection navigation to GroupJoin then arriving at above translation is fairly complex. I hope that explains why correlated subquery is preferred over GroupJoin for collection navigation. Apart from above, no matter how you would convert query in LINQ - Join/correlation, at the end it would be Join in SQL (except for some cases of outer/cross apply or scalar subquery) because SQL cannot support it. To elaborate more on point @roji noted above, why we don’t translate GroupJoin is difficulty of understanding when it would fetch more data than needed and explaining that to Customer. Further, since it needs to do client eval to compute groups it complicates the matter. The 3rd query you posted is fairly straight forward GroupJoin which would evaluate on client just fine. The moment you start composing over the grouping, there is potential danger that client eval is going to fetch more than from server. Further, suppose above GroupJoin worked but then you put it in the middle of the query and now your query does not work anymore. (let me add more about this in GroupBy below). As OP of this issue, I filed this issue for the sole purpose of doing naked GroupJoin/GroupBy support which doesn’t bring any additional data (exactly your 3rd query). It is nice to have for EF to evaluate since putting AsEnumerable does not simply work for GroupJoin like other operators. But we as a team arrived at the conclusion that, while we can make it work, it would just increase ambiguity for Customers.
GroupBy Sadly, GroupBy does have SQL representation in limited scenario which means we had to implement it at the same time, we have to explain to customers what works and what does not. While many smart customers understand what GroupBy correspond to SQL GROUP BY, (EF Core added support for SQL GROUP BY only), many fails to see it and files issue why the unsupported query does not work. Even worse, the previous client eval behavior was making it even bigger perf disaster. Few of the GroupBy issue I am aware of that we are not supporting but we plan to add support are
We have laid ground work for GroupBy following select top(n) of group by introducing Row number expression but to support each translation takes time in designing/implementing/testing. We have some of above translation scheduled for next release, some may have to wait. Once all above translations are done, probably, GroupBy will face lesser issue of client eval since most other ones are clear case when there is no SQL equivalent. Further GroupBy can use AsEnumerable operator easily.
Regarding client eval in async form - IX-Async would be way to go. As a framework, EF Core does not provide enumerable implementation, same way we would like to do the same for async code path too. It would be decision of corefx team if they want to put async enumerable methods beside sync ones. We are actively pushing for it for easy of using library for our customers, till then using external lib is way to go.
For comparison of EF6 vs EF Core, let us know what specific queries EF6 translated which EF Core is not translating. While we have taken liberty to explicitly not support few things from EF6 which were problematic, we still try to achieve parity for most part so that customers have ease of migration. I wouldn’t say that query pipeline in EF Core is on par with EF6. It has some new features which EF6 did not have, while it lacks behind in certain parts. We have tracking issues for things we are aware lacking and we want to do in EF Core already. Help with finding any missing translation which is not being tracked would useful.
@NetMage - You said it translates in LINQ to SQL but not in EF Core. If LINQ to SQL works for you, feel free to use it.
@powermetal63 here’s one good justification (at least IMHO) for why the 1st is supported and the 2nd isn’t… The main point of removing implicit client evaluation between 2.2 and 3.0 is to make it extremely clear/explicit which parts of the query are executed on the server, and which on the client. Our current logic is extremely simple: everything except the top-level select must always be server-evaluatable, or we throw. If we start to add other exceptions - such as GroupJoin - we start muddying things up again, and users have to start remembering what’s implicitly client-evaluatable, and what’s not. In other words, a user could write a GroupJoin - expecting it to be server-translated and efficient - not knowing that in fact all data is being brought back from the server, etc. (this is important especially if we continue to compose operators over the GroupJoin results). With the current situation there’s just one, crystal-clear case where we implicitly client-evaluate, and that’s an advantage IMHO.
And as always, it’s extremely easy to insert an AsEnumerable before the GroupJoin to explicitly trigger client evaluation, so implicit evaluation of GroupJoin seems to me to have very limited value. Having said that, this is only my opinion and the team does intend to rediscuss our strategy with regards to client evaluation and possibly relax our current restrictions.
@NetMage AsEnumerableAsync is provided by EF Core, returning an IAsyncEnumerable. You can use foreach to iterate over that, or compose over it with async LINQ operators from the System.Linq.Async package.
@xqiu unless I’m mistaken, this was client-evaluating in 2.2, not translating to SQL. You can achieve that exact same behavior with 3.x by putting AsEnumerable before your GroupBy - that will trigger client evaluation (see this breaking change note). We no longer do this implicitly in 3.x in order to protect people from unknowingly pulling in large amounts of data from the database because your specific GroupBy can’t be translated to SQL.