framework: WhereHas query is too slow
- Laravel Version: 5.4
- PHP Version: 7.0
- Database Driver & Version: Mysql 5.6.33
Description: As i tested, i founded out that whereHas query is much slower than the same query using join. Imagine this domain model: User - Like - Post - Tag User likes posts Post belongs to many tags Tag belongs to many posts Each post has several tags and each user may like several posts. We call a specified user interested in tag T if he/she had liked at least 10 posts which has tag T. Now, for a specified user U, we are going to find those tags which he/she is interested in.
With whereHas we have the following code:
App\Tag::whereHas('posts', function($q) {
$q->whereHas('likes', function($q) { // likes is a relation between posts and users (post_user)
$q->where('users.id', 1);
});
}, '>=', 10)->toSql()
Note that the relation between like and post is polymorphic. It produces the following SQL statement:
select * from `tags` where
(select count(*) from `posts`
inner join
`post_tag` on `posts`.`id` = `post_tag`.`post_id`
where `tags`.`id` = `post_tag`.`tag_id` and
exists
(select * from `users`
inner join `likes` on `users`.`id` = `likes`.`user_id`
where `posts`.`id` = `likes`.`likeable_id` and `likes`.`likeable_type` = 'App\\Post' and `users`.`id` = 1
)
) >= 10
But the following equivalent SQL statement performs 10 times faster with joins:
select `tags`.*, COUNT(tags.id) AS interest from `tags`
inner join `post_tag` on `tags`.`id` = `post_tag`.`tag_id`
inner join `posts` on `post_tag`.`post_id` = `posts`.`id`
inner join `likes` on `likes`.`likeable_id` = `posts`.`id` and `likes`.`likeable_type` = 'App\\Post'
inner join `users` on `likes`.`user_id` = `users`.`id` where `users`.`id` = 1
group by `users`.`id`, `tags`.`id` having `interest` >= 10 order by `interest` desc
I think RDMSs do lots of work to perform joins faster and ORM should take advantage of this.
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Comments: 30 (11 by maintainers)
@vogtdominikgit
Laravel is creating a temporary column on literally every row in the database, in which it fills with true or false. If any indexes can be used, they are probably used after this. Any query that uses whereHas is probably going to be a full table scan.
Here is a similar query, except it does not induce full table scans.
Further reading:
https://github.com/laravel/framework/issues/18415 https://github.com/laravel/framework/issues/3543 https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel
Eloquent has some edge cases, when you hit these just switch to using the query builder.
Closing since there’s no activity on the issue and it doesn’t really indicate a bug.
I am also wondering why Laravel uses
existsclause forwhereHas()while the same could have been done withinclause with any overhead.I had to avoid using
whereHasand use rawwhereInto increase the performance from 20 s to 100 msWould like to know the reason why to stick with
existsclause.Actually, I don’t see it as an impossible task. Eloquent could automatize it.
Currently a developer has to replace all
with,whereHasand other calls with database queries - but then why do these even exist? When is it irrelevant to run 5 queries instead of 1? Maybe on a website for your family photos, or for your child’s kindergarten. But in production it can have big network / processing impact, and (at least according to my memory) nor the documentation notes this threat, you only realize it when it’s slow as hell, or if you are logging queries. And then you have to go back, replace all your calls, and change all the relevant code (since the query won’t return models anymore).Maybe I just do something the wrong way, but it’s exactly what I had to do a time ago, because I naively thought Eloquent will generate an efficient query.
For those who are interested in a specific case and to see how much improvement a simple change can make. I do not want to start a discussion about how this is bad programming by the developer. I want to bring focus on the bigger issue.
On one side a framework exists to help developers ship more features more frequently, but on the other side a framework can be a pain in terms of performance. This is due to every abstraction layer adding overhead and reducing control over the underlying code-design.
Don’t get me wrong, I love Laravel. But I think it is important to address this issue in the laravel docs and add a disclaimer, that these kind of queries might cause a lot of performance-loss in the long term.
Here is my result by changing 5 lines of code:
DB (with indexes) profiles table with 285.714 entries devices table with 350.428 entries
with 2,610 calls per minute from various places
Code
QueryTime
CPU
But due to my lack of understanding, can somebody please explain to me why this is such a big improvement?
@AbiriAmir joins add up to what query is selecting, while
existsdoesn’t. In other words, when you join, and in your PHP have something like$post->where('id', '>', 42)it will cause an error becauseidis ambiguous, and that is not desired behaviour. You would have to prefix every statement with the table name. For that reason, Eloquent ORM is not using joins, and never will.If you need improved performance for a specific use case, just create a custom scope or a repository class, where you can use the query builder to add joins. Btw you can use dot syntax to access nested relationships, like this:
whereHas('posts.likes', '>=', 10).Also, in most cases,
existsis actually faster, because it doesn’t order RDBMS to fetch any data, just check whether relevant rows exist, which often can be done by a simple index lookup.@Bedivierre Thanks for your reply!
Maybe you’re misunderstand why where exists slowly, when the both table is big, the main reason is external table(users), not internal table(posts), this is determined by the mysql mechanism, you can see this comment3543#41595672, that has been explained and the solution is given.
And i saw your comment at stackoverflow,
joincan really improve performance, but it is difficult to apply in complex scenes. For example,users::has('posts.comments')->get(), and even 10 kinds of relations in the framework, it’s hard to rewrite withjoin.Finally, welcome to use hasin, give start!😄
I met a similar issue today…
Collectionhas a many to many relationship withImage.This is very slow:
So I fix the issue with this:
Hope this helps.
mpyw/eloquent-has-by-non-dependent-subquery: Convert has() and whereHas() constraints to non-dependent subqueries.
whereHas producing an AND EXISTS statement is causing tons of slowness (12+seconds) to my queries… if I run a raw query using left joins it works in under a second.
@howtomakeaturn
With https://github.com/laravel/framework/issues/18415#issuecomment-650828294, also you may be able to solve like this:
I think that’s what @jhdxr meant. At least I’m combining Eloquent with QueryBuilder, so it’s somewhere between efficiency and code readability.
@joshuaGlass808 as explained above - it depends. For some queries
EXISTSis faster for othersJOINis. Just use join if you find it better in your use case.Also if you want to query only rows that have some relationship you probably want a full join rather than
left join.@KristofMorva it’s very hard to generate an efficient query, especially you have to support diffierent rdbms. for example, there are lots of sql queries which are considered to slow down the performance for mysql, like sub-query, exists. but for oracle, they are just piece of cake. IMHO, I’ll keep as simple as possilbe when I use ORM. and if I need do something complicated with sql, just use the raw query (with the help of functions like
selectRaw,whereRaw,orderByRawor something similar).