framework: WhereHas query is too slow

  • Laravel Version: 5.4
  • PHP Version: 7.0
  • Database Driver & Version: Mysql 5.6.33

Description: As i tested, i founded out that whereHas query is much slower than the same query using join. Imagine this domain model: User - Like - Post - Tag User likes posts Post belongs to many tags Tag belongs to many posts Each post has several tags and each user may like several posts. We call a specified user interested in tag T if he/she had liked at least 10 posts which has tag T. Now, for a specified user U, we are going to find those tags which he/she is interested in.

With whereHas we have the following code:

App\Tag::whereHas('posts', function($q) {
    $q->whereHas('likes', function($q) {  // likes is a relation between posts and users (post_user)
        $q->where('users.id', 1);
    });
}, '>=', 10)->toSql()

Note that the relation between like and post is polymorphic. It produces the following SQL statement:

select * from `tags` where 
    (select count(*) from `posts` 
        inner join 
            `post_tag` on `posts`.`id` = `post_tag`.`post_id` 
        where `tags`.`id` = `post_tag`.`tag_id` and 
            exists 
                (select * from `users`
                    inner join `likes` on `users`.`id` = `likes`.`user_id` 
                    where `posts`.`id` = `likes`.`likeable_id` and `likes`.`likeable_type` = 'App\\Post' and `users`.`id` = 1
                )
    ) >= 10

But the following equivalent SQL statement performs 10 times faster with joins:

select `tags`.*, COUNT(tags.id) AS interest from `tags` 
    inner join `post_tag` on `tags`.`id` = `post_tag`.`tag_id` 
    inner join `posts` on `post_tag`.`post_id` = `posts`.`id` 
    inner join `likes` on `likes`.`likeable_id` = `posts`.`id` and `likes`.`likeable_type` = 'App\\Post' 
    inner join `users` on `likes`.`user_id` = `users`.`id` where `users`.`id` = 1 
    group by `users`.`id`, `tags`.`id` having `interest` >= 10 order by `interest` desc

I think RDMSs do lots of work to perform joins faster and ORM should take advantage of this.

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 30 (11 by maintainers)

Most upvoted comments

@vogtdominikgit

Laravel is creating a temporary column on literally every row in the database, in which it fills with true or false. If any indexes can be used, they are probably used after this. Any query that uses whereHas is probably going to be a full table scan.

Here is a similar query, except it does not induce full table scans.

Profile::whereIn('profiles.id', function($query) use ($mac_address) {
    $query->from('devices')
        ->select('devices.profile_id')
        ->where('devices.mac_address', $mac_address);
})

Further reading:

https://github.com/laravel/framework/issues/18415 https://github.com/laravel/framework/issues/3543 https://stackoverflow.com/questions/46785552/poor-wherehas-performance-in-laravel

Eloquent has some edge cases, when you hit these just switch to using the query builder.

Closing since there’s no activity on the issue and it doesn’t really indicate a bug.

I am also wondering why Laravel uses exists clause for whereHas() while the same could have been done with in clause with any overhead.

I had to avoid using whereHas and use raw whereIn to increase the performance from 20 s to 100 ms

Would like to know the reason why to stick with exists clause.

You would have to prefix every statement with the table name.

Actually, I don’t see it as an impossible task. Eloquent could automatize it.

Currently a developer has to replace all with, whereHas and other calls with database queries - but then why do these even exist? When is it irrelevant to run 5 queries instead of 1? Maybe on a website for your family photos, or for your child’s kindergarten. But in production it can have big network / processing impact, and (at least according to my memory) nor the documentation notes this threat, you only realize it when it’s slow as hell, or if you are logging queries. And then you have to go back, replace all your calls, and change all the relevant code (since the query won’t return models anymore).

Maybe I just do something the wrong way, but it’s exactly what I had to do a time ago, because I naively thought Eloquent will generate an efficient query.

For those who are interested in a specific case and to see how much improvement a simple change can make. I do not want to start a discussion about how this is bad programming by the developer. I want to bring focus on the bigger issue.

On one side a framework exists to help developers ship more features more frequently, but on the other side a framework can be a pain in terms of performance. This is due to every abstraction layer adding overhead and reducing control over the underlying code-design.

Don’t get me wrong, I love Laravel. But I think it is important to address this issue in the laravel docs and add a disclaimer, that these kind of queries might cause a lot of performance-loss in the long term.

Here is my result by changing 5 lines of code:

DB (with indexes) profiles table with 285.714 entries devices table with 350.428 entries

with 2,610 calls per minute from various places

Code code change

QueryTime querytime drop

CPU cpu drop

But due to my lack of understanding, can somebody please explain to me why this is such a big improvement?

@AbiriAmir joins add up to what query is selecting, while exists doesn’t. In other words, when you join, and in your PHP have something like $post->where('id', '>', 42) it will cause an error because id is ambiguous, and that is not desired behaviour. You would have to prefix every statement with the table name. For that reason, Eloquent ORM is not using joins, and never will.

If you need improved performance for a specific use case, just create a custom scope or a repository class, where you can use the query builder to add joins. Btw you can use dot syntax to access nested relationships, like this: whereHas('posts.likes', '>=', 10).

Also, in most cases, exists is actually faster, because it doesn’t order RDBMS to fetch any data, just check whether relevant rows exist, which often can be done by a simple index lookup.

@Bedivierre Thanks for your reply!

Maybe you’re misunderstand why where exists slowly, when the both table is big, the main reason is external table(users), not internal table(posts), this is determined by the mysql mechanism, you can see this comment3543#41595672, that has been explained and the solution is given.

And i saw your comment at stackoverflow, join can really improve performance, but it is difficult to apply in complex scenes. For example, users::has('posts.comments')->get(), and even 10 kinds of relations in the framework, it’s hard to rewrite with join.

Finally, welcome to use hasin, give start!😄

I met a similar issue today…

Collection has a many to many relationship with Image.

This is very slow:

$collections = Collection::orderBy('created_at', 'desc')
    ->whereHas('images')
    ->paginate(36);

So I fix the issue with this:

$collections = Collection::orderBy('created_at', 'desc')
    ->whereIn('collections.id', function($query) {
        $query->select('image_collection.collection_id')
            ->from('image_collection')
            ->join('images', 'images.id', 'image_collection.image_id')
            ->whereNull('images.deleted_at');
    })
    ->paginate(36);

Hope this helps.

whereHas producing an AND EXISTS statement is causing tons of slowness (12+seconds) to my queries… if I run a raw query using left joins it works in under a second.

@howtomakeaturn

With https://github.com/laravel/framework/issues/18415#issuecomment-650828294, also you may be able to solve like this:

$collections = Collection::orderBy('created_at', 'desc')
    ->hasByNondependentSubquery('images')
    ->paginate(36);

no you don’t need to use “fully raw queries”. You can achieve things in QB / Eloquent by using selectRaw, whereRaw etc

I think that’s what @jhdxr meant. At least I’m combining Eloquent with QueryBuilder, so it’s somewhere between efficiency and code readability.

@joshuaGlass808 as explained above - it depends. For some queries EXISTS is faster for others JOIN is. Just use join if you find it better in your use case.

Also if you want to query only rows that have some relationship you probably want a full join rather than left join.

@KristofMorva it’s very hard to generate an efficient query, especially you have to support diffierent rdbms. for example, there are lots of sql queries which are considered to slow down the performance for mysql, like sub-query, exists. but for oracle, they are just piece of cake. IMHO, I’ll keep as simple as possilbe when I use ORM. and if I need do something complicated with sql, just use the raw query (with the help of functions like selectRaw, whereRaw, orderByRaw or something similar).