objection.js: Eager loading doesn't use joins?

I’m finding eager loading works via a second query with an IN clause, is there some reason joins are not used?

This prevents me from e.g aggregating an eagerly loaded value like so

Product.
query().
eager('reviews')
avg('reviews.rating').
as('avgRating').
groupBy('column1', 'column2', ...)

Thanks

About this issue

  • Original URL
  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 20 (1 by maintainers)

Most upvoted comments

Finally sober 😉

To make the eager stuff use joins while keeping the current features we would basically have to rewrite objection. As an example, let’s consider this simple eager query:

A.query().eager('[b.c, d]')

We would need to generate a query like this:

select
  A.id as A_id,
  A.p1 as A_p1,
  A.p2 as A_p2,
  A_b.id as A_b_id,
  A_b.p3 as A_b_p3,
  A_b.p4 as A_b_p4,
  A_b_c.id as A_b_c_id,
  A_b_c.p5 as A_b_c_p5,
  A_b_c.p6 as A_b_c_p6,
  A_d.id as A_d_id,
  A_d.p7 as A_d_p7,
  A_d.p8 as A_d_p8,
from A
inner join B as A_b on A_b.aid = A.id
inner join C as A_b_c on A_b_c.bid = A_b.id
inner join D as A_d on A_d.aid = A.id

And assuming that each relation yields just 2 rows the result object would be something like this:

[
  {  A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 2, A_p2: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 1, A_p1: 1, A_p2: 1, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 },
  {  A_id: 2, A_p1: 2, A_p2: 2, A_b_id: 1, A_b_id: 1, A_b_p3: 2, A_b_p4: 1, A_b_c_id: 1, A_b_c_p5: 1, A_b_c_p6: 1, A_d_id: 1, A_d_p7: 1, A_d_p8: 1 }
]

Now don’t care about the errors in the SQL or the actual result column values, I just wrote this down without testing it.

Two main points to notice here:

  1. We need to select all the columns with generated aliases so that they don’t collide.
  2. The result set contains all the selected values for all the rows which results in a combinatory explosion.

This example only had a simple eager expression with tables that had only two columns and all relations had just 2 results. And even with this tiny example the result object is big. Now try to create this example in your minds with eager expression like [a.b.c.d.e.[f, g.h], i.j, l.m.n, o.[p, q, r]] with realistic tables that have 6 or more columns and with result sets with hundreds of rows. This is not a far fetched example, I write code like this all the time.

The problems:

  1. With any realistic example the result object would contain 90% useless properties and it would become slower than it is with separate queries simply because of the memory and cpu overhead V8 adds.
  2. We would need to know all columns of the models/tables to build the select clauses. For this we would need to make jsonSchema required or add some other mechanism for finding out the columns.
  3. Making the eager filtering stuff work with this. Currently you can say stuff like .filterEager('a.b.d', builder => builder.where('p1', 10)) We would need to map the property p1 into A_a_c_d.p1 behind the scenes and because of the lack of reflection features in knex, it is not easily doable.

Conclusions

So replacing the current query system with joins is just not going to happen. We can however consider adding another mechanism for building nested joins. Or possibly with a lot of work, we could provide an option to choose which method to use for the eager queries, but really this would mean rewriting everything.

I don’t think that the performance benefits are that big. Please send any benchmarks you have that show that using separate queries is significantly slower than using joins for the common cases.

This seems to interest people even though it is closed so I’m reopening it. I’m drunk right now so I’ll answer this later 😀

I’m going to add join based eager queries as an option to the current where in queries. The loading algorithm can be selected per-query, per-model and per-project. I’ll keep you posted.

I’ll soon release the 0.5.0-alpha.0 version that has the new joinRelation method. You can install it using npm install objection@next