rails: ActiveRecord fails to correctly merge scopes containing IN constraints on the same column

Steps to reproduce

Merge two ActiveRecord scopes on the same model, each of which expresses a different IN constraint on the same column. For example:

User.where(id: [2, 3])
  .merge(User.where(id: [1, 2]))

This occurs whether the constraint is expressed as an immediate array (e.g. .where(col: [...]) or .where(Model.arel_table[:col].in([...]))), or (more importantly) as a subquery (e.g. .where(col: OtherModel.where(...).select(:col)))

Executable test case: https://gist.github.com/chrisandreae/be5ad28f14b1f2a0e0c6e0098230f4bc

Expected behavior

Both IN constraints should be expressed in the final query, for example:

SELECT "users".* FROM "users" WHERE "users"."id" IN (2, 3) AND "users"."id" IN (1, 2)

Actual behavior

Only the merged constraint is expressed in the final query, for example:

SELECT "users".* FROM "users" WHERE "users"."id" IN (1, 2)

System configuration

Rails version: 5.2.0

Ruby version: ruby 2.5.1p57

[Edited 20190122: Still fails in rails 6.0.0.beta1 (at master@3f0c7c5)]

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 6
  • Comments: 17 (11 by maintainers)

Commits related to this issue

Most upvoted comments

Unlike with equality constraints, IN set membership constraints on the same column are not necessarily in conflict: there are valid semantics for combining the constraints, and moreover such constraints are presently correctly combined by Rails… provided that scope merge is not used.

In the case of equality constraints, Developer.where(salary: 50000) and Developer.where(salary: 60000) are inherently in conflict: it’s never possible for the salary of any record to be both 50000 and 60000. To merge these scopes, the merge operation makes a decision between the two of these. It’s valid and Ruby-like for it to choose the Hash-like overriding behaviour.

However, there is no such conflict with constraints on set membership or inequality relations. In both of those cases, there is a straightforward combination of the constraints, but at present Rails’ scope merge only combines the latter.

  • in the case of inequalities, constraints like Developer.where(Developer.arel_table[:salary].gt(30000)) and Developer.where(Developer.arel_table[:salary].lt(60000)) are compatible. They are currently correctly combined by merge simply by keeping both conditions as WHERE clauses and emitting WHERE salary > 30000 AND salary < 60000.

  • In the case of set membership, constraints likeDeveloper.where(salary: [20000, 40000]) and Developer.where(salary: [40000, 50000]), are also compatible. They could likewise be merged by keeping both conditions: WHERE salary IN (20000, 40000) AND salary IN (40000, 50000).

In fact, in contrast to the behaviour of merge, when they are presented as separate where clauses to a single scope, set membership constraints are merged as expected:

> Developer.where(salary: [30000, 40000]).where(salary: [40000, 50000])
  Developer Load (0.5ms)  SELECT "developers".* FROM "developers" WHERE "developers"."salary" IN ($1, $2) AND "developers"."salary" IN ($3, $4) LIMIT $5  [["salary", 30000], ["salary", 40000], ["salary", 40000], ["salary", 50000], ["LIMIT", 11]]
=> #<ActiveRecord::Relation [#<Developer id: 2, salary: 40000>]>

I believe that it’s surprising behaviour to find that merging scopes with complementary constraints on a column silently drops one of the constraints, but especially surprising that this only happens on a scope merge operation. It seems to me to be quite a trap, where a developer aiming to refactor a query into merged named parts could end up completely changing the meaning.

Thanks for confirming. This is the expected behavior of how merge is implemented. It behaves like Hash#merge where, when in conflict, the argument for merge takes precedence.

Agree. The following should be possible and handled by active record:

User.where(id: [2, 3]).merge(User.where(id: [1, 2]))

In my case I spent hours trying to understand how a merge command was adding records to the previous collection other than filtering it:

(byebug) ::Reporting::Entry.where(reporting_report_id: []).count
0
(byebug) ::Reporting::Entry.where(reporting_report_id: []).merge(report.entries).count
2

@jychen7 that is intended behavior only for equality conditions. Because two different equalities on the same column can not both match at the same time (i.e. the developer’s salary can’t be both 80000 and 9000), the intent appears to be for the later scope to override the former. (Although it would be even more correct to infer that the merged scope can never match any rows, and return an empty scope.)

However, this isn’t true for comparisons: the developer’s salary can be both > 3000 and < 6000, and ActiveRecord correctly merges scopes with these constraints. Likewise, the developer’s salary can be in both the sets [3000, 4000] and [4000, 5000] - but ActiveRecord fails to correctly merge in this case.

But the expected behavior of merge conflicts with what its docs say about arrays intersection. It would be great if there would be Relation#and really behaving like an intersection.

Now we can use Relation#and in Rails 6.1 (#39558).

The attached executable test case can still be reproduced on Rails master at 02b5b8c.