rails: ActiveRecord Sum (and other calculation operators) don't honor distinct attributes

When using the ActiveRecord “sum” calculation with the Postgres adapter (and I believe the others though I haven’t tested them), any attributes added in a “distinct” clause are not used to filter/trim the dataset. Though “DISTINCT” is added to the SQL, it is incorrectly added outside the SUM operation, which effectively ignores the relevant distinct attributes.

This seems to be directly related to issue https://github.com/rails/rails/issues/5554 and issue https://github.com/rails/rails/pull/6970. Though the pull request related to those issues seemed to fix the undesirable behavior for the “count” ActiveRecord::Calculations.

I’ve written a short example script to display this behavior (there’s also some Rails setup code included so it could be run outside of a Rails project, like in an IRB console easily):

require 'active_record'
require 'pg'


####    SETUP MIGRATIONS/DATABASE (Moved out of canonical Rails structure and into one file for demonstration)    ####
PG_SPEC = {
    :adapter  => 'postgresql',
    :host     => 'localhost',
    :database => 'my_test_db',
    :username => 'my_db_super_user',
    :encoding => 'utf8'
}

ActiveRecord::Base.establish_connection(PG_SPEC.merge('database' => 'postgres', 'schema_search_path' => 'public'))
ActiveRecord::Base.connection.drop_database PG_SPEC[:database] rescue nil
ActiveRecord::Base.connection.create_database(PG_SPEC[:database])
ActiveRecord::Base.establish_connection(PG_SPEC)

class CreateMyModels < ActiveRecord::Migration
  def self.change
    create_table :users do |t|
      t.integer :summable_attr
      t.timestamps
    end

    create_table :user_groups do |t|
      t.integer :user_id
      t.integer :group_id
      t.timestamps
    end

    create_table :groups do |t|
      t.timestamps
      t.string :status
    end
  end
end

CreateMyModels.change

####    SETUP ACTIVERECORD, POPULATE DATABASE, AND DEMONSTRATE ODD QUERY BEHAVIOR    ####

class User < ActiveRecord::Base
  has_many :user_groups
  has_many :groups, through: :user_groups

  # Should only return users that are active
  def self.all_active_users
    User.joins(:user_groups).joins(:groups).distinct(true).where('groups.status' => 'active')
  end
end

class Group < ActiveRecord::Base
  has_many :user_groups
  has_many :users, through: :user_groups
end

class UserGroup < ActiveRecord::Base
  belongs_to :group
  belongs_to :user
end

g1 = Group.create(status: 'active')
g2 = Group.create(status: 'inactive')

u1 = User.create(summable_attr: 1 )
u2 = User.create(summable_attr: 2 )
u3 = User.create(summable_attr: 3 )
u4 = User.create(summable_attr: 4 )

UserGroup.create(group_id: g1.id, user_id: u1.id)
UserGroup.create(group_id: g1.id, user_id: u2.id)
UserGroup.create(group_id: g1.id, user_id: u3.id)
UserGroup.create(group_id: g1.id, user_id: u4.id)


UserGroup.create(group_id: g2.id, user_id: u1.id)
UserGroup.create(group_id: g2.id, user_id: u2.id)
UserGroup.create(group_id: g2.id, user_id: u3.id)
UserGroup.create(group_id: g2.id, user_id: u4.id)

# Perform the sum using a Postgres AR query gives an incorrect result.
# Outputs "Total Sum: 20"
# Uses SQL: SELECT DISTINCT SUM("users"."summable_attr") AS sum_id FROM "users" INNER JOIN "user_groups" ON "user_groups"."user_id" = "users"."id" INNER JOIN "user_groups" "user_groups_users_join" ON "user_groups_users_join"."user_id" = "users"."id" INNER JOIN "groups" ON "groups"."id" = "user_groups_users_join"."group_id" WHERE "groups"."status" = 'active'
puts "Total Sum: #{User.all_active_users.sum('summable_attr')}"

# The correct summation SQL should be something like:
# Outputs "Total Sum: 10"
puts "Total Sum: #{ActiveRecord::Base.connection.execute(%q[SELECT DISTINCT SUM(DISTINCT "users"."id") FROM "users" INNER JOIN "user_groups" ON "user_groups"."user_id" = "users"."id" INNER JOIN "user_groups" "user_groups_users_join" ON "user_groups_users_join"."user_id" = "users"."id" INNER JOIN "groups" ON "groups"."id" = "user_groups_users_join"."group_id" WHERE "groups"."status" = 'active'])[0]["sum"]}"

# Perform the sum using a Ruby inject operator gives us the correct result (at quite a heavy performance/memory cost)
# Outputs "Total Sum: 10"
# Uses SQL: SELECT DISTINCT "users".* FROM "users" INNER JOIN "user_groups" ON "user_groups"."user_id" = "users"."id" INNER JOIN "user_groups" "user_groups_users_join" ON "user_groups_users_join"."user_id" = "users"."id" INNER JOIN "groups" ON "groups"."id" = "user_groups_users_join"."group_id" WHERE "groups"."status" = 'active'
puts "Total Sum: #{User.all_active_users.map {|u| u.summable_attr}.inject(:+)}"

# However, the "count" from ActiveRecord::Calculations seems to work correctly:
# Outputs "Total Count: 4"
# Uses SQL: SELECT DISTINCT COUNT(DISTINCT "users"."id") FROM "users" INNER JOIN "user_groups" ON "user_groups"."user_id" = "users"."id" INNER JOIN "user_groups" "user_groups_users_join" ON "user_groups_users_join"."user_id" = "users"."id" INNER JOIN "groups" ON "groups"."id" = "user_groups_users_join"."group_id" WHERE "groups"."status" = 'active'
puts "Total Count: #{User.all_active_users.count}"

About this issue

  • Original URL
  • State: closed
  • Created 10 years ago
  • Comments: 22 (7 by maintainers)

Most upvoted comments

Still exists in 5.2.3

This is supposedly fixed in Rails 5.2.0, but I am still hitting it there.