rails: ActiveRecord `pluck(:name)` returning different number of results than without `pluck`

Steps to reproduce

Have a Project model with a has_many improvements. Improvements and projects have a name column.

Run these 2 queries:

Project.includes(:improvements).where(improvements: { name: 'one' })
Project.includes(:improvements).where(improvements: { name: 'one' }).pluck(:name)

Expected behavior

I expect each query to return the same number of results (i.e. running #count on them will return the same number)

Actual behavior

The query with the pluck(:name) in it returns a different result set with many duplicates.

Without a pluck it runs this query:

SELECT DISTINCT "projects"."id" FROM "projects" LEFT OUTER JOIN "improvements" ON "improvements"."project_id" = "projects"."id" WHERE "improvements"."name" = $1 LIMIT $2

Followed by another long query to get all the actual data. However, thanks to the DISTINCT restriction, no duplicate data is returned.

With the pluck(:name) method, I get this query:

 SELECT "projects"."name" FROM "projects" LEFT OUTER JOIN "improvements" ON "improvements"."project_id" = "projects"."id" WHERE "improvements"."name" = $1 

Without a second query since no other data is required. However, because this is no longer using DISTINCT on the project ID, I am getting multiple results, even though I only have one project.

In my case I have a single project called “hello” with 5 improvements. 4 of which have the name “one” I expect to get an array equal to ["hello"], but instead I get ["hello", "hello", "hello", "hello"].

I understand WHY this is happening and it makes sense, but the expected behaviour from pluck(:name) is that it simply gets a single column from the query results, and does not alter the query itself. In this case it is completely changing what the query does.

Note: A temporary solution I am using is to convert the result set to an array and then plucking the results

System configuration

Rails version: Rails 6.0.3.2 Ruby version: ruby 2.7.1p83 (2020-03-31 revision a0c7c23c9c) [x86_64-linux]

Reproducible script

# frozen_string_literal: true

require "bundler/inline"

gemfile(true) do
  source "https://rubygems.org"

  git_source(:github) { |repo| "https://github.com/#{repo}.git" }

  # Activate the gem you are reporting the issue against.
  gem "activerecord", "6.0.3"
  gem "sqlite3"
end

require "active_record"
require "minitest/autorun"
require "logger"

# This connection will do for database-independent bug reports.
ActiveRecord::Base.establish_connection(adapter: "sqlite3", database: ":memory:")
ActiveRecord::Base.logger = Logger.new(STDOUT)

ActiveRecord::Schema.define do
  create_table :posts, force: true do |t|
    t.string :name
  end

  create_table :comments, force: true do |t|
    t.integer :post_id
    t.string :name
  end
end

class Post < ActiveRecord::Base
  has_many :comments
end

class Comment < ActiveRecord::Base
  belongs_to :post
end

class BugTest < Minitest::Test
  def test_pluck_does_not_effect_results
    post = Post.create!(name: 'hello')
    post.comments << Comment.create!(name: 'one')
    post.comments << Comment.create!(name: 'one')
    post.comments << Comment.create!(name: 'one')
    post.comments << Comment.create!(name: 'one')

    without_pluck = Post.includes(:comments).where(comments: { name: 'one' }).to_a.count
    with_pluck = Post.includes(:comments).where(comments: { name: 'one' }).pluck(:name).to_a.count

    assert_equal without_pluck, with_pluck
  end
end

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 16 (12 by maintainers)

Most upvoted comments

@tgxworld You’re right - So I’ll reiterate what I’m seeing here: Given a relation with an includes, for example Post.includes(:comments).where(comments: { name: 'one' }) , should running pluck on that relation ever change the amount of records returned? I’m not 100% sure of the answer. Maybe this is the correct behavior but we can be a bit clearer in the documentation about what’s returning during includes + pluck. It may be happening here: https://github.com/rails/rails/blob/master/activerecord/lib/active_record/relation/calculations.rb#L189

The purpose of pluck is so that you don’t have to load all the AR models when all you want is a single attribute. But I don’t see why the internals of how pluck works matter here.

I think the key here is that pluck returns the column of the query result. If the query returns 4 rows, pluck will return the specified columns of all 4 rows.

Without an understanding of how it works internally and just by reading the docs would you still say that you would expect it to return 4 strings (the same post name repeated for each comment a post has) even though I am calling pluck on the Post and not the Comment?

It returns 4 strings because of .includes(:comments).where(comments: { name: 'one' }) which does a LEFT JOIN on the comments table which has 4 records that matches the conditions. pluck simply takes the specified columns of all the rows and returns the array of type casted columns to the user.

Also note that ActiveRecord behaves differently when you use joins vs includes.

Post.left_joins(:comments).where(comments: { name: 'one' }).to_a
# SELECT "posts".* FROM "posts" LEFT OUTER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."name" = 'one'
[#<Post id: 1, name: "hello">, #<Post id: 1, name: "hello">, #<Post id: 1, name: "hello">, #<Post id: 1, name: "hello">]

vs

Post.includes(:comments).where(comments: { name: 'one' }).to_a
# SELECT "posts"."id" AS t0_r0, "posts"."name" AS t0_r1, "comments"."id" AS t1_r0, "comments"."post_id" AS t1_r1, "comments"."name" AS t1_r2 FROM "posts" LEFT OUTER JOIN "comments" ON "comments"."post_id" = "posts"."id" WHERE "comments"."name" = 'one'
[#<Post id: 1, name: "hello">]

Thanks @esparta, I just added the script

Will be helpful if you can add a reproducible script for this issue: https://guides.rubyonrails.org/contributing_to_ruby_on_rails.html#create-an-executable-test-case