graphql-ruby: NewRelic Tracing + Dataloader makes for slow queries
Describe the bug
Yesterday we started using Dataloader to batch queries, and to our surprise, it made our GraphQL endpoint significantly slower. Here’s the Scout chart:

The individual traces are not very helpful as they just indicate that a single code line (the query execution in the controller) takes most of the time.
This went against all my local benchmarks, so I knew it had something to do with a production config. We’re moving away from NewRelic to Scout, but we still have NewRelicTracing enabled, and we don’t use ScoutTracing.
After disabling NewRelicTracing, response times plummeted:

I don’t know if this is specific to NewRelicTracing or PlatformTracing in general.
Versions
graphql version: 1.13.1
rails : 6.1.4.6
newrelic_rpm: 8.4.0
GraphQL schema
Here’s the schema before removing the tracer:
class AdminSchema < GraphQL::Schema
INTROSPECTION_QUERY_DEPTH = 13
use(GraphQL::Tracing::NewRelicTracing)
query_analyzer QueryLogger
mutation(Types::Mutation)
query(Types::AdminQuery)
use GraphQL::Dataloader
default_max_page_size 100
max_depth 10
rescue_from ActiveRecord::RecordNotFound do |exception|
raise GraphQL::ExecutionError.new("Resource not found.", options: { type: "NOT_FOUND" })
end
rescue_from StandardError do |exception|
if Rails.env.production? && !exception.is_a?(GraphQL::ExecutionError)
Rollbar.error(exception)
GraphQL::ExecutionError.new("Unknown error occurred. Please try again later.", options: { type: "INTERNAL_SERVER_ERROR" })
else
raise exception
end
end
end
My dataloader sources:
class Sources::ActiveRecord < GraphQL::Dataloader::Source
def initialize(model_class)
@model_class = model_class
end
# The resulting *array* must be of the same length and order as `ids`,
# so we set missing objects to `nil`.
def fetch(ids)
@model_class.where(id: ids).index_by(&:id).values_at(*ids)
end
end
class Sources::ActiveRecordAssociation < GraphQL::Dataloader::Source
def initialize(model_class, association_name)
@model_class = model_class
@association_name = association_name
if !@model_class.reflect_on_association(@association_name)
raise ArgumentError, "No association #{@association_name} on #{@model_class}"
end
end
def load(record)
raise TypeError, "#{@model_class} loader can't load association for #{record.class}" unless record.is_a?(@model_class)
super
end
def fetch(records)
::ActiveRecord::Associations::Preloader.new.preload(records, @association_name)
records.map { |record| read_association(record) }
end
private
def read_association(record)
record.public_send(@association_name)
end
end
The objects involved in the query
class Types::AdminQuery < Types::BaseObject
graphql_name "Query"
description "The query root of this schema."
field :members, Types::Member.connection_type, "A list of all members.", null: true
def members
context[:current_site].members.order(:id)
end
end
class Types::Member < Types::BaseObject
graphql_name "Member"
field :orders, [Types::Order, null: true], "A list of all orders for this member.", null: true
field :subscriptions, [Types::SubscriptionType], "A list of all subscriptions for this member.", null: false
def orders
dataloader.with(Sources::ActiveRecordAssociation, Member, :orders).load(object)
end
def subscriptions
dataloader.with(Sources::ActiveRecordAssociation, Member, :subscriptions).load(object)
end
end
class Types::SubscriptionType < Types::BaseObject
graphql_name "Subscription"
field :plan, Types::Plan, "The plan this subscription is associated with.", null: true
def plan
dataloader.with(Sources::ActiveRecord, Plan).load(object.plan_id)
end
end
GraphQL query
{
members(last: 100) {
nodes {
fullName
email
subscriptions {
expiresAt
active
plan {
name
}
}
orders {
uuid
totalCents
purchasableType
}
}
}
}
Expected behavior
I expected the Dataloader to significantly reduce query execution time
Actual behavior
Instead, it made things worse.
Additional context
My uneducated guess is that the tracer is going crazy with all the batching going on, but I’m not too familiar with the gem internals to understand why.
Because we’re moving away from NewRelic, we don’t have any plans of bringing back the tracer, so this is more of a heads-up kind of issue.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 1
- Comments: 15 (5 by maintainers)
The New Relic Ruby agent team has made further improvements to the performance of segment completion. These improvements brought an internal user’s service running GraphQL Dataloader from a > 5 seconds delay to a < 15 ms delay. We have confidence Dataloader and New Relic can peacefully coexist in versions 9.2.2+ of
newrelic_rpm.Hi all. Thanks very much to @bmulholland for letting us (New Relic) know about this issue. Thanks very much also to @hkdnet for determining that New Relic’s
merge_or_appendmethod is the bottleneck!We have investigated and prepped a potential fix.
The current hypothesis is that the
graphqlgem is invoking the New Relic Ruby agent gem’strace_execution_scopedmethod frequently over a short time period and the New Relic agent’s internal system of keeping track of multiple potentially overlapping traces is not performant when there are that many (~100 or so) overlapping traces in play.With https://github.com/newrelic/newrelic-ruby-agent/pull/1274 we have completely removed the
merge_or_appendmethod and reworked its original caller to be more performant when dealing with a big batch of traces.We would love some help with testing out the new branch to see if it helps with performance. If it does, we can include the changes in a future agent release.
There is a new
striderbranch of the project that is ready for testing.A Ruby project’s Gemfile can be set to use the
striderbranch of the New Relic Ruby agent like so:The
graphqlgem is used (and loved!) inside New Relic as well, but we haven’t been using dataloader.If there are any questions or feedback for us, we’ll keep an eye on this GitHub Issue as well as the https://github.com/newrelic/newrelic-ruby-agent/issues/1264 twin over in our repo.
Thanks for your investigation and fix on this, @fallwith! If anyone runs into trouble like this after updating to 8.10.0, please open a new issue to discuss it.
The v8.10.0 release of the
newrelic_rpmgem with the improvements is now available on RubyGems.org.Great work @hkdnet, thanks for digging in and posting a repro! I logged https://github.com/newrelic/newrelic-ruby-agent/issues/1264 for the New Relic side
After some investigation, I found
NewRelic::Agent::RangeExtensions.merge_or_appendconsumes time.You can see a flamegraph with this branch. https://github.com/hkdnet/graphql-ruby-nr/tree/stackprof
^ is a zoomed capture. The pink rectangle is
merge_or_append.I confirmed 33670ms -> 700ms speed-up if
merge_or_appenddoesn’t consume time (though this patch breaks newrelic_rpm completely). This showsmerge_or_appendis the main cause. https://github.com/hkdnet/newrelic-ruby-agent/commit/3c097d601be6b935ca9dcd646ffe7a210f73a9bbDataLoader creates a lot of descendant ranges and
merge_or_appendis called every time a descendant finishes (NewRelic::Agent::Transaction::AbstractSegment#descendant_complete) I think the right way is to speed up newrelic_rpm gem. I’d like to know if you have any workaround ideas 😸