graphql-js: Slow response times with large documents
Originally posted here:
https://apollographql.slack.com/archives/general/p1487368874010273
I’m trying to resolve some performance issues with large documents, and the problem (AFAICT) is due to the pruning of the document based on requested fields.
Here’s how I discovered it:
```js
Query: {
  async something(...) {
    // ...do something that takes 100ms...
    return oneMbOfJSON;
  }
}
```
2s - 4s response time.
Even if I use formatResponse(response) { return [] } to pretend nothing came back, it’s still a problem somewhere before formatResponse.
```js
Query: {
  async something(...) {
    // ...do something that takes 100ms...
    return []; // Did the work, but don't bother sending it back
  }
}
```
106ms response time.
In the resolver, if I do:
```js
context.payload = oneMbOfJSON;
return [];
```
and then use formatResponse to do:
```js
{ data: options.context.payload }
```
I can see the response starting & streaming much faster.
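For context, here is a minimal sketch of that bypass, assuming the graphql-server/apollo-server-express-era API (the formatResponse and options.context names come from the posts above; app and schema are assumed to be defined elsewhere, and everything else is illustrative):

```js
const { graphqlExpress } = require('apollo-server-express');

app.use('/graphql', graphqlExpress((req) => ({
  schema,
  context: {},
  formatResponse(response, options) {
    // The resolver returned [] and stashed the raw JSON on context,
    // so swap it back in here, after execution has already finished.
    return { data: options.context.payload };
  },
})));
```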
A co-worker tried master to see if #710 resolves it, but it does not appear so.
For reference on why we have a document this large, it’s because, internally, we leverage GraphQL to fetch a full document that we then reduce into a separate document that describes the state of key entities for internal tooling.
(For example, “why does this product not appear for users in Texas?”)
Because of the complex (programmatic) rules that run against these documents, we’re showing internal users the unfiltered document and filtering using the same logic that happens in user-land.
In the short term, it appears our best option is to find a means of returning an unfiltered document (for performance) for internal uses?
About this issue
- State: open
- Created 7 years ago
- Reactions: 22
- Comments: 16 (5 by maintainers)


I think the TL;DR of this issue is that GraphQL has some overhead, that reducing that overhead is non-trivial, and that removing it completely may not be an option. Ultimately, GraphQL.js is still responsible for making API-boundary guarantees about the shape and type of the returned data, and by design it does not trust the underlying systems. In other words, GraphQL.js does runtime type checking and sub-selection, and this has some cost.
I think improving the performance of GraphQL.js execution is still very possible but achieving the same performance as the lower bound of simply passing through data without sub-selection or checking is probably not possible.
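To make that concrete, here is a toy illustration (not graphql-js’s actual code) of why returning a large, already-correctly-shaped list is not a pass-through: execution still visits every item and every selected field, checking each against the schema:

```js
// Toy model of list completion: every item and every selected field is
// visited and checked before anything is serialized.
function completeList(items, selectedFields) {
  return items.map((item) => {
    const out = {};
    for (const field of selectedFields) {
      if (item[field] === undefined) {
        // graphql-js would record a field error here rather than throw
        throw new Error(`Missing field "${field}"`);
      }
      out[field] = item[field]; // O(items × fields) work in total
    }
    return out;
  });
}

// 100,000 items × 2 fields = 200,000 completion steps, even though the
// data was already in exactly the requested shape.
const items = Array.from({ length: 100000 }, (_, i) => ({ id: i, name: `n${i}` }));
completeList(items, ['id', 'name']);
```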
The changes Ivan referenced in his comment above will help out a bunch. They may collectively reduce a 1000ms query to a 700ms query if the majority of the time spent is in GraphQL overhead rather than waiting on services. There’s also probably a lot more room to improve, but that’s something I would look to heavy users of GraphQL.js to help analyze and contribute. PRs that speed up execution are always welcome for review.
Also, I think it’s always worth considering the tools you’re using for the job. I think if your responses are measuring in the 100’s of megabytes then GraphQL may be the wrong tool for that job. Likewise, if you see sub-selection as a pain point costing time rather than a feature allowing flexibility, then again GraphQL is probably the wrong tool for that job.
One option I’ve seen in the past is to create a custom “scalar” type which simply captures a bag of untyped JSON, so that you can use GraphQL for the portions where that is valuable but fall back on plain JSON where that is more useful.
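A minimal sketch of such a pass-through scalar, built with graphql-js’s GraphQLScalarType (the name GraphQLJSON is illustrative; the published graphql-type-json package implements this idea more completely):

```js
const { GraphQLScalarType, Kind } = require('graphql');

const GraphQLJSON = new GraphQLScalarType({
  name: 'JSON',
  description: 'An untyped JSON blob, returned without per-field checking.',
  serialize: (value) => value,   // hand resolver output to the client as-is
  parseValue: (value) => value,  // accept variable input as-is
  parseLiteral(ast) {
    // Only the simple literal kinds are handled here; a full
    // implementation would also recurse into OBJECT and LIST nodes.
    switch (ast.kind) {
      case Kind.STRING:
      case Kind.BOOLEAN:
        return ast.value;
      case Kind.INT:
        return parseInt(ast.value, 10);
      case Kind.FLOAT:
        return parseFloat(ast.value);
      default:
        return null;
    }
  },
});
```

Because serialize is the identity function, the whole blob skips sub-selection and type completion, which is exactly the trade-off described: no field-level guarantees, but no per-field overhead either.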
I’ve created a simple example that shows performance degradation of GraphQL caused by returning a promise from a resolver. This issue doesn’t seem to be related to type-checking, but rather to the actual GraphQL implementation. I understand the point about GraphQL not being the right tool for these types of queries, but this still seems like an actual issue.
Sample results: [timing output omitted]
Here is the code snippet: https://gist.github.com/wiktor256/fede8f058dbdc1728910de8400c50c72
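Not the gist’s exact code, but a minimal sketch of the comparison it makes (written against the current graphql-js API; bigArray and the timing harness are illustrative):

```js
const { graphql, buildSchema } = require('graphql');

const schema = buildSchema('type Query { items: [Int] }');
const bigArray = Array.from({ length: 100000 }, (_, i) => i);

// Same data, resolved synchronously vs. through a native Promise, which
// routes completion through the promise machinery.
const syncRoot = { items: () => bigArray };
const asyncRoot = { items: () => Promise.resolve(bigArray) };

async function time(rootValue, label) {
  const start = Date.now();
  await graphql({ schema, source: '{ items }', rootValue });
  console.log(label, Date.now() - start, 'ms');
}

time(syncRoot, 'sync resolver:').then(() => time(asyncRoot, 'promise resolver:'));
```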
Follow-up: I replaced the native Node.js Promise with Bluebird, and the performance results are much better.
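For reference, a sketch of that swap, assuming graphql-js picks up the global Promise (as the versions of that era did):

```js
// Install Bluebird as the global Promise before graphql is first loaded,
// so graphql-js's internal promise handling goes through Bluebird too.
global.Promise = require('bluebird');

const { graphql } = require('graphql'); // required after the override
```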
I also experienced very slow performance with an array of large documents, but using a GraphQL server cache solved the issue entirely (I’m using Apollo Server with their response cache plugin). Before using the plugin it would take 1 second to get the response; now it takes 88 ms. This workaround may be the easiest in terms of implementation, while we still retain all the useful GraphQL features like type checking and field selection.
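A sketch of that setup, assuming Apollo Server 3 and its official response cache plugin (the tiny schema here is purely illustrative):

```js
import { ApolloServer, gql } from 'apollo-server';
import { ApolloServerPluginCacheControl } from 'apollo-server-core';
import responseCachePlugin from 'apollo-server-plugin-response-cache';

// Hypothetical minimal schema, just to show the wiring.
const typeDefs = gql`
  type Query {
    hello: String
  }
`;
const resolvers = { Query: { hello: () => 'world' } };

const server = new ApolloServer({
  typeDefs,
  resolvers,
  plugins: [
    // Give responses a default max-age so they are eligible for caching.
    ApolloServerPluginCacheControl({ defaultMaxAge: 60 }),
    // Cache whole responses; repeated identical queries skip execution
    // (and its per-field overhead) entirely.
    responseCachePlugin(),
  ],
});

server.listen();
```

Note this only helps when the same query is served repeatedly within the cache window; the first (cold) request still pays the full execution cost.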
Hi, this is causing problems when we have schemas stitched from remote GraphQL services. The response shape is already validated by the underlying GraphQL server, and graphql-js re-validating that response just creates a totally avoidable performance bottleneck. It would be great to have a skipValidation option (disabled by default). A developer explicitly enabling it would understand the cons of doing so. Not having an option at all is frustrating.
I also wanted to find out which part of the graphql engine is responsible for this performance penalty. For this, I first let R2D2 have 100000 friends and then used the HeroNameAndFriendsQuery of starWarsQuery-test.js (adding the id of the friend, to query two fields). To narrow down the source of the latency, I first disabled the biggest chunk (completeListValue) and then gradually allowed more of the execution functions to be called. These are my results:

[timing table omitted]

All timings are just rough estimates; I ran the tests only for ~2s each and thus did not have a lot of iterations. But I think the takeaway is that there is no single “bad function” we could fix and magically have the performance improve by a large factor.
The test could be repeated with a more complex object and nested loops to more closely reflect real huge queries, but I don’t think this would change much.
My suggestion
If you have a huge subgraph in your GraphQL schema that is not affected by @skip/@include (or you somehow process them earlier), I would suggest the following: annotate these huge lists of objects in some way (e.g. by attaching a Symbol) and then check for this symbol in completeValue, right after the check for instanceof Error in execute.js:
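A sketch of what that local patch could look like (the RAW_RESULT marker name is hypothetical, and this is a patch idea against graphql-js internals, not an official option):

```js
const RAW_RESULT = Symbol('rawResult'); // hypothetical marker

// In a resolver, tag the huge list so execution can pass it through:
//   const friends = await loadAllFriends();
//   friends[RAW_RESULT] = true;
//   return friends;

// Inside graphql-js's completeValue (execute.js), right after the
// existing `instanceof Error` guard:
function completeValue(exeContext, returnType, fieldNodes, info, path, result) {
  if (result instanceof Error) {
    throw result;
  }
  // Pass annotated values through untouched, skipping sub-selection
  // and runtime type checks for the whole subtree.
  if (result != null && result[RAW_RESULT] === true) {
    return result;
  }
  // ...original completion logic continues here...
}
```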
This totally cuts off any processing of the huge subgraph and, in my test case, yields sub-millisecond response times.
As someone who is experiencing similar issues, I’d like to provide my profiler results. I have a simple query like so:
```graphql
query ExampleQuery {
  fetchPeople { ... }
  fetchProjects { ... }
  fetchOrders { ... }
  # ...~10 more fetches
}
```
Most of the data fetched is within 5-20 results. However, one of the data points fetched has 1200 results. I ran a profiler, and it seems like a significant portion of the time is spent in graphql’s execute.js. Load testing with ab, I saw somewhere around 1.5-2.5 seconds for my requests, even though the longest individual fetch takes around 500ms. How can I make this more performant?
Statistical profiling result from isolate-0x102004600-v8.log (12704 ticks, 1351 unaccounted, 0 excluded). [detailed tick table omitted; raw log attached as isolate-0x102004600-v8.log]
@leebyron Sorry I didn’t see this until now.
The delay in response time (2-4s, as seen in the 1st GIF) points entirely to the pruning/reformatting of the results to match the requested property structure.
When I bypass that part, I can establish a reliable baseline for the query/validation/etc., which is < 100ms.
What I was wondering is whether, for large arrays, there is duplicated validation or formatting performed on each item that could be done once?
(Say there was a performance opportunity where GraphQL checks, for each item in an array, that the requested property dateCreated exists on the parent type Post. This could be done once for the first node and cached for subsequent nodes in the array, for O(1) vs. O(n) or whatever.)
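A minimal sketch of that caching idea (helper names hypothetical, not graphql-js’s actual code); I believe newer graphql-js versions memoize their sub-field collection in a similar spirit:

```js
// Validate the requested fields against the item type once, then reuse
// the result for every element of the array.
const checkedSelections = new Map();

function getValidatedFields(parentTypeName, requestedFields, typeFields) {
  let fields = checkedSelections.get(parentTypeName);
  if (fields === undefined) {
    // O(fields) check, performed once per type instead of once per item.
    fields = requestedFields.filter((name) => name in typeFields);
    checkedSelections.set(parentTypeName, fields);
  }
  return fields; // cached O(1) lookup for the remaining n - 1 items
}

// e.g. getValidatedFields('Post', ['dateCreated'], { dateCreated: {}, title: {} })
```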