graphql-ruby: Returning lots of objects is slow
We’re seeing some performance issues with a certain field that returns a lot of data. To better understand how GraphQL performance depends on the size of the response, we wrote a benchmark script that uses the GraphQL gem to query for increasingly large lists of objects at once:
https://gist.github.com/lovitt/effa205b876d9a9b86ee58399cceaf31
All test data is pre-generated and in memory, so the script should only be measuring the overhead of the GraphQL system itself — the time it takes to resolve each of the fields, check the raw value against the expected type, etc., and build the response (i.e., not time to pull data out of some database). All returned objects are of the same type, and all 10 of its fields are requested in the query.
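Roughly, the shape of the benchmark looks like this (a simplified sketch, not the gist itself: the type, field names, and in-memory data here are made up, it only selects two fields instead of ten, and it uses the class-based API with `benchmark-ips`):

```ruby
require "graphql"
require "benchmark/ips"

# Pre-generated, purely in-memory test data, so the benchmark measures
# GraphQL execution overhead rather than data loading.
Thing = Struct.new(:name, :value)
THINGS = Array.new(10_000) { |i| Thing.new("thing-#{i}", i) }

# Illustrative object type -- the real script defines 10 scalar fields.
class ThingType < GraphQL::Schema::Object
  field :name, String, null: false
  field :value, Integer, null: false
end

class QueryType < GraphQL::Schema::Object
  field :things, [ThingType], null: false do
    argument :count, Integer, required: true
  end

  def things(count:)
    THINGS.first(count)
  end
end

class BenchSchema < GraphQL::Schema
  query QueryType
end

Benchmark.ips do |x|
  [10, 100, 1_000, 10_000].each do |n|
    x.report("Querying for #{n} objects - all fields") do
      BenchSchema.execute("{ things(count: #{n}) { name value } }")
    end
  end
end
```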
Here are the results on my OS X dev machine:
Calculating -------------------------------------
Querying for 10 objects - all fields
225.168 (±26.6%) i/s - 952.000 in 5.041098s
Querying for 100 objects - all fields
29.132 (±24.0%) i/s - 130.000 in 5.038850s
Querying for 1000 objects - all fields
2.713 (± 0.0%) i/s - 14.000 in 5.346691s
Querying for 10000 objects - all fields
0.262 (± 0.0%) i/s - 2.000 in 7.654611s
Comparison:
Querying for 10 objects - all fields : 225.2 i/s
Querying for 100 objects - all fields : 29.1 i/s - 7.73x slower
Querying for 1000 objects - all fields : 2.7 i/s - 82.99x slower
Querying for 10000 objects - all fields : 0.3 i/s - 858.20x slower
For this particular test schema and object type, that works out to roughly 0.4ms of overhead for every object in the response (e.g. 1000 objects at ~2.7 i/s is ~370ms per query). Do these numbers sound right? Are these performance characteristics about what we should expect…?
About this issue
- State: closed
- Created 7 years ago
- Comments: 22 (14 by maintainers)
I came back here to double-check my benchmark results and found I never posted them 🙈! So I ran them again.

[Benchmark output: before, after, and using features on `master`.]

So, it seems about 50% faster.
Ran these again on d874f79 and saw some improvements. The benchmark is a good bit (~30%) faster, and the overall memory usage is better too. The method profile shows a lot less time in `.new` and `#initialize`, but now much more in `===`, so I'll see if I can avoid some of those!
I’ve migrated the benchmark above to the class-based schema so that I can try it with the new interpreter: https://gist.github.com/rmosolgo/eda7d1023cf0b096d52da6ba387b2785
Nothing to report on yet, but I wanted to share that I’m hacking away at it!
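For anyone following along: at the time, opting a class-based schema into the new interpreter was roughly a one-line plugin, something like the sketch below (in later releases the interpreter became the default and the explicit `use` was removed, so check the docs for your version):

```ruby
class BenchSchema < GraphQL::Schema
  query QueryType

  # Opt in to the in-development interpreter runtime (graphql-ruby 1.9-era API).
  use GraphQL::Execution::Interpreter
end
```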
I haven’t looked into optimizations like that because it’s not a bottleneck for my application. (For us, it’s application logic and IO.) How about trying some Ruby profilers to make sure we’re hunting the right target? Here are two I’m familiar with:
The output of those profilers can be a bit … dense … but if you want to interpret it together, I’d be happy to take a look at the output. Just open another issue and share your benchmark code and the output of those.
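As a concrete starting point, here's a rough sketch of wrapping a single query execution in a sampling profiler (StackProf is just one option; `BenchSchema` and the query string are the placeholder names from the benchmark sketch above):

```ruby
require "stackprof"

query_string = "{ things(count: 1000) { name value } }"

# Sample wall-clock time spent executing one query and write the raw
# profile to a file for later inspection.
StackProf.run(mode: :wall, interval: 1_000, out: "graphql_profile.dump") do
  BenchSchema.execute(query_string)
end

# Inspect with: stackprof graphql_profile.dump --text
```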
Here’s some example code about how I do that when working on this gem:
https://github.com/rmosolgo/graphql-ruby/blob/1a21f665554a00c822b8f5212b2900e64ec17a91/benchmark/run.rb#L44-L52
https://github.com/rmosolgo/graphql-ruby/blob/1a21f665554a00c822b8f5212b2900e64ec17a91/benchmark/run.rb#L73-L77
(Those scripts write to stdout, so I write the result to a file, for example `bundle exec rake:benchmark > before_profile.txt`.)

Hi, just wanted to share an update. I'm working on a new runtime over at #1394. Instead of eagerly duping all the objects for different fields, it duplicates them only when needed (namely, when requested by `extras: [...]` or when fields return promises). So I'm optimistic that it will improve runtime performance a lot. It's not quite ready yet, though. Feel free to keep an eye on that issue if you're interested.

Thanks for the detailed benchmark! That's my understanding too: the sheer number of objects allocated causes a lot of overhead.
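For what it's worth, a quick way to see that allocation count directly (a sketch, reusing the placeholder `BenchSchema` and query from above) is to diff `GC.stat` around a single execution:

```ruby
# Count the Ruby objects allocated by one query execution.
GC.start
before = GC.stat(:total_allocated_objects)

BenchSchema.execute("{ things(count: 1000) { name value } }")

after = GC.stat(:total_allocated_objects)
puts "Objects allocated for one query: #{after - before}"
```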
Currently I'm pretty focused on landing the finishing touches on the built-in authorization stuff and getting that deployed at GitHub, but after that, my focus will be on improving runtime performance. (It's also a GitHub priority!)
I'm pretty stumped about how to reduce that, because those objects seem really necessary. For example, you need something to hold the runtime path (which includes list indexes, unlike the static path), and something to hold on to promises while they're still pending.
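To illustrate that distinction (made-up field names, not the gem's internal representation): the static path comes from the query document and is shared by every list item, while the runtime path is unique per item:

```ruby
# Static path: one per selection in the query document.
static_path = ["things", "name"]

# Runtime paths: one per resolved value, with the list index included,
# so a 10,000-item list implies 10,000 of these per selected field.
runtime_paths = (0...10_000).map { |i| ["things", i, "name"] }
```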
My first order of business before hacking on the runtime is to read up a bit: I got a couple of books with my learning budget, *Language Implementation Patterns* and *Essentials of Programming Languages*, so I'm hoping some proper book learnin' will help!
There are some techniques I can think of from programming language optimization, like reusing frames instead of making new ones, and not allocating frames when you can avoid it. I wonder if something like that will come in handy.
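As a very rough illustration of the frame-reuse idea (generic Ruby, not graphql-ruby's actual runtime), a pool like this trades per-field allocations for resetting and reusing a small set of frame objects:

```ruby
# Toy example: reuse frame objects between fields instead of allocating
# a new one for every resolved field.
class Frame
  attr_reader :object, :field_name, :path

  def reset(object, field_name, path)
    @object = object
    @field_name = field_name
    @path = path
    self
  end
end

class FramePool
  def initialize
    @free = []
  end

  # Hand out a recycled frame if one is available, otherwise allocate.
  def checkout(object, field_name, path)
    (@free.pop || Frame.new).reset(object, field_name, path)
  end

  def release(frame)
    @free << frame
  end
end

pool = FramePool.new
frame = pool.checkout({ name: "thing-0" }, "name", ["things", 0, "name"])
# ... resolve the field using `frame`, then hand it back ...
pool.release(frame)
```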