gatsby: New Gatsby type inference is slow on 60k pages
Description
After the onPreExtractQueries step of the build process, Gatsby gets stuck when looking through 60k or so pages. Specifically, the isDate method (which is run on every string to check whether it’s a plain string or a date string) is taking up the most time. These calls all originate from example-value.js.
Steps to reproduce
- Clone gatsby-intense-benchmark
- Run:
node --inspect-brk node_modules/.bin/gatsby develop
- Open Chrome / Chromium, navigate to chrome://inspect and start the debugger.
- When the build process finishes onPreExtractQueries, wait for a few minutes and then pause the debugger (chances are it will be in the right spot).
- Alternatively, run the profiler for a few minutes and inspect the chart.
The build will be stuck in this state for 30+ minutes (I haven’t had a successful build).
Expected result
Build completes in a reasonable time.
Actual result
Build never completes.
Environment
System:
  OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
  CPU: (4) x64 Intel® Core™ i5-4300U CPU @ 1.90GHz
  Shell: 5.4.2 - /usr/bin/zsh
Binaries:
  Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
  Yarn: 1.13.0 - ~/.nvm/versions/node/v10.15.0/bin/yarn
  npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
Languages:
  Python: 2.7.15 - /usr/bin/python
npmPackages:
  gatsby: ^2.2.1 => 2.2.1
  gatsby-image: ^2.0.34 => 2.0.34
  gatsby-plugin-manifest: ^2.0.24 => 2.0.24
  gatsby-plugin-offline: ^2.0.25 => 2.0.25
  gatsby-plugin-react-helmet: ^3.0.10 => 3.0.10
  gatsby-plugin-sharp: ^2.0.29 => 2.0.29
  gatsby-source-filesystem: ^2.0.27 => 2.0.27
  gatsby-transformer-sharp: ^2.1.17 => 2.1.17
npmGlobalPackages:
  gatsby-dev-cli: 2.4.12
  gatsby: 2.2.2
About this issue
- State: closed
- Created 5 years ago
- Comments: 21 (20 by maintainers)
Thanks @wardpeet I’m giving it a shot, hopefully with an answer in less than 2 hours 😃
UPDATE: Previous run took (from memory) 8000s to build schemas, with the patch it was only 600s.
Looking good!
PR #12700 is a quickfix to make it a little bit faster again. Do you mind trying it out?
We will merge Ward’s PR as a temporary optimization. Sadly, the date-fns parser is a bit too loose (for example, it allows dates with arbitrary trailing characters), so switching to it would be a potential breaking change.
As a long-term solution, we will provide an opt-out from inference that will considerably speed up sites with lots of nodes. This will be done by specifying the GraphQL types for the nodes that you don’t want to be inferred. We’ve added the “specifying the type” part already, but for historical reasons inference always happens. We will fix this in the next couple of weeks.
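For readers arriving later: as far as I know, the opt-out described in this comment eventually shipped in later Gatsby 2.x releases as the @dontInfer directive on explicitly defined types. The following is only a minimal sketch of what that looks like in gatsby-node.js. The type name BenchmarkPage and its fields are hypothetical and are not taken from the benchmark repo, so substitute whatever node type your source plugin actually creates.

```javascript
// Sketch only: `BenchmarkPage`, `title`, and `publishedAt` are hypothetical names.
// `@dontInfer` asks Gatsby to skip type inference for this node type, so the
// per-string date check is not run over its nodes.
const typeDefs = `
  type BenchmarkPage implements Node @dontInfer {
    title: String!
    publishedAt: Date @dateformat
  }
`;

// Gatsby calls this hook with an `actions` object; `createTypes` registers
// the explicit type definitions with the schema.
const createSchemaCustomization = ({ actions }) => {
  actions.createTypes(typeDefs);
};

// In gatsby-node.js, expose the hook with:
//   exports.createSchemaCustomization = createSchemaCustomization;
```

With @dontInfer, only the fields listed in the type definition end up in the schema, so every field you query has to be declared explicitly.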
Thanks for the report and for providing a testing repo!
We are indeed now checking every string if it is a date string – we did not do this before but only checked on a randomly picked field value which unfortunately made type inference non-deterministic. We are aware though that this does not scale very well.
I’ll test with the provided repo soon. In the meantime: if it hangs after onPreExtractQueries, that means it is in the schema update step. If that’s also what you experience in your real project, you can try simply disabling schema updating here, since it’s really only there to make the context fields available in the schema, which you probably won’t need (see this RFC). I’d also want to try switching momentjs for something like date-fns@2 (something like const isDate = value => isValid(parseISO(value))) and see if that would make a difference.
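To illustrate why a cheaper check matters when it runs on every string of 60k pages, here is a dependency-free sketch of a strict ISO 8601 test. It is neither Gatsby's actual isDate nor the date-fns implementation, and the accepted formats are an assumption chosen for the example. The anchored regular expression rejects most ordinary strings (including dates with trailing characters) before any parsing happens.

```javascript
// Hypothetical sketch, not Gatsby's isDate.
// Accepts YYYY-MM-DD, optionally followed by THH:mm, seconds, milliseconds,
// and a timezone (Z or +HH:mm / -HH:mm). Anchors forbid trailing characters.
const ISO_8601 =
  /^\d{4}-\d{2}-\d{2}(?:T\d{2}:\d{2}(?::\d{2}(?:\.\d{1,3})?)?(?:Z|[+-]\d{2}:\d{2})?)?$/;

const isDate = value =>
  typeof value === "string" &&
  ISO_8601.test(value) &&           // cheap shape check, fails fast on plain text
  !Number.isNaN(Date.parse(value)); // drop values the engine cannot parse

// Example usage
console.log(isDate("2019-03-20"));           // true
console.log(isDate("2019-03-20T12:34:56Z")); // true
console.log(isDate("2019-03-20 and more"));  // false
console.log(isDate("hello world"));          // false
```

Because this check is stricter than the existing one, swapping it in would change which fields are inferred as dates, which is the same breaking-change concern raised about date-fns.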