gatsby: New Gatsby type inference is slow on 60k pages

Description

After the onPreExtractQueries step of the build process, Gatsby gets stuck when looking through 60k or so pages. Specifically, the isDate method (which is run on every string to check whether it’s a plain string or a date string) is taking up the most time. These calls all originate from example-value.js.

Steps to reproduce

  1. Clone gatsby-intense-benchmark
  2. Run:
node --inspect-brk node_modules/.bin/gatsby develop
  3. Open Chrome / Chromium, navigate to chrome://inspect and start the debugger.
  4. When the build process finishes onPreExtractQueries, wait for a few minutes and then pause the debugger (chances are it will be in the right spot)
  5. Alternatively, run the profiler for a few minutes and inspect the chart

The build will be stuck in this state for 30+ minutes (I haven’t had a successful build).

Expected result

Build completes in a reasonable time.

Actual result

Build never completes.

Environment

  System:
    OS: Linux 4.15 Ubuntu 18.04.2 LTS (Bionic Beaver)
    CPU: (4) x64 Intel® Core™ i5-4300U CPU @ 1.90GHz
    Shell: 5.4.2 - /usr/bin/zsh
  Binaries:
    Node: 10.15.0 - ~/.nvm/versions/node/v10.15.0/bin/node
    Yarn: 1.13.0 - ~/.nvm/versions/node/v10.15.0/bin/yarn
    npm: 6.4.1 - ~/.nvm/versions/node/v10.15.0/bin/npm
  Languages:
    Python: 2.7.15 - /usr/bin/python
  npmPackages:
    gatsby: ^2.2.1 => 2.2.1
    gatsby-image: ^2.0.34 => 2.0.34
    gatsby-plugin-manifest: ^2.0.24 => 2.0.24
    gatsby-plugin-offline: ^2.0.25 => 2.0.25
    gatsby-plugin-react-helmet: ^3.0.10 => 3.0.10
    gatsby-plugin-sharp: ^2.0.29 => 2.0.29
    gatsby-source-filesystem: ^2.0.27 => 2.0.27
    gatsby-transformer-sharp: ^2.1.17 => 2.1.17
  npmGlobalPackages:
    gatsby-dev-cli: 2.4.12
    gatsby: 2.2.2

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 21 (20 by maintainers)

Most upvoted comments

Thanks @wardpeet I’m giving it a shot, hopefully with an answer in less than 2 hours 😃

UPDATE: The previous run took (from memory) 8000s to build schemas; with the patch it took only 600s.

Looking good!

PR #12700 is a quickfix to make it a little bit faster again. Do you mind trying it out?

We will merge Ward’s PR as a temporary optimization. Sadly, the date-fns parser is a bit too loose (for example, it allows dates with arbitrary trailing characters), so switching to it would be a potential breaking change.

As a long term solution to this, we will provide an opt-out from inference that will considerably speed up sites with lots of nodes. This will be done by specifying the GraphQL types for the nodes that you don’t want to be inferred. We’ve added the “specifying the type” part already, but for historical reasons inference always happens. We will fix this in the next couple of weeks.
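For readers finding this later: in subsequent Gatsby releases this opt-out can be expressed with the createTypes action and the @dontInfer directive. The snippet below is only a rough sketch of what that looks like; the type name BenchmarkPage and its fields are hypothetical and would need to match the nodes your site actually creates.

```javascript
// Hypothetical type and field names, chosen for illustration only.
// `@dontInfer` tells Gatsby to skip inference for this type and use
// only the fields declared here.
const typeDefs = `
  type BenchmarkPage implements Node @dontInfer {
    title: String!
    publishedAt: Date @dateformat
    body: String
  }
`;

// In gatsby-node.js this function would be assigned to
// `exports.createSchemaCustomization`.
const createSchemaCustomization = ({ actions }) => {
  actions.createTypes(typeDefs);
};
```

With every field declared up front, the string values of these nodes should no longer need to be scanned to guess whether they are dates.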

Thanks for the report and for providing a testing repo!

We are indeed now checking every string if it is a date string – we did not do this before but only checked on a randomly picked field value which unfortunately made type inference non-deterministic. We are aware though that this does not scale very well.
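To illustrate the scaling problem, here is a minimal sketch of one general way to make a per-string check cheaper: reject values with an inexpensive shape test before calling the costly parser. This is not necessarily what Gatsby or any particular PR does. The regex is an assumption that only covers a subset of ISO 8601, and expensiveIsDate is a hypothetical stand-in for the real strict parser.

```javascript
// Hypothetical stand-in for the expensive strict date parser.
const expensiveIsDate = value => !Number.isNaN(Date.parse(value));

// Cheap shape test (assumed subset of ISO 8601): YYYY-MM-DD, optionally
// followed by a time and a timezone. Anything else is rejected without
// ever reaching the expensive parser.
const ISO_SHAPE =
  /^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}(:\d{2}(\.\d+)?)?(Z|[+-]\d{2}:\d{2})?)?$/;

function isDate(value) {
  if (typeof value !== 'string') return false;
  if (!ISO_SHAPE.test(value)) return false; // fast path for ordinary strings
  return expensiveIsDate(value);
}
```

Since most string fields on a typical site are not dates, the majority of calls would return from the regex test alone. The trade-off is that any date format the regex does not cover is treated as a plain string.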

I’ll test with the provided repo soon – in the meantime: if it hangs after onPreExtractQueries, that means it is stuck in the schema update step. If that’s also what you experience in your real project, you can try simply disabling schema updating here, since it’s really only there to make the context fields available in the schema, which you probably won’t need (see this RFC). I’d also want to try switching momentjs for something like date-fns@2 (something like const isDate = value => isValid(parseISO(value))) and see if that would make a difference.