gatsby: Build stuck at running jobs (image transformation)
If you’re coming new to this issue, please see this first: https://github.com/gatsbyjs/gatsby/issues/34051#issuecomment-979882343
Preliminary Checks
- This issue is not a duplicate. Before opening a new issue, please search existing issues: https://github.com/gatsbyjs/gatsby/issues
- This issue is not a question, feature request, RFC, or anything other than a bug report directly related to Gatsby. Please post those things in GitHub Discussions: https://github.com/gatsbyjs/gatsby/discussions
Description
Gatsby’s build process is hanging and not completing. I suspect the issue is with Sharp, as my site has quite a few images, and I saw this brought up in a previous issue, #33557.
When I upgraded to v4 I initially had no issues. However, the next day my builds all started exceeding Netlify’s maximum build time of 30 minutes.
I mentioned this problem in the thread of the other issue, as others apparently had the same problem where `run queries in workers` seems to take longer than expected.
This issue is difficult to reproduce, in part I think because it is to do with the scale of my site, which is moderately large and has ~1600 images. There must be something that isn’t quite right in the worker process, because my builds on Netlify went from taking around 13 or 14 minutes to exceeding the build limit every time.
To try and diagnose the issue I tried a local build, which, while it took a long-ish time, did actually complete.
Since @LekoArts suggested that Gatsby Cloud’s build process is better optimised for processing images, I thought I’d give that a go.
After trying out a build in Gatsby Cloud, I had no build problems at all and the whole site built with a clear cache in 7 minutes. OK, I thought, seems like the problem isn’t so much with Gatsby, but in how Netlify is interacting with v4’s worker process.
However, on the next push I ran into the problem once again, this time in Gatsby Cloud. The tail end of Gatsby Cloud’s logs is useful, because it gives me a little more information than Netlify:
17:38:38 PM: info Total nodes: 7987, SitePage nodes: 1695 (use --verbose for breakdown)
17:38:38 PM: success Checking for changed pages - 0.001s
17:38:38 PM: success onPreExtractQueries - 0.000s
17:38:38 PM: success Cleaning up stale page-data - 0.024s
17:38:38 PM: success createPages - 1.351s
17:38:40 PM: success extract queries from components - 1.596s
17:38:40 PM: success write out redirect data - 0.004s
17:38:40 PM: success onPostBootstrap - 0.046s
17:38:40 PM: success write out requires - 0.030s
17:38:40 PM: info bootstrap finished - 48.635s
17:39:15 PM: warning warn - You have enabled the JIT engine which is currently in preview.
17:39:15 PM: warning warn - Preview features are not covered by semver, may introduce breaking changes, and can change at any time.
17:39:15 PM: warning ⠀
17:39:22 PM: success Building production JavaScript and CSS bundles - 42.093s
17:39:24 PM: [webpack.cache.PackFileCacheStrategy] Serializing big strings (3319kiB) impacts deserialization performance (consider using Buffer instead and decode when needed)
17:39:24 PM: [webpack.cache.PackFileCacheStrategy] Serializing big strings (3319kiB) impacts deserialization performance (consider using Buffer instead and decode when needed)
17:39:24 PM: [webpack.cache.PackFileCacheStrategy] Serializing big strings (3319kiB) impacts deserialization performance (consider using Buffer instead and decode when needed)
17:39:59 PM: success Building Rendering Engines - 37.719s
17:40:13 PM: success Building HTML renderer - 13.051s
17:40:13 PM: success Execute page configs - 0.039s
17:40:15 PM: success Caching Webpack compilations - 0.001s
17:40:15 PM: success Validating Rendering Engines - 2.094s
17:40:39 PM: success run queries in workers - 23.276s - 1662/1662 71.40/s
17:45:38 PM: warning This is just diagnostic information (enabled by GATSBY_DIAGNOSTIC_STUCK_STATUS_TIMEOUT):
17:45:38 PM: - Activity "build" of type "hidden" is currently in state "IN_PROGRESS"
17:45:38 PM: Gatsby is in "IN_PROGRESS" state without any updates for 300.000 seconds. Activities preventing Gatsby from transitioning to idle state:
17:45:38 PM: Process will be terminated in 1500.000 seconds if nothing will change.
17:45:38 PM: - Activity "Running jobs v2" of type "hidden" is currently in state "IN_PROGRESS"
18:10:38 PM: ERROR Terminating the process (due to GATSBY_WATCHDOG_STUCK_STATUS_TIMEOUT):
18:10:38 PM: - Activity "build" of type "hidden" is currently in state "IN_PROGRESS"
18:10:38 PM: Gatsby is in "IN_PROGRESS" state without any updates for 1800.000 seconds. Activities preventing Gatsby from transitioning to idle state:
18:10:38 PM: - Activity "Running jobs v2" of type "hidden" is currently in state "IN_PROGRESS"
That a full, uncached build on Gatsby Cloud can run in 7 minutes suggests to me that the issue isn’t one of scale, but rather that the worker process hangs intermittently.
Is it to do with incremental builds? Maybe. I am using the preserved download cache, because as I said my site has quite a few images which are coming from a custom source plugin (which is relatively simple, and contains all the image links from AWS that are passed over to createRemoteFileNode).
To test things out once I had the first timeout on Gatsby Cloud, I tested a manual deploy without clearing the cache. I was hoping the process would hang again so I’d know the issue was with the cache and incremental builds, but alas, it did not. The whole build was completed in 6 minutes. Strangely, the issue occurs more often than not on Netlify, and only occasionally in Gatsby Cloud. It may be to do with build process resources, because I just signed up to Gatsby Cloud, and so am in the free preview of performance builds.
Are there other diagnostic tools I can use to more closely inspect the build process? How would I be able to see which process is failing or never finishing?
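One low-tech way to see which activity never finishes is to scan a saved copy of the build log for the stuck-status lines quoted above. The sketch below is not a Gatsby tool: `findStuckActivities` is a hypothetical helper, and it assumes the log lines keep the exact `Activity "..." of type "..." is currently in state "..."` wording shown in this report.

```javascript
// Hypothetical helper, not part of Gatsby: scans raw build-log text for the
// stuck-status lines and returns each activity reported as IN_PROGRESS once.
function findStuckActivities(logText) {
  const pattern =
    /Activity "([^"]+)" of type "([^"]+)" is currently in state "([^"]+)"/g;
  const seen = new Map(); // keyed by activity name, keeps first-seen order
  for (const match of logText.matchAll(pattern)) {
    const [, name, type, state] = match;
    if (state === "IN_PROGRESS") {
      seen.set(name, { name, type, state });
    }
  }
  return [...seen.values()];
}

// Sample lines copied from the log in this report.
const sample = [
  "success run queries in workers - 23.276s - 1662/1662 71.40/s",
  '- Activity "build" of type "hidden" is currently in state "IN_PROGRESS"',
  '- Activity "Running jobs v2" of type "hidden" is currently in state "IN_PROGRESS"',
  '- Activity "build" of type "hidden" is currently in state "IN_PROGRESS"',
].join("\n");

console.log(findStuckActivities(sample).map((activity) => activity.name));
```

For the log above this only confirms that the hidden "Running jobs v2" activity is what keeps the build from going idle; it does not say which individual job is stuck.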
Reproduction Link
I can’t seem to reproduce this error as it is intermittent
Steps to Reproduce
- Attempt to build site with `gatsby build` in either Netlify or Gatsby Cloud
- Sometimes, the build never finishes
Expected Result
`gatsby build` should eventually finish and build the site
Actual Result
The state `run queries in workers` never finishes/moves on to `merge worker state`, and the build eventually times out and fails.
Environment
My local environment isn't really the issue, builds have failed in both Netlify and Gatsby Cloud with this problem.
However, this is my local env:
  System:
    OS: macOS Mojave 10.14.6
    CPU: (4) x64 Intel(R) Core(TM) i7-4578U CPU @ 3.00GHz
    Shell: 3.2.57 - /bin/bash
  Binaries:
    Node: 16.1.0 - /usr/local/bin/node
    npm: 8.1.4 - /usr/local/bin/npm
  Languages:
    Python: 3.9.5 - /usr/local/opt/python/libexec/bin/python
  Browsers:
    Chrome: 95.0.4638.69
    Firefox: 94.0.1
    Safari: 14.1.2
  npmPackages:
    gatsby: ^4.1.6 => 4.2.0
    gatsby-plugin-gdpr-cookies: ^2.0.8 => 2.0.8
    gatsby-plugin-image: ^2.1.3 => 2.2.0
    gatsby-plugin-loadable-components-ssr: ^4.1.0 => 4.1.0
    gatsby-plugin-local-search: ^2.0.1 => 2.0.1
    gatsby-plugin-netlify: ^4.0.0-next.0 => 4.0.0-next.0
    gatsby-plugin-netlify-cms: ^6.1.0 => 6.2.0
    gatsby-plugin-postcss: ^5.1.0 => 5.2.0
    gatsby-plugin-react-helmet: ^5.1.0 => 5.2.0
    gatsby-plugin-sharp: ^4.1.4 => 4.2.0
    gatsby-remark-copy-linked-files: ^5.1.0 => 5.2.0
    gatsby-remark-images: ^6.1.4 => 6.2.0
    gatsby-remark-relative-images: ^2.0.2 => 2.0.2
    gatsby-remark-responsive-iframe: ^5.1.0 => 5.2.0
    gatsby-remark-smartypants: ^5.1.0 => 5.2.0
    gatsby-source-filesystem: ^4.1.3 => 4.2.0
    gatsby-transformer-remark: ^5.1.4 => 5.2.0
    gatsby-transformer-sharp: ^4.1.0 => 4.2.0
  npmGlobalPackages:
    gatsby-cli: 4.2.0
    gatsby: 3.5.0
Config Flags
PRESERVE_FILE_DOWNLOAD_CACHE: true
About this issue
- State: open
- Created 3 years ago
- Reactions: 19
- Comments: 73 (26 by maintainers)
@LekoArts Together with teammates working on the same project, I have come to the conclusion that we cannot even properly assess why this problem occurs: we are not able to tell whether it’s a resource problem, a GraphQL problem, or something to do with Gatsby internals.
I see that many people are struggling with this error. Hence, there may be a common drawback of the new version. Therefore, it should be handled with high priority.
For me, this problem went on for a long time. I tried working directly with Gatsby Cloud on a solution, but they basically shrugged at the issue. I could get to the bottom of the problem: it happens in `gatsby-plugin-sharp` when there are too many images to process. I tried all these variables, I tried changing the underlying code to throttle the image processing, etc., and couldn’t get to any point I was happy with. Then I decided to just stick with using Shopify’s CDN instead of processing the images and never looked back. But I know that this is a bummer and not everyone can “disable” image processing.
We’re trying to upgrade to 4.4 from 3 as well and are running into this exact issue - both in Gatsby Cloud https://www.gatsbyjs.com/dashboard/e156da66-cda0-4df5-b3c0-a7fdca6bf65e/sites/43774e74-f15a-4923-b6f7-d215d0ba104b/builds/e82328f4-29ff-44bc-945e-81b886afd8f8/details#rawLogs and locally:
success Running gatsby-plugin-sharp.IMAGE_PROCESSING jobs - 211.318s - 2175/2175 10.29/s
⠋ Merge worker state
ERROR
Assertion failed: all worker queries are not dirty (worker #3)
Following because I get this problem a lot.
Thanks for providing the URLs. We’ve looked at the builds from @NickBarreto @DennisKraaijeveld and in summary these are the findings:
Adding a comment here that I’m just running a query in GraphiQL to test a few things and when I execute it with
it hangs on
@wardpeet Hello, but what if the site is not even deployed on gatsby cloud? For production the site is deployed on AWS but the builds also fail locally. :<
If anyone on Gatsby Cloud is having this issue, could you send me an email at ward@gatsbyjs.com with your user email and site name so we can investigate? 🙏
Keeping this issue open for others and I would also like to follow the development of it.
@bkaldor and others finding this issue: I guess there might be different reasons the build stops/stalls and this issue can get messy. I summarized below:
`NODE_OPTIONS=--max_old_space_size` and `GATSBY_CPU_COUNT` environment variables.
Also seeing this 👀
A probable workaround is to add this env var:
GATSBY_DISABLE_CACHE_PERSISTENCE=true
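A minimal sketch of applying the environment variables mentioned in this thread from a Node script, assuming a project where `npx gatsby build` works. The helper names, the `RUN_BUILD` guard, and the value chosen for `GATSBY_CPU_COUNT` are illustrative assumptions; whether either variable actually avoids the hang is not confirmed here.

```javascript
const { spawnSync } = require("node:child_process");

// Variables named in this thread. The values are illustrative assumptions.
const WORKAROUND_ENV = {
  GATSBY_DISABLE_CACHE_PERSISTENCE: "true",
  GATSBY_CPU_COUNT: "2",
};

// Returns a copy of baseEnv with the workaround variables applied on top.
function withWorkarounds(baseEnv, overrides = WORKAROUND_ENV) {
  return { ...baseEnv, ...overrides };
}

// Runs `npx gatsby build` with the given environment and returns the result.
function runGatsbyBuild(env) {
  // shell: true so that `npx` also resolves on Windows.
  return spawnSync("npx", ["gatsby", "build"], {
    stdio: "inherit",
    env,
    shell: true,
  });
}

// RUN_BUILD is a hypothetical guard so that loading this file has no side effects.
if (process.env.RUN_BUILD === "1") {
  const result = runGatsbyBuild(withWorkarounds(process.env));
  process.exit(result.status ?? 1);
}
```

Saved under any file name, it would be run as `RUN_BUILD=1 node <file>`. Keeping the overrides in one object makes it easy to try each variable on its own when comparing builds.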
@pieh Thanks! Yeah, I was going that route but your snippet really helped:
So apparently I still had an old Image (image.js) component lying around from an earlier version/iteration which was used in one place and that debug info showed me:
Adding that debug info by default might help a lot of people migrating and running into an issue like this.
I have read the thread but I’m not using Gatsby Cloud.
I’m currently migrating from v3 to v4 and testing everything locally, and this now happens quite often on `gatsby build`.
Never had issues before; it’s a moderate-sized site, definitely nothing large. Is there a way to get more debug info about what’s going on here?
Edit: played around with `NODE_OPTIONS=--max_old_space_size` and `GATSBY_CPU_COUNT` on my local 32GB laptop, but without success.
@engineergit I ran into the same issue as @stephzero1 did 👆🏻 up there using Gatsby: 4.19.2, Node: 16.14.2, npm: 8.5.0, macOS: 12.4. Turns out it’s related to `sharp` (this one and this one). What resolved this for me was to simply run
$ brew install vips
(thanks to @lovell’s suggestion).
I just ran into this issue with my blog, hosted on Vercel. I found a workaround for Vercel and spotted a couple of things that might be of interest to anyone working on this issue.
Background: I’m writing a plugin that sources my Instagram posts so they can be included in my blog alongside regular markdown posts. A markdown node is created for each Instagram post; the images are downloaded by `gatsby-remark-images-remote` for processing with `gatsby-plugin-sharp`. Basically: my site has a ton of images.
I develop on macOS (M1) and didn’t encounter this issue until I tried to deploy on Vercel. The error message is similar to those reported above:
Scouring the comments here, I was able to reproduce locally by setting the environment variable `GATSBY_CPU_COUNT=2`. I soon realised that by increasing the value of this environment variable, my site would build just fine again.
Surprisingly, the same trick works on Vercel: simply override the default build command (find the Project Settings page then jump down to Build & Development Settings) then my site deploys just fine:
I can’t find much documentation on Vercel’s build environment but I presume it’s a tiny VM running somewhere where `GATSBY_CPU_COUNT` gets set by default to 1 and this is why I don’t typically see this issue on my M1 (because my site actually deploys pretty quickly on Vercel, I suspect it’s not in fact such a tiny VM at all).
A couple of observations that may help debug this issue:
- The `GATSBY_CPU_COUNT` needed to get my site to build is proportional to the number of images. Even without overriding the default I was able to deploy my site on Vercel if I simply limited the number of Instagram posts to ~10. I currently set it to 8.
- My site configures `gatsby-transformer-remark` to use both `gatsby-remark-images` and `gatsby-remark-images-remote`. If I disable one or the other, my site deploys just fine on Vercel regardless of how many images I include.
@NickBarreto we fixed the issue you were seeing on the cloud side.
@LekoArts I ran multiple builds throughout the entire week, only getting the same results as @tyhopp. I will add the logging output below. I could find absolutely no pattern to explain which images fail the rendering engines validation or why; in every build I ran, the affected images were random.
I looked into ways to expose more verbose logging for image processing, and one way that’s available now (since `gatsby-plugin-sharp@2.0.25`) is to run a build with this debug env var:
This should allow you to see logs for which image is being processed by sharp and how long each takes, which should reveal if the issue is related to stalling on a particular image. Example log output:
Still investigating how to expose the extra log output from stalled jobs locally, will share in this thread if I can get that working since it should also be helpful.
@buzinas Until this problem is solved I will need to research how this problem occurs and try to either bypass it internally or make a workaround for the configuration that is used.
I am running into the same issue. I also played with the .env variables NODE_OPTIONS / GATSBY_CPU_COUNT but had no luck getting my build running.
I am hosting my application on an Ubuntu 20.04 server with dokku/docker, 8G of memory and 4 cores. Gatsby Cloud is also giving me the same output. Only my M1 Mac builds the page locally without any problems.
Since downgrading to Gatsby v3 it works again.
Getting the same issue as well; sometimes I have to re-run the build 4-5 times until one succeeds (running with `GATSBY_CPU_COUNT=1` on a single-CPU instance).
This is still a massive blocker for my team in moving from Gatsby 3 to Gatsby 4. How does one go about getting traction on this issue, which appears to affect more than just our site?
I am having this same issue. Here is a link to the build failure: build fail
Thank you very much for the assistance on my query @pieh, and great spot on the relative paths 😅 You are probably right on that, though I am curious how the change from v3 to v4 caused that to become an issue. Let me try the workaround and report back here.
In the meantime, would the `gatsby-source-directus` `imageFile` field, as pointed out here, potentially cause issues with this as well? I noticed in the migration guide, specifically here, that there’s a new way to call `createRemoteFileNode`, and the current source plugin seems to be deprecated.
@askibinski if you are hitting this issue locally, could you manually edit a file in your node_modules: `node_modules/gatsby/dist/redux/reducers/queries.js` (if you are on 4.4, it should be line 439).
And instead of just
Let’s add information about the state that the assertion fails on:
This should additionally print information like the one below alongside the assertion error:
I currently have no idea how we end up in a situation like that. Possibly something fails earlier and we swallow/ignore the error? Or maybe we have some stale state?
@NickBarreto (and other folks following this issue)
I did publish a “canary” (`gatsby@alpha-job-progress`) from the PR branch, and we are running that (plus some additional debugging code on top) internally with a test site on which we are able to reproduce the issue (eventually, as it takes multiple runs to reproduce the problem). You can use that canary release yourself, but it really won’t help much with unsticking builds (it will just show an additional progress bar in the logs tab).
So in short, we don’t need more information from you folks (at least about being stuck on image generation in Gatsby Cloud). We can already reproduce the problem and are in the process of tracking it down. We will post an update here once we find the problem, implement a fix, and have a reasonably high level of confidence that the fix is correct (we can never be 100% sure due to the intermittent nature of the problem).