gatsby: Errors when processing large(ish) numbers of images with gatsby-plugin-sharp

Description

I’m processing around 1500 images with gatsby-plugin-sharp/gatsby-transformer-sharp and GraphQL, to display with gatsby-image as per this page in the docs.

For smaller numbers of images I have had no issues, but now that I’m processing 1500 images the build randomly fails with the following error:

(sharp:104): GLib-GObject-WARNING **: 09:35:11.293: gtype.c:4265: type id '0' is invalid

(sharp:104): GLib-GObject-WARNING **: 09:35:11.293: can't peek value table for type '<invalid>' which is not currently referenced

(sharp:104): GLib-GObject-WARNING **: 09:35:11.293: gvalue.c:188: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation

(sharp:104): GLib-GObject-CRITICAL **: 09:35:11.293: g_value_transform: assertion 'G_IS_VALUE (src_value)' failed
Received 'segmentation fault' signal

This happens during the “Generating image thumbnails” step of the build process, and it occurs seemingly at random (sometimes 10% of the way through the images, sometimes at 80%, sometimes not at all). Therefore I do not believe it is caused by a “faulty image”.

About this issue

  • Original URL
  • State: closed
  • Created 6 years ago
  • Reactions: 24
  • Comments: 102 (50 by maintainers)

Most upvoted comments

FWIW we worked with @KyleAMathews & the author of sharp to debug this on our internal, fairly large (~5,000 image) gatsby build, and the fixes were:

  • The core issue was that our build ran in CircleCI, and CircleCI reported ~32 cores available to the build (i.e. all of the cores of the entire host machine, instead of just those allotted to our build’s docker container). This made sharp’s glib concurrency buggy (i.e. it ran 32 threads against an image when there was effectively only 1 cpu available anyway). This misleading resource detection is a known issue in CircleCI: https://ideas.circleci.com/ideas/CCI-I-578
  • Setting GATSBY_CPU_COUNT=1 in our build environment so the bad cpu count is ignored
  • A fix for gatsby-plugin-sharp to respect GATSBY_CPU_COUNT instead of ignoring it (https://github.com/gatsbyjs/gatsby/pull/14624)
  • Adding sharp.cache(false) and sharp.simd(false) to our project’s ./gatsby-node.js (see the sketch below) to further turn off other “too much concurrency” behavior in sharp’s implementation, again due to the misreported cpu/etc. stats from the CircleCI environment.

With these changes, our build went from failing ~80% of the time to stable for about a week. Props to @lovell for all of this work, I’m just the messenger. Also defer to @KyleAMathews and the other gatsby maintainers for how these lessons learned can get worked into the mainline gatsby code.
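For reference, the gatsby-node.js change above is roughly the following. A minimal sketch; it assumes your project’s copy of sharp resolves to the same module that gatsby-plugin-sharp uses, and GATSBY_CPU_COUNT=1 is set separately as an environment variable in CI:

// gatsby-node.js
const sharp = require('sharp')

// Disable sharp's operation cache and SIMD vector instructions to dial
// back the concurrency-sensitive behavior described above.
sharp.cache(false)
sharp.simd(false)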

Something else we could do to improve stability is to move Sharp to a child process, so that if it crashes we could just restart it.
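Roughly, that idea would look like the following. Just a sketch of the restart loop, not anything Gatsby ships today, and ./sharp-worker.js is a hypothetical module that would run the image jobs:

const { fork } = require('child_process')

function spawnSharpWorker() {
  // Hypothetical worker module that performs the sharp jobs.
  const worker = fork(require.resolve('./sharp-worker.js'))

  worker.on('exit', (code, signal) => {
    if (signal === 'SIGSEGV' || code !== 0) {
      // The child crashed (e.g. segfaulted): restart it and re-queue its
      // pending jobs instead of failing the whole build.
      spawnSharpWorker()
    }
  })

  return worker
}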

Ok, update time.

Happy to report that we managed to repro this in a somewhat controlled manner and were able to hand that repro over to the Sharp maintainer. We’re expecting a few improvements in the situation.

Please be aware that this issue is a conflation of a bunch of different cases.

  • Some are “simple” cases of version conflicts / sub-dependencies
    • This is difficult to resolve, but one way seems to be a global (“-g”) install of sharp
  • Some cases are caused by the binary not getting updated when the node version changed
    • You can fix this by forcing this with npm rebuild sharp
    • Alternatively you can clear the npm cache (~/.npm, or the yarn equivalent), remove node_modules, and then do a fresh install.
  • Some cases seem to be platform dependent where there simply is no binary available and you need to build from source ( … good luck)
  • And some cases were caused by Gatsby and as a result exposed a segfault in Sharp. That one, at least, is being addressed in https://github.com/lovell/sharp/issues/1986
    • Although forcing it to run single core should be a viable workaround (please let us know if you can reliably repro a segfault on a single core). Note that running it on a single core is probably quite detrimental to your build speed 😉
  • Stuff that did not prevent the problem for me:
    • The sharp.simd(false) or sharp.cache(false) hacks, ymmv on simd
    • The problem was not made worse by reducing available memory, so adding more memory is not likely to affect it either
    • Adding more cpu power / cores might reduce the frequency of this happening, but not prevent it entirely. This issue does appear to be more likely under heavy system load.

Some of the binary-version-mismatch problems are likely to disappear next year, when node 8 LTS gets deprecated and we move to node 10 LTS. Node 10 supports N-API, which should make the binary part easier for libraries like Sharp. But seeing how we’re not there yet and there’s a large gap between then and now, it’s good to see that we can improve the situation.

Additionally I’ll go through our image queuing process to see whether we can improve the situation on Gatsby’s side (a sketch of the idea follows below). We shouldn’t really have triggered those segfaults, and regardless of fixes to Sharp, Gatsby should behave better, which in turn should improve perf a bit at scale.

Thanks everyone for helping to report their workarounds!
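To illustrate the queuing idea mentioned above (a sketch only, not Gatsby’s actual implementation): capping how many image jobs run at once keeps sharp’s load bounded, e.g. with a tiny promise pool.

// Run async jobs with at most `limit` of them in flight at any time.
async function runWithLimit(jobs, limit) {
  const results = []
  let next = 0

  async function runner() {
    while (next < jobs.length) {
      const index = next++
      results[index] = await jobs[index]()
    }
  }

  // Start `limit` runners that pull jobs off the shared queue.
  await Promise.all(Array.from({ length: limit }, runner))
  return results
}

// Usage (hypothetical): each job wraps one image transform, e.g.
// await runWithLimit(images.map(img => () => processImage(img)), 4)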

No. This can still be reproduced. We’re looking into it at the moment.

It looks like the next release of Sharp (0.25.0, not yet published) will fix these issues. It will require node 10 so Gatsby won’t ship with it until Gatsby can bump the node minver to 10 LTS, but that shouldn’t stop you from being able to use it.

So it looks like that will mitigate this problem. I was able to generate 30k thumbs without signs of the issue (where before the problem would generally repro between 3k and 6k images processed).

Will keep this open until Gatsby can ship the new, fixed version. But please go ahead and test the "sharp": "lovell/sharp#yield" version of Sharp to confirm this fixes the problem for you.
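If you want to test it before the release, one way to force that version (assuming Yarn, whose resolutions field also applies to transitive dependencies) is an entry like this in package.json:

"resolutions": {
  "sharp": "lovell/sharp#yield"
}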

Sharp has been bumped by https://github.com/gatsbyjs/gatsby/pull/22432, which means this issue should now be closed. Please report back if you have Sharp-related segfaults or see the warning messages that have been reported in this thread.

This change has been published in the following packages:

  • gatsby-plugin-manifest@2.3.1
  • gatsby-plugin-sharp@2.5.1
  • gatsby-remark-images-contentful@2.2.1
  • gatsby-source-contentful@2.2.2
  • gatsby-theme-blog-core@1.3.2
  • gatsby-theme-blog@1.4.2
  • gatsby-theme-notes@1.2.2
  • gatsby-transformer-sharp@2.4.1
  • gatsby-transformer-sqip@2.2.1
  • gatsby@2.20.2

For any other Sharp related problems or questions, please open a new issue. Thanks!

Hey @twhitson,

I reached out to gatsby cloud support and they found the problem. Some of the plugins in my project were using an outdated version of the sharp library, so the quickest solution is to change the following:

  • upgrade gatsby-plugin-sharp
  • upgrade gatsby-transformer-sharp
  • set the resolution for sharp to "0.26.3"

To set the resolutions, you can add this entry to your package.json:

"resolutions": {
  "sharp": "^0.26.3"
}

Hope it helps!
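Worth noting that resolutions is a Yarn feature; if you’re on npm instead, recent versions (8.3+) support an equivalent overrides field:

"overrides": {
  "sharp": "^0.26.3"
}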

Can confirm after forcing my project to use Sharp on the yield branch, I am no longer experiencing segfaults with image processing. Previously, I would experience them almost every build, multiple times.

THANK YOU SO MUCH! 🥳

Thank you so much @lucashfreitas that was it!

I had to set my resolution directly to 0.27.1, but it worked like a charm. Thanks for your help!

Quick update:

I’ve tried to deploy my websites using Gatsby Cloud, Vercel, and Netlify, and all of them complain about the same error ('segmentation fault' signal). Apparently it’s related to the sharp plugin’s memory consumption.

I have updated all plugins and gatsby to the latest version, and it’s really hard to get a build running at the moment. The only place the build runs fine is on my laptop, which has 32GB of RAM.

I was wondering if gatsby is just not suitable for huge projects with a lot of pages and images?

I have already contacted the Gatsby Cloud support and will post updates here!

I’m having the same problem with a local gatsby build if I don’t disable sharp’s simd and cache.

[===========                 ]   5.893 s 344/850 40% Generating image thumbnails
/bin/sh: line 1:  5438 Segmentation fault: 11  gatsby build
error Command failed with exit code 139.

gatsby@2.13.73 gatsby-plugin-sharp@2.2.13 gatsby-transformer-sharp@2.2.7

node v10.15.3 on macOS

@grantglidewell Did you put

const sharp = require('sharp')
sharp.simd(false)
sharp.cache(false)

in any of the API hooks in gatsby-node.js, or did you just place this block at the top of the file?

Disabling the use of SIMD instructions solved it for me, but it means modifying node_modules/gatsby-plugin-sharp/process-file.js:30, which isn’t a portable solution for cloud building. Could we make that a configurable option?

@TylerBarnes ohhh the dreaded segmentation fault error, it brings me back to when I was using the C language as a development tool. It looks like somewhere deep down in the execution of that plugin, something is trying to access memory it’s not allowed to, or to use memory that has already been deallocated. That is not a gatsby plugin issue per se, more like the underlying vips/sharp, the part written in C.

Had the same issue. Doing a manual npm install git+https://github.com/lovell/sharp.git to install the latest unreleased Sharp package seems to have fixed it for me. This might not be a good idea, but I’m not sure what else to do until a fixed version of Sharp is released.

The issue is still happening with the latest version of gatsby-plugin-sharp: "2.10.0".

My website has a lot of pages/images and it seems impossible to get a build done using Netlify. If I trigger the build using containers on AWS or on my local computer it works fine, but not inside Netlify.

Anyone facing the same problem when trying to deploy/build large websites / lots of images on Netlify?

I am still having occasional segfaults using gatsby-transformer-sharp@2.4.1

$ gatsby build
success open and validate gatsby-configs - 0.039s
/bin/sh: lscpu: not found
/bin/sh: lscpu: not found
success load plugins - 0.505s
success onPreInit - 0.007s
success delete html and css files from previous builds - 0.009s
success initialize cache - 0.006s
success copy gatsby files - 0.059s
success onPreBootstrap - 0.007s
/bin/sh: lscpu: not found
success createSchemaCustomization - 0.005s
success source and transform nodes - 0.103s
success building schema - 0.271s
success createPages - 0.000s
success createPagesStatefully - 0.091s
success updating schema - 0.028s
success onPreExtractQueries - 0.001s
success extract queries from components - 0.388s
success write out redirect data - 0.001s
success Build manifest and related icons - 0.320s
success onPostBootstrap - 0.322s
info bootstrap finished - 4.358s
success run static queries - 2.702s - 6/6 2.22/s
info Visit https://yarnpkg.com/en/docs/cli/run for documentation about this command.
error Command failed with signal "SIGSEGV".
error Command failed.
Exit code: 1
Command: /usr/local/bin/node
Arguments: /opt/yarn-v1.22.4/lib/cli.js build
Directory: /builds/xxx
Output:
info Visit https://yarnpkg.com/en/docs/cli/workspace for documentation about this command.

I can also confirm that the latest version of Sharp is finally solving my build problems! By the way, 0.25.0 is now published, as well as 0.25.1.

So grateful for this fix! Thanks for your hard work @pvdz!

Still experiencing this issue as of "gatsby-plugin-sharp": "2.4.4" (just upgraded from 2.3.3) & "gatsby-transformer-sharp": "2.3.13"

300-ish images processed into 1800 thumbnails of different sizes will usually fail once or twice (sometimes more) before succeeding.

It always succeeds eventually. So I just "npm run build || npm run build || npm run build --verbose" for now lol. Thanks for looking into this, I have been experiencing this issue on my project since at least Nov 2019. 😦

I’m wondering if people still see this issue on gatsby-plugin-sharp@2.4.3. We have reduced memory usage of the sharp plugin itself.

@ehannes no, I was able to repro in v10 as well. The comment is about the nodejs EOL cycle. In 2020 nodejs v8 reached end of life (“EOL”), meaning it won’t get any more patches and fixes, and people should move to node 10, which is the next long term support (“LTS”) release version. So this month many packages are bumping their minimum required nodejs version to >=10, sharp being no exception. Gatsby will do so as well, but we’re taking a little longer to allow people to make the transition outside of the busy December month it would otherwise have been. Hence, we currently can’t bump the sharp version, because then people on node 8 (which we claim is the minimum version we support) would get node version errors when installing. (This can be worked around, but I don’t think we need to.)

That said, the new sharp version should be more stable as a few bugs related to this issue were found and fixed. Just, apparently, not all of them.

@ehannes it so happens I have! 😃 Was testing yesterday and could still repro with the then newest version. But (in response?) there has been a new release with a new binary (-> https://github.com/lovell/sharp/issues/1986 ) so I’ll be testing that today. If all goes well we should be able to upgrade and resolve some-if-not-all of these issues soon.

I’ve just written there, and now I’m gonna try downgrading to 0.22.1 to see if that works (thanks!)

*Without luck so far, updating here https://github.com/gatsbyjs/gatsby/issues/16985#issuecomment-554767707

Downgrading to gatsby-plugin-sharp@2.2.14 fixes the issue.

per: https://github.com/gatsbyjs/gatsby/issues/16957#issuecomment-523981633

Thank you @MaximeHeckel, I was having the same problem with local builds. gatsby build seemed to exit before all of the images were packaged (but gatsby develop worked fine). Upgrading to "gatsby-plugin-sharp": "^2.2.10" seems to have fixed this problem for me.

This might be helpful…

The segfault happens to me when running the gatsby build in an alpine-based container, but not in a debian-based container.

To reproduce, try using this Dockerfile based on alpine, and then this other Dockerfile using debian. The error never happens on the debian-based one. Notice that those two are both based on official node images.

This probably means Netlify is using an alpine image, or another container with a similar problem.

It’s a hard nut to crack… You could try setting the env variable GATSBY_CPU_COUNT=logical_cores. I’m unsure if this even helps, but it might be worth a try.

Still happens to me randomly (both locally and on Netlify, about 20-30% of the time). I have about 5,000 not-that-big images (mostly 500x500 pixels).

4:25:52 PM: (sharp:1393): GLib-GObject-WARNING **: 16:25:52.645: gtype.c:4265: type id '0' is invalid
4:25:52 PM: (sharp:1393): GLib-GObject-WARNING **: 16:25:52.645: can't peek value table for type '<invalid>' which is not currently referenced
4:25:52 PM: (sharp:1393): GLib-GObject-WARNING **: 16:25:52.645: gvalue.c:188: cannot initialize GValue with type '(null)', this type has no GTypeValueTable implementation
4:25:57 PM: /usr/local/bin/build: line 34:  1393 Segmentation fault      (core dumped) gatsby build

Just curious, how should I go about deploying a site with an insane number of images?

Thanks @TylerBarnes, not sure how I missed that. It worked 🎉. I still get the occasional failing build locally, but this is the first successful build I have had with Netlify for this number of images.

I have seen a few suggestions of adding an automatic retry to gatsby-plugin-sharp on sharp errors. I’d hope that would resolve any remaining issues of builds failing in this case.
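As a sketch of what such a retry could look like (a hypothetical helper, not current gatsby-plugin-sharp code; note it only helps with errors that sharp surfaces as exceptions, since a hard segfault kills the process before anything can be caught):

// Retry an intermittent async job a few times before giving up.
async function withRetry(job, attempts = 3) {
  let lastError
  for (let i = 0; i < attempts; i++) {
    try {
      return await job()
    } catch (error) {
      lastError = error
    }
  }
  throw lastError
}

// Usage (hypothetical):
// await withRetry(() => sharp(input).resize(800).toFile(output))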

@mdornseif @TylerBarnes Thanks for the discovery! I went ahead and opened a PR, removing the lines: https://github.com/gatsbyjs/gatsby/pull/11925 😌

Regarding the segfaults, I have found this in gatsby-plugin-sharp:

// Try to enable the use of SIMD instructions. Seems to provide a smallish
// speedup on resizing heavy loads (~10%). Sharp disables this feature by
// default as there's been problems with segfaulting in the past but we'll be
// adventurous and see what happens with it on.
sharp.simd(true);

So far locally I haven’t seen any segfaults after commenting out sharp.simd(true);.

I can confirm this bug. In my case it started happening when I added some large images, 5MB or more.