sharp: Segfaults when under pressure?
I’m trying to debug a segfault while using Gatsby that is potentially in the realm of using sharp.
(I’m a performance eng for Gatsby)
It seems that sites with lots of images can trigger a segfault and we’re not sure when exactly this happens.
I get this fairly frequently with a simple site that just has many images (~1500 for my machine) on the example site posted (in http://github.com/gatsbyjs/gatsby/pull/6291) as a repro;
Clone https://github.com/jp887/gatsby-issue6291-repro
Update package.json to set all gatsby deps to *
Clear the npm cache (~/.npm)
Clear node_modules, in case that’s relevant
npm install
yarn gen-images 1500 (It triggers with 1500 images for me, this will try to process 9000 images in total)
gatsby build
For me, often it segfaults roughly between 25% and 75%.
I’m on ubuntu (xfce) 19.10 Node v8.16.2 (through nvm)
I went through some of the older Gatsby issues relating to this bug. I am able to reproduce it with .simd(true) and .simd(false), and same for .cache(true) and .cache(false).
Reducing the available memory to node does not seem to make a difference (I set it to 500mb, half of the default, but saw no increase in crash rate). Was also able to repro it running single core.
Please let me know how I can help 😃
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 58 (39 by maintainers)
Commits related to this issue
- Improve thread safety with copy-on-write for metadata #1986 — committed to lovell/sharp by lovell 5 years ago
- block metadata changes on shared images If images are shared (ref count > 1), block changes to the set of metadata items on the image. These can cause crashes in highly threaded programs. See https:... — committed to libvips/libvips by jcupitt 5 years ago
- always copy before exif_update During write, we often call vips__exif_update(). This updates the exif block from the other image metadata prior to save. Always copy the image before calling this. S... — committed to libvips/libvips by jcupitt 5 years ago
- always copy before exif_update During write, we often call vips__exif_update(). This updates the exif block from the other image metadata prior to save. Always copy the image before calling this. S... — committed to libvips/libvips by jcupitt 5 years ago
- Improve thread safety with copy-on-write for metadata See: lovell/sharp#1986. — committed to weserv/images by kleisauke 5 years ago
- Use copy-on-write for metadata in the test suite See lovell/sharp#1986. — committed to kleisauke/net-vips by kleisauke 5 years ago
- find cases where we don't copy before set see https://github.com/lovell/sharp/issues/1986#issuecomment-575709384 — committed to libvips/libvips by jcupitt 4 years ago
- lock for metadata changes Another attempt at fixing crashes on metadata chenage in highly threaded applications. Global lock around set, remove and copy metadata. This is crude, but simple, the perf... — committed to libvips/libvips by jcupitt 4 years ago
- Updated to newer Sharp build to fix segfaults See https://github.com/lovell/sharp/issues/1986 — committed to klehmann/imageresizer-libvips by klehmann 4 years ago
That’s very interesting. I’ll have a look today.
v0.24.0 is now available with a prebuilt libvips v8.9.0.
We’ve now bumped Sharp in Gatsby, affecting the following versions. Thanks again!
I can no longer reproduce this problem when using libvips compiled from the
lock-for-metadata-changesbranch.Brilliant, thanks John, I’ve been unable to reproduce a segfault using the
check-metadata-changesbranch of libvips with the latestmasterbranch of sharp.I think I found what’s making those message before the crash:
https://github.com/libvips/libvips/blob/master/libvips/iofuncs/header.c#L311
If you change that to be:
And try
vips copy x.jpg y.jpgyou see:The same set of errors, so probably the
valueparam tometa_new()is pointing at a lot of zeros.meta_cptakes a shallow copy of the list of metadata items on each input image, then callsmeta_new()for each item. A metadata item is disappearing (probably) some time between the shallow copy of the list and callingmeta_new().This could be caused by an image being unreffed during the copy, but that seems unlikely to me, the chaos would be even worse.
It’s more likely that some other thread is changing the metadata on the input image during the copy. Could this be happening?
libvips images have a list of arbitrary name/value pairs attached to them, and these are inherited as images are processed. They are used for things like ICC profiles and EXIF data, but user code can add extra metadata too if it’s convenient. Image metadata is supposed to be immutable: after the image has been constructed, it’s all frozen and must not be changed or you get bad behaviour like this.
However, at the moment, there’s no lock mechanism to prevent you changing image metadata. How about adding something to enforce this? It could be very simple, eg:
Can confirm, no more segfaults with the
yieldbranch.It seems I can no longer trigger the problem on this branch. 30 or 40 runs on high system load (aka a Zoom call, lol) without signs of trouble. Thank you! (While the repro above was for about 9k thumbs, I was also able to successfully generate 30k thumbnails, back to back, without signs of the problem)
The yield branch that will become the forthcoming sharp v0.25.0 depends on and provides libvips v8.9.1.
sharp v0.25.0 will also include the N-API migration (see #1282) as well as the return of support for 32-bit
node.exeon Windows to allow for easier use with Azure.OK, let’s merge and make 8.9.1.
I had another idea: there’s a left-over int field in
VipsImagecalledLength(don’t ask).How about setting this to 42 in
copy(), then invips_image_set()verifying that it has the value 42. That should find all cases where we’ve forgotten to call copy before modifying metadata.I’ve made a branch with this change and quite a few tests now fail. I’ll investigate this weekend.
Sadly I’ve been able to get https://github.com/jp887/gatsby-issue6291-repro to segfault with the latest sharp v0.24.0 (libvips v8.9.0) too:
I added the following to its
package.jsonto achieve this:sharp v0.23.4 now available with the changes in commit bb15cd9. I suspect the changes to libvips discussed here will also be required to fully deal with this.
(I expect the future sharp v0.24.0 to provide a prebuilt version of a future libvips with these fixes, either v8.8.4 or v8.9.0.)