jpeg-decoder is slower than libjpeg-turbo

jpeg_decoder::decoder::Decoder::decode_internal seems to take 50% of the decoding time, or over 75% when using Rayon, since this part is not parallelized. This part alone takes more time than libjpeg-turbo takes to decode the entire image.

It appears that jpeg-decoder reads one byte at a time from the input stream and executes some complex logic for every byte, e.g. in HuffmanDecoder::read_bits and a number of other functions called from decode_internal. I suspect that performing a single large read (a few KiB in size), then using something that lowers to memchr calls to find marker boundaries, would be much faster.
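A minimal sketch of what that could look like. The names `find_ff` and `next_run` are hypothetical, not jpeg-decoder's API, and a real implementation would use the memchr crate, which lowers the scan to SIMD; the dependency-free iterator version is shown here only to keep the sketch self-contained. The idea: every JPEG marker or byte-stuffing sequence begins with 0xFF, so a bulk scan for 0xFF splits a buffered chunk into long runs of entropy-coded bytes that can be processed without per-byte branching.

```rust
/// Stand-in for memchr: find the first 0xFF in a buffered chunk.
fn find_ff(buf: &[u8]) -> Option<usize> {
    buf.iter().position(|&b| b == 0xFF)
}

/// Split a buffered chunk into a run of plain entropy-coded bytes and
/// the escape byte that follows the terminating 0xFF (if any).
fn next_run(buf: &[u8]) -> (&[u8], Option<u8>) {
    match find_ff(buf) {
        Some(i) => (&buf[..i], buf.get(i + 1).copied()),
        None => (buf, None),
    }
}

fn main() {
    // 0xFF 0x00 is a stuffed byte; 0xFF 0xD9 is the EOI marker.
    let chunk = [0x12, 0x34, 0xFF, 0x00, 0x56, 0xFF, 0xD9];
    let (run, escape) = next_run(&chunk);
    assert_eq!(run, &[0x12, 0x34]);
    assert_eq!(escape, Some(0x00)); // stuffed zero: data continues
    println!("first run: {:02X?}, escape: {:02X?}", run, escape);
}
```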

Profiled using this file: https://commons.wikimedia.org/wiki/File:Sun_over_Lake_Hawea,_New_Zealand.jpg via image crate, jpeg-decoder v0.1.19

Single-threaded profile: https://share.firefox.dev/30ZTmks Parallel profile: https://share.firefox.dev/3dqzE49

About this issue

  • State: open
  • Created 4 years ago
  • Comments: 22 (17 by maintainers)

Most upvoted comments

Came across the link to this in Zulip. For what it's worth, there's a very good series on how to do bitwise I/O performantly in compressors on Fabien Giesen's blog, if you haven't seen it before:

Sorry if this is old news.
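For readers who haven't seen that series: the core idea is to keep a wide accumulator topped up in byte-sized chunks, so that individual bit reads never touch the input stream. A minimal sketch with illustrative names (this is not jpeg-decoder's actual `HuffmanDecoder`, just the general refill-then-peek pattern):

```rust
/// Bit reader in the style described in Giesen's series: keep up to
/// 56+ bits in a u64 accumulator (MSB-first), refill whole bytes at a
/// time, and serve reads from the accumulator without per-bit loops.
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // next byte to load into the accumulator
    acc: u64,   // bit accumulator, MSB-first
    count: u32, // number of valid bits in `acc`
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0, acc: 0, count: 0 }
    }

    /// Top up the accumulator with whole bytes while room remains.
    fn refill(&mut self) {
        while self.count <= 56 && self.pos < self.data.len() {
            self.acc |= (self.data[self.pos] as u64) << (56 - self.count);
            self.pos += 1;
            self.count += 8;
        }
    }

    /// Read `n` bits (1 <= n <= 32), MSB-first.
    fn read_bits(&mut self, n: u32) -> u32 {
        self.refill();
        let v = (self.acc >> (64 - n)) as u32;
        self.acc <<= n;
        self.count = self.count.saturating_sub(n);
        v
    }
}

fn main() {
    let mut r = BitReader::new(&[0b1011_0001, 0b1100_0000]);
    assert_eq!(r.read_bits(4), 0b1011);
    assert_eq!(r.read_bits(6), 0b000111);
    println!("bit reader ok");
}
```

A real decoder would also need to handle 0xFF byte stuffing during refill, which this sketch omits.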

As of version 0.2.6, on a 6200x8200 CMYK image, jpeg-decoder is actually faster than libjpeg-turbo on my 4-core machine!

Without the rayon feature it’s 700ms for jpeg-decoder vs 800ms for libjpeg-turbo. And according to perf it’s only utilizing 1.38 CPU cores, not all 4, so similar gains should be seen on dual-core machines as well.

The rayon feature is not currently usable due to #245, but once it is fixed I expect the decoding time to drop to 600ms.

Even without parallelism jpeg-decoder is within striking distance of libjpeg-turbo: 850ms as opposed to 800ms.

Sure thing, here’s the trace. wasm-jpeg-decoder.json.zip

In what situation would you want to decode a JPEG in Wasm? You would have to ship a large Wasm JPEG decoder to your users, and it will always run slower than the native JPEG decoder in their browser. If you have a project that handles images in Wasm, I would suggest handling the image loading and decoding with native browser APIs, and passing only a Uint8Array containing the pixels to your Wasm module.

Just another data point. I’m using jpeg-decoder via the image crate in a WASM project. I’ve noticed that loading JPEGs is very slow, roughly 200ms to decode a 2048 x 2048 image. Here’s a screenshot of the Chrome profile of a single load, along with the most common function calls at the bottom.

Screen Shot 2021-02-25 at 11 43 40 AM

It seems like most of the time is spent in color_convert_line_ycbcr. I don’t see that mentioned on the thread, so a different kind of bottleneck for WASM perhaps?
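For context, here is a scalar sketch of the per-pixel arithmetic that a function like color_convert_line_ycbcr has to perform, using the standard JFIF coefficients in 16.16 fixed point. This is not jpeg-decoder's actual code, just an illustration of the kind of tight arithmetic loop that benefits from SIMD:

```rust
/// JFIF YCbCr -> RGB conversion, scalar, 16.16 fixed point.
/// Constants are the standard coefficients scaled by 2^16:
/// 91881 ≈ 1.402*65536, 22554 ≈ 0.344136*65536,
/// 46802 ≈ 0.714136*65536, 116130 ≈ 1.772*65536.
fn ycbcr_to_rgb(y: u8, cb: u8, cr: u8) -> (u8, u8, u8) {
    let y = (y as i32) << 16;
    let cb = cb as i32 - 128;
    let cr = cr as i32 - 128;
    let r = (y + 91881 * cr + 32768) >> 16;
    let g = (y - 22554 * cb - 46802 * cr + 32768) >> 16;
    let b = (y + 116130 * cb + 32768) >> 16;
    let clamp = |v: i32| v.clamp(0, 255) as u8;
    (clamp(r), clamp(g), clamp(b))
}

fn main() {
    // Neutral chroma (128, 128) leaves the luma unchanged: grayscale.
    assert_eq!(ycbcr_to_rgb(0, 128, 128), (0, 0, 0));
    assert_eq!(ycbcr_to_rgb(255, 128, 128), (255, 255, 255));
    println!("{:?}", ycbcr_to_rgb(81, 90, 240)); // roughly pure red
}
```

Doing this three-multiply, clamp-heavy computation once per pixel across a 2048 x 2048 image is exactly the shape of work SIMD speeds up, which is consistent with it dominating the WASM profile.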

See my pull request that uses SIMD for this function: https://github.com/image-rs/jpeg-decoder/pull/146