jpeg-decoder: jpeg-decoder is slower than libjpeg-turbo
jpeg_decoder::decoder::Decoder::decode_internal seems to take 50% of the decoding time, or over 75% when using Rayon, because this part is not parallelized. This part alone takes more time than libjpeg-turbo takes to decode the entire image.
It appears that jpeg-decoder reads one byte at a time from the input stream and executes some complex logic for every byte, e.g. in HuffmanDecoder::read_bits and a number of other functions called from decode_internal. I suspect that performing a single large read (a few KB in size), then using something that lowers to memchr calls to find marker boundaries, would be much faster.
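A sketch of what that could look like, using the memchr crate on a buffer that has already been read in bulk. This is not jpeg-decoder's code; next_segment is a hypothetical helper for illustration, and it does not handle 0xFF fill bytes or restart markers:

```rust
use memchr::memchr;

/// Collect entropy-coded bytes up to the next real marker, returning the
/// marker code if one was found. In JPEG data, 0xFF is either a stuffed
/// byte (followed by 0x00) or the start of a marker, so everything between
/// 0xFF hits can be copied in bulk instead of byte by byte.
fn next_segment(data: &[u8]) -> (Vec<u8>, Option<u8>) {
    let mut out = Vec::new();
    let mut pos = 0;
    while let Some(off) = memchr(0xFF, &data[pos..]) {
        // Copy the run of ordinary bytes in one shot.
        out.extend_from_slice(&data[pos..pos + off]);
        match data.get(pos + off + 1) {
            // 0xFF 0x00 is byte stuffing: keep the 0xFF, skip the 0x00.
            Some(0x00) => {
                out.push(0xFF);
                pos += off + 2;
            }
            // Anything else marks a segment boundary: stop here.
            Some(&marker) => return (out, Some(marker)),
            None => return (out, None),
        }
    }
    out.extend_from_slice(&data[pos..]);
    (out, None)
}
```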
Profiled using this file: https://commons.wikimedia.org/wiki/File:Sun_over_Lake_Hawea,_New_Zealand.jpg via the image crate, with jpeg-decoder v0.1.19.
Single-threaded profile: https://share.firefox.dev/30ZTmks
Parallel profile: https://share.firefox.dev/3dqzE49
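A minimal reproduction sketch, assuming the test image has been saved locally as "lake_hawea.jpg" (the filename is illustrative). At the versions discussed here, the image crate delegates JPEG decoding to jpeg-decoder, so this times the same code path that was profiled:

```rust
use std::time::Instant;

use image::GenericImageView;

fn main() {
    let start = Instant::now();
    // image::open dispatches to the JPEG decoder based on the extension.
    let img = image::open("lake_hawea.jpg").expect("failed to decode JPEG");
    println!(
        "decoded {}x{} in {:?}",
        img.width(),
        img.height(),
        start.elapsed()
    );
}
```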
About this issue
- State: open
- Created 4 years ago
- Comments: 22 (17 by maintainers)
Came across the link to this in Zulip, but for what it’s worth, there’s a very good series on Fabien Giesen’s blog about how to do bitwise I/O performantly in compressors, if you haven’t seen it before:
Sorry if this is old news.
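The core trick that series describes is to keep a large bit buffer and refill it rarely, instead of pulling one input byte per symbol. A minimal sketch of that pattern, not jpeg-decoder's actual HuffmanDecoder, and ignoring JPEG's 0xFF/0x00 byte stuffing, which a real decoder must handle:

```rust
struct BitReader<'a> {
    data: &'a [u8],
    pos: usize, // next input byte to pull into the buffer
    buf: u64,   // MSB-aligned bit buffer; the top `bits` bits are valid
    bits: u32,  // number of valid bits currently in `buf`
}

impl<'a> BitReader<'a> {
    fn new(data: &'a [u8]) -> Self {
        Self { data, pos: 0, buf: 0, bits: 0 }
    }

    // Top the buffer up to at least 57 valid bits, padding with zeros once
    // the input is exhausted, so one refill covers several reads.
    fn refill(&mut self) {
        while self.bits <= 56 {
            let byte = if self.pos < self.data.len() {
                let b = self.data[self.pos];
                self.pos += 1;
                b
            } else {
                0
            };
            self.buf |= (byte as u64) << (56 - self.bits);
            self.bits += 8;
        }
    }

    // Read `n` bits (1..=32), MSB first, as JPEG's entropy coding expects.
    fn read_bits(&mut self, n: u32) -> u32 {
        if self.bits < n {
            self.refill();
        }
        let out = (self.buf >> (64 - n)) as u32;
        self.buf <<= n;
        self.bits -= n;
        out
    }
}
```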
As of version 0.2.6, on a 6200x8200 CMYK image, jpeg-decoder is actually faster than libjpeg-turbo on my 4-core machine! Without the rayon feature it’s 700ms for jpeg-decoder vs 800ms for libjpeg-turbo. And according to perf it’s only utilizing 1.38 CPU cores, not all 4, so similar gains should be seen on dual-core machines as well. The rayon feature is not currently usable due to #245, but once it is fixed I expect the decoding time to drop to 600ms.
Even without parallelism jpeg-decoder is within striking distance of libjpeg-turbo: 850ms as opposed to 800ms.
Sure thing, here’s the trace. wasm-jpeg-decoder.json.zip
In what situation would you want to decode a JPEG in wasm? You would have to ship a large wasm JPEG decoder to your users, and it is always going to run slower than the native JPEG decoder in their browser. If you have a project that handles images in wasm, I would suggest handling the image loading and decoding with native browser APIs, and passing only a Uint8Array containing the pixels to your wasm.
Just another data point: I’m using jpeg-decoder via the image crate in a WASM project. I’ve noticed that loading JPEGs is very slow, roughly 200ms to decode a 2048 x 2048 image. Here’s a screenshot of the Chrome profile of a single load, along with the most common function calls at the bottom.
It seems like most of the time is spent in color_convert_line_ycbcr. I don’t see that mentioned in the thread, so perhaps a different kind of bottleneck for WASM?
See my pull request that uses SIMD for this function: https://github.com/image-rs/jpeg-decoder/pull/146
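For context, a scalar sketch of the per-pixel conversion that color_convert_line_ycbcr performs, using the standard JFIF constants. This is an illustration only, not the crate's actual code or the SIMD version in the linked PR, which both process whole lines at a time:

```rust
// Convert one YCbCr sample to RGB with the common JFIF formula.
fn ycbcr_to_rgb(y: u8, cb: u8, cr: u8) -> (u8, u8, u8) {
    let y = y as f32;
    let cb = cb as f32 - 128.0;
    let cr = cr as f32 - 128.0;
    let r = y + 1.402 * cr;
    let g = y - 0.344136 * cb - 0.714136 * cr;
    let b = y + 1.772 * cb;
    (clamp_u8(r), clamp_u8(g), clamp_u8(b))
}

// Round and clamp a float result into the 0..=255 range of a color channel.
fn clamp_u8(v: f32) -> u8 {
    v.round().clamp(0.0, 255.0) as u8
}
```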