image-png: Very slow decoding of paletted PNG images
`zune-png` benchmarks show that the `png` crate is much slower than other decoders on indexed images - a whopping 3x slower than `zune-png`.
A CPU profile shows that 71% of the time is spent in `png::utils::unpack_bits`, specifically this hot loop.
The code is full of indexing and doesn’t look amenable to autovectorization. I think the entire function will have to be rewritten; it’s probably a good idea to copy `zune-png` here.
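For readers unfamiliar with the problem, here is a minimal sketch of what index-heavy bit unpacking looks like. It is not the actual `png` crate code: `unpack_indices_naive` is a hypothetical name, and the only fact taken from the PNG format is that sub-byte pixels are packed starting from the most significant bits of each byte.

```rust
/// Unpack sub-byte palette indices (bit depth 1, 2 or 4) into one index per byte.
/// Hypothetical illustration only, not a function of the `png` crate.
fn unpack_indices_naive(packed: &[u8], bit_depth: u8, width: usize) -> Vec<u8> {
    assert!(matches!(bit_depth, 1 | 2 | 4), "only sub-byte depths are handled");
    let depth = bit_depth as usize;
    let mask = (1u8 << bit_depth) - 1;
    let mut out = vec![0u8; width];
    for i in 0..width {
        // Position of pixel `i` in the packed row, counted in bits.
        let bit_offset = i * depth;
        let byte = packed[bit_offset / 8];
        // PNG stores the leftmost pixel in the high-order bits of the byte.
        let shift = 8 - depth - (bit_offset % 8);
        out[i] = (byte >> shift) & mask;
    }
    out
}
```

For example, `unpack_indices_naive(&[0xAB], 4, 2)` yields the two indices `0xA` and `0xB`. Every pixel computes its own byte offset and shift and goes through an indexed load and store, which is the access pattern the comment above describes as hard to autovectorize.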
About this issue
- State: closed
- Created a year ago
- Comments: 19 (19 by maintainers)
That code was changed in #405 and looks like this now.
The distinction I’d make is that it’s not the bit-unpacking as much as retrieving colors from the lookup table and filling those into the buffer. Trying to do it in 2 passes ended up slower than the current behavior of unpacking and then doing the lookup in 1 pass. The code didn’t seem to get autovectorized even using `chunks_exact` at the time.
I’m not sure that it’s straightforward or easy to get similar results without larger architectural changes as mentioned in https://github.com/image-rs/image-png/discussions/416#discussioncomment-7590601, where several different methods were tried unsuccessfully.
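To make the "unpack and look up in 1 pass" idea concrete, here is a rough sketch for the 4-bit case. It is not the code from #405. The name `expand_4bit_paletted_rgb` is hypothetical, the palette is assumed to be RGB triples padded to 16 entries, and the row width is assumed to be even so that no trailing half-byte needs handling.

```rust
/// Expand a row of 4-bit palette indices straight into RGB bytes in one pass.
/// Hypothetical helper: `palette` is assumed to be padded to 16 RGB entries,
/// and `out.len()` is assumed to be `packed.len() * 6`.
fn expand_4bit_paletted_rgb(packed: &[u8], palette: &[[u8; 3]; 16], out: &mut [u8]) {
    // Each input byte holds two indices, so it yields two RGB pixels (6 bytes).
    for (chunk, &byte) in out.chunks_exact_mut(6).zip(packed) {
        // Both indices are at most 15, so they always fit the 16-entry palette.
        let hi = palette[(byte >> 4) as usize];
        let lo = palette[(byte & 0x0F) as usize];
        chunk[..3].copy_from_slice(&hi);
        chunk[3..].copy_from_slice(&lo);
    }
}
```

Even with `chunks_exact_mut`, each output pixel still depends on a table load at a data-dependent index, which compilers generally struggle to vectorize. That fits the observation that the lookup, rather than the shifting and masking, is the hard part.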
It’s the bit unpacking hot loop that’s slowing things down.
`zune-png` has already figured out a much more performant way to do it that gets auto-vectorized: https://github.com/etemesi254/zune-image/blob/54cc956ccc01ea942456c0dcebf8d97bda614666/crates/zune-png/src/utils.rs#L217-L317
It should be fairly straightforward to integrate into the `png` crate.
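As a rough idea of the general shape such a loop might take (this is a sketch, not copied from `zune-png` or from the linked file), one option is a separate fixed-shape loop per bit depth, so that shifts and masks are constants and each input byte maps to a fixed-size output chunk. The name `unpack_indices_by_depth` is hypothetical, and the output buffer is assumed to be exactly `packed.len() * (8 / bit_depth)` bytes long.

```rust
/// Unpack palette indices with one fixed-shape loop per bit depth.
/// Hypothetical sketch; `out.len()` is assumed to be `packed.len() * (8 / bit_depth)`.
fn unpack_indices_by_depth(packed: &[u8], bit_depth: u8, out: &mut [u8]) {
    match bit_depth {
        1 => {
            // One input byte becomes eight 1-bit indices, most significant bit first.
            for (chunk, &b) in out.chunks_exact_mut(8).zip(packed) {
                for (k, px) in chunk.iter_mut().enumerate() {
                    *px = (b >> (7 - k)) & 1;
                }
            }
        }
        2 => {
            // One input byte becomes four 2-bit indices.
            for (chunk, &b) in out.chunks_exact_mut(4).zip(packed) {
                chunk[0] = b >> 6;
                chunk[1] = (b >> 4) & 3;
                chunk[2] = (b >> 2) & 3;
                chunk[3] = b & 3;
            }
        }
        4 => {
            // One input byte becomes two 4-bit indices.
            for (chunk, &b) in out.chunks_exact_mut(2).zip(packed) {
                chunk[0] = b >> 4;
                chunk[1] = b & 0x0F;
            }
        }
        _ => panic!("unsupported bit depth"),
    }
}
```

Whether the compiler actually vectorizes these loops would need to be confirmed in the generated assembly and in benchmarks. The sketch also covers only the index unpacking, not the palette lookup discussed earlier in the thread.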