image-png: Very slow decoding of paletted PNG images

zune-png benchmarks show that the png crate is much slower than other decoders on indexed images - a whopping 3x slower than zune-png.

CPU profile shows that 71% of the time is spent in png::utils::unpack_bits, specifically this hot loop.

The code is full of indexing and doesn’t look amenable to autovectorization. I think the entire function will have to be rewritten; it’s probably a good idea to copy zune-png here.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 19 (19 by maintainers)

Most upvoted comments

"It’s the bit unpacking hot loop that’s slowing things down."

That code was changed in #405 and looks like this now.

The distinction I’d make is that it’s not the bit-unpacking as much as retrieving colors from the lookup table and filling those into the buffer. Trying to do it in 2 passes ended up slower than the current behavior of unpacking and then doing the lookup in 1 pass. The code didn’t seem to get autovectorized even using chunks_exact at the time.
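For illustration, here is a minimal sketch of what "unpacking and then doing the lookup in 1 pass" can look like. It is not the png crate's actual code: the function name `expand_paletted_rgb`, its signature, and the choice to map out-of-range indices to black are all assumptions made for the example.

```rust
/// Hypothetical helper (not part of the png crate's API): expands one row of
/// packed palette indices (bit depth 1, 2, 4 or 8) into RGB8 in a single pass.
fn expand_paletted_rgb(
    packed: &[u8],
    bit_depth: u8,
    width: usize,
    palette: &[[u8; 3]],
    out: &mut Vec<u8>,
) {
    assert!(matches!(bit_depth, 1 | 2 | 4 | 8));
    let per_byte = 8 / bit_depth as usize;
    let mask = ((1u16 << bit_depth) - 1) as u8;

    out.clear();
    out.reserve(width * 3);

    let mut remaining = width;
    for &byte in packed {
        for i in 0..per_byte {
            if remaining == 0 {
                return; // ignore padding bits at the end of the row
            }
            // PNG stores the leftmost pixel in the high-order bits.
            let shift = 8 - bit_depth as usize * (i + 1);
            let idx = ((byte >> shift) & mask) as usize;
            // Assumption for this sketch: out-of-range indices become black.
            let rgb = palette.get(idx).copied().unwrap_or([0, 0, 0]);
            out.extend_from_slice(&rgb);
            remaining -= 1;
        }
    }
}
```

The per-pixel table lookup (`palette.get(idx)`) is a data-dependent load, which is the part that tends to resist autovectorization regardless of how the bit unpacking itself is written.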

I’m not sure that it’s straightforward or easy to get similar results without larger architectural changes, as mentioned in https://github.com/image-rs/image-png/discussions/416#discussioncomment-7590601, where several different methods were tried unsuccessfully.

It’s the bit unpacking hot loop that’s slowing things down. zune-png has already figured out a much more performant way to do it that gets auto-vectorized: https://github.com/etemesi254/zune-image/blob/54cc956ccc01ea942456c0dcebf8d97bda614666/crates/zune-png/src/utils.rs#L217-L317

It should be fairly straightforward to integrate into the png crate.
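As a rough idea of the loop shape being discussed, here is a sketch of unpacking 4-bit samples with fixed-size chunks and no manual indexing. It is not copied from zune-png or the png crate; `unpack_4bit` is a hypothetical name, and whether the compiler actually vectorizes it depends on the compiler version and target.

```rust
/// Hypothetical helper: unpacks 4-bit samples into one byte per sample.
/// `out.len()` is the number of samples; `packed` must hold at least
/// `(out.len() + 1) / 2` bytes for every sample to be written.
fn unpack_4bit(packed: &[u8], out: &mut [u8]) {
    let full = out.len() / 2; // number of input bytes that are fully used

    // Each input byte produces exactly two output bytes, so the loop body
    // works on fixed-size chunks and needs no bounds checks or branches.
    let mut chunks = out.chunks_exact_mut(2);
    for (dst, &byte) in (&mut chunks).zip(packed) {
        dst[0] = byte >> 4; // leftmost pixel is in the high nibble
        dst[1] = byte & 0x0F;
    }

    // An odd sample count leaves one trailing sample in the next byte.
    let rem = chunks.into_remainder();
    if let (Some(last), Some(&byte)) = (rem.first_mut(), packed.get(full)) {
        *last = byte >> 4;
    }
}
```

This only covers the bit-unpacking half; the palette lookup would still have to run afterwards or be fused into the same loop.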