guetzli: Extremely slow performance

How long does it take for you to compress a couple of images?

I tried compressing a 7.8 MB JPG with --quality 84 and it took nearly 20 minutes.

I also tried a 1.4 MB JPG with --quality 85 and it took nearly 10 minutes.

I must assume that this is not normal - is something wrong with my binary?

I am on Ubuntu 16.04 LTS with an Intel Core i7-4790K CPU @ 4.00GHz. I installed gflags via sudo apt-get install libgflags-dev and got libpng via sudo apt-get install libpng16-dev. After that, make ran with no errors.

convert -quality 85 src.jpg dst.jpg runs in under 1 second, if that is any help.

Anyone else experience this?

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 10
  • Comments: 43 (7 by maintainers)

Most upvoted comments

I just profiled Guetzli and most of the time is spent in the butteraugli Convolution() and ButteraugliBlockDiff() functions. One of the big issues hurting performance is the use of double-precision floating-point values to calculate pixel errors. In this case, a 64-bit integer would provide the same accuracy for the error and increase the speed quite a bit since the original pixels could be left as-is. In certain cases, using doubles for pixels makes sense (e.g. some filter, scaling or transparency operations), but not for error calculations. The rest of the code has some efficiency problems, but they won’t affect the performance nearly as much.
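
As a rough illustration of the numeric-type point (this is only a sketch, not butteraugli's actual error metric, which is perceptual and considerably more involved): for 8-bit input samples, a squared-error sum fits exactly in a 64-bit integer, so an accumulation loop like the one below needs no floating-point conversions at all.

#include <cstddef>
#include <cstdint>

// Sketch only: sum of squared differences over 8-bit samples, accumulated
// exactly in a 64-bit integer. Each term is at most 255 * 255, so even
// billions of pixels fit without overflow or rounding.
int64_t SumSquaredError(const uint8_t* a, const uint8_t* b, size_t n) {
  int64_t sum = 0;
  for (size_t i = 0; i < n; ++i) {
    const int64_t d = static_cast<int64_t>(a[i]) - static_cast<int64_t>(b[i]);
    sum += d * d;
  }
  return sum;
}

Whether butteraugli's intermediate values really stay integral after its filtering stages is a separate question, but for plain per-pixel differences the integer version is exact.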

@pornel We didn’t try to optimize Guetzli in ways that could make it harder to modify. That means that there’s likely both some speedup available by just optimizing single routines and, more significantly, speedup available by restructuring parts of Guetzli (e.g. attempting to reuse more computation results between iterations).

That said, I believe much more can be done about memory consumption, which we hardly optimized at all.

Would it be an option to invoke the binary multiple times in parallel?

Functionally, you can do this with GNU parallel by invoking it like this:

parallel 'guetzli --quality 84 {} {.}.jpg' ::: *.png

Test it yourself:

wget https://github.com/google/guetzli/releases/download/v0/bees.png
for i in 1 2 3 4 5 6 7; do cp bees.png $i.png; done
time parallel 'guetzli --quality 84 {} {.}.jpg' ::: *.png

Hi folks, if anyone still needs Guetzli Windows binaries with CUDA support, please check this out. This results in 25-40 times faster recompression.

@clouless (img-width * img-height) / 1000000 = X megapixels
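For example, a 4000 × 3000 pixel photo works out to (4000 * 3000) / 1000000 = 12 megapixels.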

@robryk I presume a large part of the slowness and memory use is because it’s the first release and Guetzli hasn’t been optimized yet. How much of the slowness is inherent to the algorithm and unavoidable, and how much can be done to improve the speed?

I’d love to implement this in my app, but the current performance figures are definitely a roadblock. Taking 13 minutes for a reasonably sized JPEG of a couple of megabytes is simply too long to be practical in many applications.

From my perspective, after reading all the current issues, there are three roadblocks to wide adoption, and I would prioritize them like this:

  1. Faster performance.
  2. Lower memory consumption.
  3. Failures on certain “non-standard” JPEGs, like those produced by certain cameras (you said you knew what the problem is here).

I think a good, rough goal would be to get to a point where a JPEG that’s a couple MB in size takes no more than 10-12 seconds to optimize. That would make the algorithm practical in my use case, which is an app that optimizes hundreds of images at once as part of building websites.

Guetzli was a proof-of-concept milestone for us in creating new solutions for JPEG XL.

I’m considering creating a “Guetzli 2.0” that runs only one iteration of butteraugli, using the butteraugli from https://gitlab.com/wg1/jpeg-xl/-/tree/master/jxl/butteraugli and the initialization code from https://gitlab.com/wg1/jpeg-xl/-/tree/master/jxl/enc_adaptive_quantization.cc

I suspect that would make Guetzli around 100x faster.

Not sure if related, but at our company we chose to apply the Guetzli algorithm to all our rendered images. Because it’s relatively slow, we decided to distribute the load in a special way. You can read all about it here: https://techlab.bol.com/from-a-crazy-hackathon-idea-to-an-empty-queue/

Another alternative to what @graysky2 said is https://github.com/fd0/machma

@DanielBiegler If you have 200 pictures, I’d echo @jan-wassenberg’s suggestion: run multiple instances of Guetzli and thus process multiple pictures in parallel. This will be more effective parallelization than anything that can be done inside Guetzli.

When using the --verbose option, it would be great if estimated time and memory consumption could be calculated and presented to the user, perhaps by computing the megapixel count and multiplying it by the current per-megapixel time and memory estimates.
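
As a rough sketch of what such an estimate could look like (this is not an existing Guetzli feature, and PrintEstimate is a hypothetical helper; the 300 MB-per-megapixel figure is the rule of thumb from Guetzli's README, while the one-minute-per-megapixel constant is only a placeholder, since the times reported in this thread are often higher):

#include <cstdio>

// Hypothetical helper, not part of Guetzli: prints a rough resource estimate
// from the image dimensions before the actual optimization starts.
void PrintEstimate(int width, int height) {
  const double megapixels = (static_cast<double>(width) * height) / 1e6;
  const double est_memory_mb = 300.0 * megapixels;  // README rule of thumb
  const double est_minutes = 1.0 * megapixels;      // placeholder assumption
  std::printf("input: %.1f MPix, est. memory: ~%.0f MB, est. time: ~%.0f min\n",
              megapixels, est_memory_mb, est_minutes);
}

int main() {
  PrintEstimate(4000, 3000);  // e.g. a 12 MPix photo
  return 0;
}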

The speed of Guetzli could probably be improved using OpenCL (on the GPU).