magento2: Catalog image resizing performance problems

Preconditions

  1. Magento develop branch (1856c28 - January 15 2017)
  2. PHP 7.0.14

Steps to reproduce

  1. Have a shop with a bunch of products with images assigned to them
  2. Run bin/magento catalog:images:resize

Expected result

  1. The command should only produce images for the themes that are currently active in the frontend
  2. The command shouldn’t produce multiple files that are byte-for-byte identical

Actual result

  1. The command produces files for all installed themes, even those that aren’t used in the frontend
  2. The command produces the same byte-for-byte identical file multiple times

Discussion

While trying to figure out why bin/magento catalog:images:resize takes over 12 hours on a very beefy server with a Magento CE 2.1.2 shop with about 6500 products and 8500 images, I found two performance problems in the code:

  1. The command generates resized images for all installed themes. If you are not using Magento/blank or Magento/luma on the frontend, you still get images resized for those themes, and generating them takes a lot of time for no benefit.
  2. After fixing nr 1, I hashed the generated files to check for duplicates in the result, and there were some. The duplicates were always caused by a difference in the image type (thumbnail, small_image, image, …). I think all the other parameters (width, height, keep frame, transparency, quality, …) are needed to generate a unique file, but I don’t think the distinction on image type matters. Maybe I’m missing something?

Possible solution

  1. I created a PR over here: https://github.com/magento/magento2/pull/8142
  2. In Magento\Catalog\Model\Product\Image::getMiscParams, remove the line: 'image_type' => $this->getDestinationSubdir(),

Result

I ran a very small benchmark using a test shop with 3 products and 5 images in total. Only Magento/blank and Magento/luma themes are installed and only Magento/luma is active in frontend:

  1. Originally: 11 seconds, 165 files are generated. After applying PR: 9 seconds, 135 files are generated
  2. Originally: 11 seconds, 165 files are generated. After removing line mentioned above: 9 seconds, 130 files are generated

Combined result of both optimizations:

  • Originally: 11 seconds, 165 files are generated. After both optimizations: 7 seconds, 105 files are generated
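
For reference, the relative savings implied by these numbers can be worked out with a few lines of shell. This is only a sketch: the helper name `reduction` is made up for this illustration, and the inputs are the file counts and timings reported above.

```shell
# reduction BEFORE AFTER -> percentage saved, rounded to a whole number
reduction() {
    LC_ALL=C awk -v before="$1" -v after="$2" \
        'BEGIN { printf "%.0f%%\n", (before - after) / before * 100 }'
}

echo "PR only (files):         $(reduction 165 135)"
echo "image_type only (files): $(reduction 165 130)"
echo "both (files):            $(reduction 165 105)"
echo "both (seconds):          $(reduction 11 7)"
```

That works out to roughly 18% and 21% fewer files for the individual changes, and about 36% fewer files and 36% less time for both combined, on this very small test shop.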

I think this is a significant enough improvement for you guys to at least consider optimizing this 😃

Thanks!

About this issue

  • State: closed
  • Created 7 years ago
  • Reactions: 11
  • Comments: 42 (20 by maintainers)

Most upvoted comments

Before leaving work yesterday I decided to see how long this operation would truly take. As is tradition, Magento2 failed catastrophically after 45 minutes without further details–providing a shallow message about a fart, when really it should be giving me all the gory details about its crap-filled underwear.

Before I could do that, though, it failed multiple times about a swatch_image.jpg path, which derailed me for a while, until I deleted the files under the path and suddenly it was able to progress. At that point I thought the troubles were over, detached my terminal multiplexer and ventured home. The next morning I awoke, feeling positive that I would see immense progress and possibly a disk that ran out of space (which would still be progress!), but instead I learned that the operation failed a mere 45 minutes after leaving, gracing me with its ever-useful error message, “Unsupported image format,” which I’ve referenced before. Why not–at the very least–just log it and continue? WHY do you need to fail catastrophically and die?

This is so comically bad, it deserves its own Jimmy Fallon Thank You Notes bit:

  • “Thank you… Magento… for handling long-running operations so poorly.”
  • “Thank you… Magento… for mysteriously failing on 1 image out of 428233 and not telling me which one.”

@magento-engcom-team

For reference to anyone coming by, I’ll explain the current state of this command based on my latest experience.

I’m using Magento 2.2.7.

The output of the catalog:images:resize command is much better than it used to be. Additionally, it’s clearly not resizing the same images over and over again. These are about the only positive things I can say about it, though.

Here are some of my current numbers:

  • This tool counts 428233 product images in my store.
  • On average, this tool will resize about 2.7 images per second, based on my store. I waited until 323 images hit exactly 2 minutes, so 323/120 = 2.7 images per second.
  • 428233 / 2.7 ≈ 158604 s ≈ 2643 min ≈ 44 hours

In other words, it will take 44 hours for this tool to resize all the images in my store for the first time. That’s truly devastating performance. At the very least, this is the first time I’ve discovered an approximate amount of time to run this whole thing, given the two pros mentioned earlier, so my expectations are set (extremely low). Of course, I actually have to run this completely through to see how resilient it is, or whether it can even finish.
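
The same extrapolation can be repeated for any store from a short timed sample. A minimal sketch, assuming the throughput stays constant for the whole run (the function name `estimate_hours` is hypothetical):

```shell
# estimate_hours TOTAL_IMAGES SAMPLE_IMAGES SAMPLE_SECONDS
# Extrapolates the duration of a full run from a short timed sample.
estimate_hours() {
    LC_ALL=C awk -v total="$1" -v n="$2" -v secs="$3" 'BEGIN {
        rate = n / secs                      # images per second
        printf "%.1f images/s, about %.0f hours\n", rate, total / rate / 3600
    }'
}

# Numbers from this store: 323 images in 120 seconds, 428233 images in total.
estimate_hours 428233 323 120
```

With the figures above this prints `2.7 images/s, about 44 hours`.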

Other numbers:

  • I ran this tool until it had processed 489 product images. That took 6 minutes, although it was not from an empty cache. Approximately 350 images were already generated, so those were skipped. That explains the speed these numbers may suggest. Rest assured, it’s slower on an empty cache.
  • I killed it at that point, then started it again.
  • It took 3 minutes to hit the 489 product images mark.
  • That’s half the time (assumedly) ensuring that files are not resized over and over. That’s positive.
  • This means after an initial image processing session of ~44 hours, I can expect a rerun to take around ~22 hours. Again, that’s devastating performance. Am I literally expected to wait 22 hours for new product images to appear on the frontend? We have a large merchandising team that adds hundreds to thousands of products a week, depending on the season. For example, we add a single, solitary product to our catalog, then wait 22 hours for its product images to be resized? That is absurd.

More numbers again:

  • I ran the resizer until it had resized 597 product images.
  • That resulted in 15715 thumbnails.
  • That is around 26 thumbnails per product image, if this is how it’s done.
  • What?

I am sure there is a way to bring that number of thumbnails per product down, but I’ve not discovered it, yet.
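
One way to see where the roughly 26 files per image come from is to count how many cache variants exist for each original image. This is a sketch under an assumption: that the cache is laid out as `cache/<hash>/<original path>`, as in the listing further down this thread. The function name is made up.

```shell
# variants_per_image CACHE_DIR
# Prints "<count> <image path>" per original image, most variants first.
variants_per_image() {
    ( cd "$1" && find . -type f ) |
        sed 's|^\./[^/]*/||' |      # drop the leading "./<hash>/" directory
        sort | uniq -c | sort -rn
}

# Example: variants_per_image pub/media/catalog/product/cache | head
```

Each distinct `<hash>` directory corresponds to one combination of resize parameters, so the count per image is the number of parameter combinations that were generated for it.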

I should mention that these images are being generated on a 512GB SSD, so the IOPS is quite high.

Does anyone have suggestions to make this better? In its current state, it’s clearly anything but an acceptable way of resizing catalog images.


Edit:

I’d like to note that in both developer and production deploy mode, images are still generated on the fly, if they don’t exist, which is good. The release notes for 2.1.7, and then 2.2.6 both used to state that on-the-fly image generation was removed in favour of using the command line tool to generate images, but it seems the release notes were edited to remove this information. Anyway, as long as on-the-fly image generation continues to be a thing, I won’t care that much. It is immensely inconvenient that Magento2 decided to change the paths to all resized images, though. That means I will no matter what have to run the command at least once.

Any updates on if this is fixed? Magento 2.3 is out now.

Edit: Nope, still running into this issue of duplicate images in cache.

A quick suggestion (I apologize if it’s redundant): add the theme name as an argument so images are resized for that theme only. Similarly, add the image attribute (small, thumbnail, …) as an argument so only images for that attribute are resized (globally or per theme). I have not yet tested the resize command, but I believe this would help performance because it wouldn’t resize for all themes and all attributes.

Thanks, RT

Cache folder seems somewhat pointless if you are using Cloudflare or other CDN. Can we just disable this product image cache feature and have requests serve a resized image but cache that response with an etag based on the original source file’s checksum?

This is still a persistent issue on M2.3.3. I have a multi-store view with 7239 unique images and it’s estimated 19 hours before the command completes. I’m watching duplicate images be generated.

After upgrading from 2.2.7 -> 2.3.3 all images have to be regenerated due to a change in the way the image hash is generated. This isn’t realistic to do when we upgrade a production environment.

EDIT: I redeployed the MCloud environment and the current est is 5.9 hours… still way too long for an image set this small.

Before I could do that, though, it failed multiple times about a swatch_image.jpg path, which derailed me for a while, until I deleted the files under the path and suddenly it was able to progress. At that point I thought the troubles were over, detached my terminal multiplexer and ventured home. The next morning I awoke, feeling positive that I would see immense progress and possibly a disk that ran out of space (which would still be progress!), but instead I learned that the operation failed a mere 45 minutes after leaving, gracing me with its ever-useful error message, “Unsupported image format,” which I’ve referenced before. Why not–at the very least–just log it and continue? WHY do you need to fail catastrophically and die?

Having run into this before: anyone needing to lint images before resizing can use ImageMagick’s identify utility to make sure the process doesn’t abruptly fail due to one bad image.

The command below recursively scans the current working directory for malformed or invalid images with identify. It prints each file as it is scanned and logs only the files for which identify exits with a non-zero status.

find -D rates . -type f \( -name '*.gif' -o -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \) -print -exec bash -c 'identify "$1" &> /dev/null || echo "$1" >> invalid-imgs.log' none {} \;


EDIT to ignore cache directory use (note the -not -path):

find -D rates * -type f \( -name '*.gif' -o -name '*.png' -o -name '*.jpg' -o -name '*.jpeg' \) -not -path "*cache/*" -print -exec bash -c 'identify "$1" &> /dev/null || echo "$1" >> invalid-imgs.log' none {} \;

In reading through the 2.2.6 release notes, it’s made clear that M2 has moved forward with its image generation tool. My only question is, will on-the-fly image generation still be supported? A 90% performance increase is unimportant when the operation takes days (and has never once completed for me).

Have the concerns raised in this ticket been addressed? To sum it up, M2 moved forward with their image generation tool, realized it worked terribly, and then walked it back and reverted to on-the-fly image generation again, which is superior.

@magento-engcom-team: can you review my comment above, I still think this ticket needs to be re-opened. Thanks!

@magento-engcom-team: I just retested this on a clean 2.2.0 installation, and can’t see any difference in the results, so this issue is definitely not fixed, please reopen 😃

Evidence for issue number 1 (Magento Blank theme is active on frontend, Luma theme not):

Squeeze in a couple of new lines of code on line 63 of the Product\Image\Cache model:

//                if ($theme->getThemeTitle() == 'Magento Luma') continue;
                echo "executing '{$theme->getThemeTitle()}'\n";

$ rm -R pub/media/catalog/product/cache/*

$ php bin/magento catalog:images:resize
executing 'Magento Blank'
executing 'Magento Luma'
.
Product images resized successfully

$ find pub/media/catalog/product/cache -type f | wc -l
      35

Now change the newly introduced code and uncomment the first line, so only the Magento Blank theme is processed

$ rm -R pub/media/catalog/product/cache/*

$ php bin/magento catalog:images:resize
executing 'Magento Blank'
.
Product images resized successfully

$ find pub/media/catalog/product/cache -type f | wc -l
      29

Fewer images are produced (29 instead of 35) when generating them for only one theme, which should happen by default, since Magento Luma isn’t in use anywhere.

Evidence for issue number 2:

$ php bin/magento catalog:images:resize
executing 'Magento Blank'
.
Product images resized successfully

$ find pub/media/catalog/product/cache -type f | wc -l
      29

$ find pub/media/catalog/product/cache -type f -exec shasum {} \; | sort
0cc30892e82612b64bc8a69a0ede6e977773341a  pub/media/catalog/product/cache/926507dc7f93631a094422215b778fe0/s/c/screen_shot_2017-09-30_at_11.43.47.png
0cc30892e82612b64bc8a69a0ede6e977773341a  pub/media/catalog/product/cache/afad95d7734d2fa6d0a8ba78597182b7/s/c/screen_shot_2017-09-30_at_11.43.47.png
0cc30892e82612b64bc8a69a0ede6e977773341a  pub/media/catalog/product/cache/c687aa7517cf01e65c009f6943c2b1e9/s/c/screen_shot_2017-09-30_at_11.43.47.png
27734c04683faaf56fbab8694783ac85f49af19c  pub/media/catalog/product/cache/cdec6e528c16187a547aea54d9e1d6ee/s/c/screen_shot_2017-09-30_at_11.43.47.png
27734c04683faaf56fbab8694783ac85f49af19c  pub/media/catalog/product/cache/dca4079c45c8bedb9968e3d3d4e45631/s/c/screen_shot_2017-09-30_at_11.43.47.png
387c943f6bf316cadbc7f777c25360a936b86358  pub/media/catalog/product/cache/806d6fa663c29d159ca59727157b4a59/s/c/screen_shot_2017-09-30_at_11.43.47.png
4166ad59674e5e41ebe0d7b321d45749dcd2d717  pub/media/catalog/product/cache/2f067dfaa2eefc9cc6820ffd207e9866/s/c/screen_shot_2017-09-30_at_11.43.47.png
4166ad59674e5e41ebe0d7b321d45749dcd2d717  pub/media/catalog/product/cache/3cf5799449660ed39031217945ace72a/s/c/screen_shot_2017-09-30_at_11.43.47.png
43b15e0154462edc6ca4385c843592bc7ceeb296  pub/media/catalog/product/cache/15dc7e9ba1a6bafcd505d927c7fcfa03/s/c/screen_shot_2017-09-30_at_11.43.47.png
43b15e0154462edc6ca4385c843592bc7ceeb296  pub/media/catalog/product/cache/2b4546e5ba001f3aea4287545d649df0/s/c/screen_shot_2017-09-30_at_11.43.47.png
4a205dba28e0a569c55197f04d2eb7602be07da4  pub/media/catalog/product/cache/914b1ba9268f8c1d0e58a8e7ce614488/s/c/screen_shot_2017-09-30_at_11.43.47.png
597873debbacf71bd76fab47d5c85af492753f46  pub/media/catalog/product/cache/0f831c1845fc143d00d6d1ebc49f446a/s/c/screen_shot_2017-09-30_at_11.43.47.png
60e6e629c454e7747ddef89101cc5d601dfeb924  pub/media/catalog/product/cache/633177f689f3c479eab7d48212fd720b/s/c/screen_shot_2017-09-30_at_11.43.47.png
7e1122d6679e7af404873917104af678a4ecabbb  pub/media/catalog/product/cache/9b0529d63c590f29ded60308ccd979ee/s/c/screen_shot_2017-09-30_at_11.43.47.png
86c6e5ee0dbb48c5f5b4f54b91fb8f74eb608139  pub/media/catalog/product/cache/a2d2345650965cd6042e53fd7d716674/s/c/screen_shot_2017-09-30_at_11.43.47.png
873706d62738fc28e5ad408a3dee8906f0e218a7  pub/media/catalog/product/cache/81ea8665b1d657e2313096e2818a187e/s/c/screen_shot_2017-09-30_at_11.43.47.png
973b0ad14825f1dcf2cd3cb3faff5d33def81510  pub/media/catalog/product/cache/f073062f50e48eb0f0998593e568d857/s/c/screen_shot_2017-09-30_at_11.43.47.png
98f583c4f2a63ac96d3e0a6467905ecfeda9ef8c  pub/media/catalog/product/cache/fd09478435d4f3d9e62d28584118149d/s/c/screen_shot_2017-09-30_at_11.43.47.png
98f583c4f2a63ac96d3e0a6467905ecfeda9ef8c  pub/media/catalog/product/cache/fd4c882ce4b945a790b629f572e4ef93/s/c/screen_shot_2017-09-30_at_11.43.47.png
9a424621957c6949cbb316067eebd0691e71821e  pub/media/catalog/product/cache/6633e7fcc9a7e88021adbe9a2450a512/s/c/screen_shot_2017-09-30_at_11.43.47.png
9cc63a227559842bfbd478a3f32ca5f6a496def8  pub/media/catalog/product/cache/75eed2686e01eb22cb4050b2f40ddf97/s/c/screen_shot_2017-09-30_at_11.43.47.png
ab9b1014c3905a7718ce99c7dcf31020a725849f  pub/media/catalog/product/cache/8a4e709a70e03bf31b178a318a79cf0e/s/c/screen_shot_2017-09-30_at_11.43.47.png
ab9b1014c3905a7718ce99c7dcf31020a725849f  pub/media/catalog/product/cache/ee4ee1fe1bbe32e9b93a354df94c32e2/s/c/screen_shot_2017-09-30_at_11.43.47.png
b3efa42d422f39088d1eeb137b114edd138ed8a3  pub/media/catalog/product/cache/3f695f7dd477cbb47cd99d2622d93108/s/c/screen_shot_2017-09-30_at_11.43.47.png
e361be9aefc08e9d924b07b7e6d5c67126e89984  pub/media/catalog/product/cache/ccf7793e39f95beba8c329ba40e7df07/s/c/screen_shot_2017-09-30_at_11.43.47.png
e361be9aefc08e9d924b07b7e6d5c67126e89984  pub/media/catalog/product/cache/f4a2bc458ca2feecb5750446998dc347/s/c/screen_shot_2017-09-30_at_11.43.47.png
e69edb0a25381abe75d0150da41de13b1c4bebe7  pub/media/catalog/product/cache/2f5bcdd08b6b861f73e29326ee14ef04/s/c/screen_shot_2017-09-30_at_11.43.47.png
e69edb0a25381abe75d0150da41de13b1c4bebe7  pub/media/catalog/product/cache/f485795eb4b45ff97c82d72651274f10/s/c/screen_shot_2017-09-30_at_11.43.47.png
f960c5eb18d957cfeb550b0bed7ae74d35c89155  pub/media/catalog/product/cache/3bb5001b99d4c204f1708e92b30dda97/s/c/screen_shot_2017-09-30_at_11.43.47.png

You can see a bunch of duplicated hashes: the 29 files have only 20 distinct hashes, so 9 of them are redundant. That isn’t desirable, since those images waste disk space and take extra time to generate.
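
Instead of eyeballing the sorted list, the duplicates can be summarized with a small helper. This is a sketch; the function name `duplicate_summary` is made up, and it uses whichever of `sha1sum` or `shasum` is installed.

```shell
# duplicate_summary CACHE_DIR
# Prints "<total> files, <unique> unique, <redundant> redundant".
duplicate_summary() {
    # Linux ships sha1sum, macOS ships shasum: use whichever is present.
    hasher=$(command -v sha1sum || command -v shasum) || return 1
    find "$1" -type f -exec "$hasher" {} + |
        awk '{ total++; if (!seen[$1]++) unique++ }
             END { printf "%d files, %d unique, %d redundant\n",
                          total, unique, total - unique }'
}

# Example: duplicate_summary pub/media/catalog/product/cache
```

For a cache with the contents listed above, this should report 29 files, 20 unique and 9 redundant.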

It looks like the issues brought up in here have been resolved more or less in Magento 2.3.x, so that’s good, but I found another huge problem, for which I’ve created https://github.com/magento/magento2/issues/26796

Unfortunately, no joke. The change to the hashes was added in 2.3.0. Regenerating the image caches took 5+ days for one of our customers with high-res raw images… We resorted to hardcoding the few hashes in our customers’ instances, just in case. 😦

@0x15f If you are on Magento Cloud, you can try using Fastly image optimization

@hostep, thank you for your report. We’ve created internal ticket(s) MAGETWO-80606 to track progress on the issue.

@hostep Thank you for the investigation. Issue reopened for further research.

And yet another update. Manipulating image_type in Magento\Catalog\Model\Product\Image\ParamsBuilder::build which I mentioned above, isn’t the right solution, as image_type is being used in Magento\Catalog\Model\View\Asset\Image::getPlaceHolder to figure out what placeholder to use.

My new proposition is to change this in Magento\Catalog\Model\View\Asset\Image::getMiscPath as follows:

diff --git a/app/code/Magento/Catalog/Model/View/Asset/Image.php b/app/code/Magento/Catalog/Model/View/Asset/Image.php
index 05f7044cbf1..0fd6690224d 100755
--- a/app/code/Magento/Catalog/Model/View/Asset/Image.php
+++ b/app/code/Magento/Catalog/Model/View/Asset/Image.php
@@ -194,7 +194,20 @@ class Image implements LocalInterface
      */
     private function getMiscPath()
     {
-        return $this->encryptor->hash(implode('_', $this->miscParams), Encryptor::HASH_VERSION_MD5);
+        $miscParams = $this->miscParams;
+
+        // since 'image_type' has no influence as to how an image is manipulated (the resulting files are binary the same if all the other params match),
+        // it makes no sense to include it in the hash calculation
+        // the best solution would be to remove it, but to avoid introducing new backwards incompatible hashes being generated here,
+        // we decided to hardcode 'image_type' to one which was already in use before: 'thumbnail'
+        // if we simply removed it, we would have to re-run `bin/magento catalog:images:resize`, to again generate all new images, and that's just very annoying
+
+        if (isset($miscParams['image_type']))
+        {
+            $miscParams['image_type'] = 'thumbnail';
+        }
+
+        return $this->encryptor->hash(implode('_', $miscParams), Encryptor::HASH_VERSION_MD5);
     }

     /**

This is for Magento 2.1.6. I haven’t checked the latest develop branch code to see whether this also applies there.
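
To illustrate why pinning image_type collapses the duplicates: the cache directory name is a hash over the joined parameter values, so two parameter sets that differ only in image_type end up in different directories even though the resized file is identical. The sketch below is not Magento code. It uses a plain MD5 over a `_`-joined string as a stand-in for the `encryptor->hash(implode('_', ...))` call in the diff (the real encryptor may hash differently), and the parameter values and function names are made up.

```shell
# cache_dir PARAM... -> MD5 of the parameters joined with "_"
cache_dir() {
    md5=$(command -v md5sum || command -v md5) || return 1
    ( IFS=_; printf '%s' "$*" ) | "$md5" | awk '{ print $1 }'
}

# Same as cache_dir, but the first parameter (the image type) is
# always replaced by "thumbnail", mirroring the proposed change.
cache_dir_pinned() {
    shift
    cache_dir thumbnail "$@"
}

# Hypothetical parameter sets: identical except for the image type.
thumb=$(cache_dir thumbnail 75 75 frame transparency 80)
small=$(cache_dir small_image 75 75 frame transparency 80)
if [ "$thumb" != "$small" ]; then
    echo "unpinned: two cache directories for the same output"
fi

if [ "$(cache_dir_pinned small_image 75 75 frame transparency 80)" = "$thumb" ]; then
    echo "pinned: one shared cache directory"
fi
```

Because the pinned value is one that was already in use, the directories that existing thumbnail images live in keep their names, which is the backwards compatibility argument made in the diff comment.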