magento2: On-schedule indexers stuck in "working" status

Preconditions and environment

Description

For “on-schedule” indexes, the indexer_update_all_views cron job runs every minute to work through the backlog of changed entities and update the corresponding indexes. In Adobe Commerce cloud environments, the system sometimes terminates this cron job when it runs low on memory. If this happens during an index update, the index stays stuck in the “working” status indefinitely, and getting it unstuck requires manual action. The following diagram provides an overview.

image

Magento version

2.4.5

Steps to reproduce

Idea

The easiest way to reproduce this issue is to make the update for a particular index artificially slow by adding a sleep. We can then run the cron job manually and kill it from another terminal while it is running, which leaves the indexer frozen in the “working” status. This works with any indexer. In the steps below, we use the product price indexer.

Steps

  1. Disable the automatic execution of cron jobs. We will run the job manually for more control.
  2. Install n98-magerun2. We will use this tool to run the indexer_update_all_views cron job in isolation.
  3. Set the catalog_product_price indexer mode to schedule.
  4. Make Indexer\Product\Price::execute artificially slow by adding sleep(300);.
  5. Change the price of any product to add it to the backlog of price updates.
  6. Run bin/magento indexer:status catalog_product_price — it should show “x in backlog”.
  7. Run n98-magerun2 sys:cron:run indexer_update_all_views to run the cron job, and note its PID.
  8. Within 300 seconds, from another terminal, kill the above process with kill $PID.
  9. Remove the sleep(300); and re-run steps 5 through 7 to simulate a non-slow, successful index update.
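
Steps 1, 3, 6 and 7 above can be sketched as a small shell script. This is a minimal sketch, not part of the original report: the run helper, the DRY_RUN, MAGENTO and MAGERUN variables, and the function names are hypothetical, and bin/magento cron:remove assumes cron was set up with bin/magento cron:install (other setups disable cron differently). With DRY_RUN=1, the default here, the commands are only printed.

```shell
#!/bin/sh
# Sketch of reproduction steps 1, 3, 6 and 7. Helper and variable names are
# illustrative assumptions, not part of Magento or n98-magerun2.
MAGENTO="${MAGENTO:-bin/magento}"     # path to the Magento CLI
MAGERUN="${MAGERUN:-n98-magerun2}"    # path to n98-magerun2
DRY_RUN="${DRY_RUN:-1}"               # 1 = print commands only, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1 and 3: stop automatic cron and switch the indexer to schedule mode.
repro_prepare() {
    run "$MAGENTO" cron:remove    # assumes cron was set up with cron:install
    run "$MAGENTO" indexer:set-mode schedule catalog_product_price
}

# Step 6: show the indexer status (expected to report a backlog after step 5).
repro_status() {
    run "$MAGENTO" indexer:status catalog_product_price
}

# Step 7: run only the indexer_update_all_views cron job in the foreground.
# Find its PID from another terminal (for example with `ps axuf`) and kill it
# there while the sleep from step 4 is still running (step 8).
repro_run_cron() {
    run "$MAGERUN" sys:cron:run indexer_update_all_views
}
```

Calling repro_prepare, changing a product price, and then calling repro_status and repro_run_cron follows the order of the steps. Set DRY_RUN=0 to actually execute the commands.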

Expected result

The indexer updates the product price index successfully to include the new price.

Actual result

The product price indexer gets frozen, which keeps it from processing any further price changes.

image

Additional information

Logs from steps 7 and 8 of “Steps to reproduce”:

# Terminal 2
app@7ab8e61b7445:~/html$ ps axuf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
app        466  0.1  0.0   4100  3280 pts/3    Ss   13:05   0:00 bash
app        473  0.0  0.0   6700  2948 pts/3    R+   13:05   0:00  \_ ps axuf
app        451  0.0  0.0   4100  3448 pts/2    Ss   13:04   0:00 bash
app        462  1.7  0.9 223156 153088 pts/2   S+   13:04   0:00  \_ php vendor/n98/magerun2-dist/n98-magerun2 sys:cron:run indexer_update_all_views
app        406  0.2  0.0  20508 14588 pts/1    Ss+  12:59   0:00 mysql -hdb -umagento -px xxxxx magento
app         45  0.0  0.0   4100  3352 pts/0    Ss+  12:43   0:00 bash
app         24  0.0  0.0   2420   520 ?        Ss   12:43   0:00 sh /var/www/.composer-global/vendor/bin/cache-clean.js --quiet --watch
app         34  0.3  0.4 649828 78948 ?        Sl   12:43   0:05  \_ node /var/www/.composer-global/vendor/mage2tv/magento-cache-clean/bin/cache-clean.js --quiet --watch
app          1  0.0  0.2 236276 36008 ?        Ss   12:43   0:00 php-fpm: master process (/usr/local/etc/php-fpm.conf)
app        399  0.7  0.8 259592 133236 ?       S    12:58   0:03 php-fpm: pool www
app        400  3.1  0.9 282456 156260 ?       S    12:58   0:13 php-fpm: pool www
app        401  0.5  0.5 253720 89976 ?        S    12:59   0:02 php-fpm: pool www
app        402  0.5  0.7 337404 126924 ?       S    12:59   0:01 php-fpm: pool www
app        403  0.5  0.7 327488 127156 ?       S    12:59   0:02 php-fpm: pool www
app        404  0.5  0.8 262552 136764 ?       S    12:59   0:01 php-fpm: pool www
app@7ab8e61b7445:~/html$ kill 462
app@7ab8e61b7445:~/html$ 

# Terminal 1
app@7ab8e61b7445:~/html$ vendor/n98/magerun2-dist/n98-magerun2 sys:cron:run indexer_update_all_views
Run Magento\Indexer\Cron\UpdateMview::execute Terminated

Triage and priority

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

About this issue

  • State: open
  • Created a year ago
  • Reactions: 11
  • Comments: 30 (6 by maintainers)

Most upvoted comments

@Tomasito665: out of curiosity, have you already tried the indexer config setting use_application_lock? See the documentation describing it.

Basically:

  • without the setting, the indexer state is kept in the database; if a process crashes halfway through its execution, the database is never updated with the correct status
  • with the setting enabled, Magento's lock manager is used; if the process crashes, the lock is released and Magento knows it can try to reindex that indexer again on the next cron execution

Maybe this helps in your case?
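
For reference, a minimal sketch of enabling the setting through an environment variable. It assumes the MAGENTO_DC_ environment override for the indexer/use_application_lock deployment setting as described in the linked documentation; please verify the exact variable name against the docs for your version. The same setting can also be placed in app/etc/env.php under the indexer key as use_application_lock.

```shell
# Assumption: deployment config values can be overridden with MAGENTO_DC_-prefixed
# environment variables, using a double underscore between path segments.
# This corresponds to indexer/use_application_lock in app/etc/env.php.
export MAGENTO_DC_INDEXER__USE_APPLICATION_LOCK=true
```

The variable has to be visible to the process that runs the cron job, so it would typically be set in that process's environment rather than in an interactive shell.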

Hello @Tomasito665,

Thanks for the report and collaboration!

We tried to reproduce the issue on a Magento 2.4-develop instance, and it is reproducible by following the exact steps mentioned.

Please refer to the screenshots for reference:

Admin Panel

image

Error in Terminal

image

Hence, we are confirming the issue.

Thanks

Hello @santerref,

Have you tried the use_application_lock approach mentioned here #36724 (comment)? This might resolve your issue.

You can refer to the devdocs URL below for details:

https://developer.adobe.com/commerce/php/development/components/indexing/#using-application-lock-mode-for-reindex-processes

Thanks

Yes, we already use this setting and the issue is still there.

Hi @alexandrosk. I suggest you try this Adobe Quality Patch: https://docs.mktossl.com/docs/commerce-knowledge-base/kb/support-tools/patches/v1-1-33/acsd-51431-indexer-status-is-working.html?lang=en. I tested it on my staging environment and it seems to work; I will deploy it to production next week.

We have this issue on 2.4.5-p4 cloud. Any idea how to set use_application_lock in the .magento.env.yaml file?