magento2: Cron job not running after it crashed once

Preconditions (*)

  • Magento EE 2.2.8
  • Crontab configured as per the documentation

Steps to reproduce (*)

  • An error makes the cron job indexer_update_all_views fail once (in my case, the database was unavailable)

Expected result (*)

  • The current run should be marked as failed in the cron_schedule table
  • The next run should execute correctly, and its status should be updated in cron_schedule when it finishes
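The expected behaviour can be sketched as a tiny in-memory model of cron_schedule. This is a hypothetical Python illustration, not Magento's PHP code: the `Schedule` dataclass, the `run_job` function and the simulated error are assumptions, with status strings taken from the cron_schedule statuses discussed in this issue.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Schedule:
    """Hypothetical stand-in for one row of the cron_schedule table."""
    job_code: str
    status: str = "pending"
    messages: Optional[str] = None


def run_job(row: Schedule, callback: Callable[[], None]) -> Schedule:
    """Run one scheduled job and always leave the row in a final state."""
    row.status = "running"
    try:
        callback()
    except Exception as exc:  # any caught failure ends the run as "error"
        row.status = "error"
        row.messages = str(exc)
    else:
        row.status = "success"
    return row


def failing() -> None:
    # Simulates the database outage from the log above.
    raise RuntimeError("SQLSTATE[08S01]: Communication link failure")


first = run_job(Schedule("indexer_update_all_views"), failing)
second = run_job(Schedule("indexer_update_all_views"), lambda: None)
print(first.status, second.status)  # error success
```

The `except` branch only helps when the process survives the failure. If the process is killed outright (for example by the out-of-memory killer), no code gets the chance to write the final status and the row can stay in "running".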

Actual result (*)

The cron_schedule table fills up with pending jobs, and no indexer_update_all_views job runs (no output in var/log/cron.log, no status update in the cron_schedule table).

Logs: var/log/cron.log (last success + the error message)

[2019-05-29 11:25:10] report.INFO: Cron Job indexer_update_all_views is run [] []
[2019-05-29 11:28:03] report.ERROR: Cron Job indexer_update_all_views has an error: SQLSTATE[08S01]: Communication link failure: 1047 WSREP has not yet prepared node for application use, query was: SELECT `mview_state`.* FROM `mview_state` WHERE (`mview_state`.`view_id`='catalog_product_flat'). Statistics: {"sum":0,"count":1,"realmem":0,"emalloc":0,"realmem_start":182714368,"emalloc_start":180313880} [] []

=> After that, there are no more log entries for indexer_update_all_views, even though other jobs from the index group run correctly and log success in var/log/cron.log. The database recovered a minute later and the query then succeeded.

About this issue

  • State: closed
  • Created 5 years ago

Most upvoted comments

The PR in #28007 should fix this. It clears up jobs stuck in the running state and mitigates deadlocks, so the cleanups can finish and stop the cron_schedule table from growing exponentially.

@hostep I miscommunicated. I didn’t mean that I am seeing running jobs switch their status to missed. What I meant was that, in my environment at least, a lot of missed jobs seem to accumulate, and that can also cause the cron to crash or deadlock. The code that seems to be meant to clean up missed jobs also doesn’t appear to work in my case.

@QuentinFarizonAfrimarket Hi!
Did you try this fix? Fix: https://github.com/magento/magento2/pull/23079/files Issue: https://github.com/magento/magento2/issues/23077
We have the same issue, with millions of rows in the changelog tables, especially catalog_product_flat_cl; maybe it is the same for you.
It gets stuck in \Magento\Framework\Mview\View::update, processing millions of versions in chunks.
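To illustrate why a huge changelog keeps the job busy, here is a simplified Python sketch of walking a version range in fixed-size batches. It assumes the update reads changelog rows between the last processed version and the current one, one window at a time; `version_batches`, `entity_ids`, the `(from, to]` window convention and the batch size of 1000 are hypothetical and are not the actual internals of \Magento\Framework\Mview\View::update.

```python
from typing import Dict, Iterator, Set, Tuple


def version_batches(last_version: int, current_version: int,
                    batch_size: int) -> Iterator[Tuple[int, int]]:
    """Yield (from, to] version windows between the stored and current version."""
    start = last_version
    while start < current_version:
        end = min(start + batch_size, current_version)
        yield start, end
        start = end


def entity_ids(changelog: Dict[int, int], lo: int, hi: int) -> Set[int]:
    """Distinct entity IDs whose changelog version falls in (lo, hi]."""
    return {eid for version, eid in changelog.items() if lo < version <= hi}


# Millions of pending versions turn into thousands of batches in one cron run.
batches = list(version_batches(0, 2_500_000, 1000))
print(len(batches))               # 2500
print(batches[0], batches[-1])    # (0, 1000) (2499000, 2500000)

# Small changelog: version_id -> entity_id (the same entity can repeat).
changelog = {1: 10, 2: 10, 3: 11, 4: 10, 5: 12}
for lo, hi in version_batches(0, 5, 2):
    print(lo, hi, sorted(entity_ids(changelog, lo, hi)))
# 0 2 [10]
# 2 4 [10, 11]
# 4 5 [12]
```

Under these assumptions, the loop length grows with the number of changelog versions, not with the number of distinct entities, which would be consistent with the job appearing stuck on a table like catalog_product_flat_cl.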

Hello @hostep, thank you! I agree with the conclusion of magento/architecture#171: cron management must be more resilient and robust, and protected against the periodic or permanent failure of any single job code.

I think I found a reproducible scenario that caused the issue on my system:

  • Indexing job (group “index”) configured with “no separate process”
  • Indexing runs out of memory and the process is killed by the system (or crashes violently for another reason)
  • As the process never gets the chance to set the job to “error”, it stays in “running”
  • No new indexing job runs (because one is already running)
  • Running jobs are only cleaned up after max(successLifetime, errorLifetime), which by default is 3 days (!) for the index group
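The scenario above can be simulated in a few lines. This Python sketch models the behaviour as this comment describes it, not Magento's actual cleanup code: `Row`, `cleanup` and `can_start` are hypothetical, the rule "a new run is skipped while a row for the same job is running" is taken from the steps above, and the 1-hour success lifetime is an assumption (only the 3-day maximum comes from the comment).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Row:
    """Hypothetical cron_schedule row."""
    job_code: str
    status: str
    executed_at: datetime


def cleanup(rows: List[Row], now: datetime,
            success_lifetime: timedelta, error_lifetime: timedelta) -> List[Row]:
    """Drop 'running' rows older than max(success, error) lifetime."""
    limit = max(success_lifetime, error_lifetime)
    return [r for r in rows
            if not (r.status == "running" and now - r.executed_at > limit)]


def can_start(rows: List[Row], job_code: str) -> bool:
    """A new run is skipped while a row for the same job is still 'running'."""
    return not any(r.job_code == job_code and r.status == "running" for r in rows)


crash = datetime(2019, 5, 29, 11, 28)
# The process was killed mid-run, so its row never left "running".
rows = [Row("indexer_update_all_views", "running", crash)]

for hours in (1, 24, 73):
    now = crash + timedelta(hours=hours)
    remaining = cleanup(rows, now, timedelta(hours=1), timedelta(days=3))
    print(hours, can_start(remaining, "indexer_update_all_views"))
# 1 False
# 24 False
# 73 True

# Workaround from this comment: a 6-hour error lifetime unblocks much sooner.
now = crash + timedelta(hours=7)
print(can_start(cleanup(rows, now, timedelta(hours=1), timedelta(hours=6)),
                "indexer_update_all_views"))  # True
```

With the default 3-day maximum, a single killed process blocks the job for the whole window; shortening the error lifetime only shortens that window, it does not detect the crash itself.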

Consequences:

  • No errors, or only a few, in cron.log or other logs
  • No reporting in the back office or CLI, apart from the indexer backlog that keeps piling up
  • The index job stays marked as “running” for 3 days after a single error

Workaround:

  • Increase the memory limit
  • Set the error lifetime to something between 1 and 6 hours (should be sufficient for indexing)

Ideas:

  • Store the host and PID in the database so crashed processes can be detected regularly (when the check runs on the same host)
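The host + PID idea could look roughly like this. It is a hypothetical Python sketch, not part of Magento: `RunningJob`, `claim`, `pid_alive` and `looks_crashed` are invented names, and the liveness check relies on the POSIX convention that sending signal 0 only tests whether a process exists.

```python
import os
import socket
import subprocess
import sys
from dataclasses import dataclass


@dataclass
class RunningJob:
    """Hypothetical extra columns for a cron_schedule row: where the job runs."""
    job_code: str
    hostname: str
    pid: int


def claim(job_code: str) -> RunningJob:
    """Record the current host and process ID when a job starts."""
    return RunningJob(job_code, socket.gethostname(), os.getpid())


def pid_alive(pid: int) -> bool:
    """Signal 0 checks that the process exists without affecting it (POSIX)."""
    if pid <= 0:
        return False  # 0 and negatives address process groups, not one process
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but belongs to another user
    return True


def looks_crashed(job: RunningJob) -> bool:
    """Only decide on the host that started the job; PIDs mean nothing elsewhere."""
    if job.hostname != socket.gethostname():
        return False  # unknown here: leave it to that host's check
    return not pid_alive(job.pid)


me = claim("indexer_update_all_views")
print(looks_crashed(me))  # False: this process is alive

child = subprocess.Popen([sys.executable, "-c", "pass"])
child.wait()  # the child has exited and been reaped
dead = RunningJob("indexer_update_all_views", socket.gethostname(), child.pid)
print(looks_crashed(dead))  # True, unless the PID was reused in between
```

PIDs can be reused by the operating system, so a real check would likely also compare something like the process start time before marking a job as crashed.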