magento2: Cron job not running after crashed once

Preconditions (*)

Magento EE 2.2.8
Crontab configured as per the documentation

Steps to reproduce (*)

An error makes the cron job indexer_update_all_views fail once (in my case, database unavailable)

Expected result (*)

The current run should be marked as failed in the cron_schedule table
The next run should execute normally and update its status in cron_schedule when it finishes

Actual result (*)

The cron_schedule table fills up with pending jobs, and no job for indexer_update_all_views is run (no output in var/log/cron.log, no status update in the cron_schedule table).

Logs : var/log/cron.log (last success + the error message)

[2019-05-29 11:25:10] report.INFO: Cron Job indexer_update_all_views is run [] []
[2019-05-29 11:28:03] report.ERROR: Cron Job indexer_update_all_views has an error: SQLSTATE[08S01]: Communication link failure: 1047 WSREP has not yet prepared node for application use, query was: SELECT `mview_state`.* FROM `mview_state` WHERE (`mview_state`.`view_id`='catalog_product_flat'). Statistics: {"sum":0,"count":1,"realmem":0,"emalloc":0,"realmem_start":182714368,"emalloc_start":180313880} [] []

=> After that, there are no more log entries for indexer_update_all_views, even though other jobs from the index group run correctly and log success in var/log/cron.log. The database recovered a minute later and the query then ran fine.
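
To confirm that a job has gone silent, the last log entry per job code can be pulled out of var/log/cron.log. The following is only a rough sketch: the regular expression is inferred from the two log lines quoted above, and the helper name `last_seen` is made up for illustration.

```python
import re
from datetime import datetime

# Pattern inferred from the log lines quoted in this issue, e.g.:
# [2019-05-29 11:25:10] report.INFO: Cron Job indexer_update_all_views is run [] []
LINE_RE = re.compile(
    r"^\[(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\] "
    r"report\.(?P<level>[A-Z]+): Cron Job (?P<job>\S+) "
)


def last_seen(lines):
    """Return {job_code: (timestamp, level)} for the latest log line of each job."""
    seen = {}
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a "Cron Job ..." line
        ts = datetime.strptime(match.group("ts"), "%Y-%m-%d %H:%M:%S")
        job = match.group("job")
        if job not in seen or ts >= seen[job][0]:
            seen[job] = (ts, match.group("level"))
    return seen


# Shortened copies of the two lines from the report above.
sample = [
    "[2019-05-29 11:25:10] report.INFO: Cron Job indexer_update_all_views is run [] []",
    "[2019-05-29 11:28:03] report.ERROR: Cron Job indexer_update_all_views has an error: SQLSTATE[08S01] [] []",
]
result = last_seen(sample)
```

In practice `lines` would be an open file handle on var/log/cron.log; a job whose latest entry is an old ERROR while other jobs keep logging matches the symptom described here.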

About this issue

State: closed
Created 5 years ago
Reactions: 2
Comments: 24 (9 by maintainers)

Most upvoted comments

The PR in #28007 should fix this. It clears up things stuck in running and mitigates deadlocks so the cleanups can finish and stop the cron_schedule table growing exponentially.

driskell on Sep 24, 2020

@hostep I miscommunicated. I didn’t mean that I am seeing running jobs switch their status to missed. What I meant was that, in my environment at least, a lot of missed jobs seem to accumulate, and that can also cause the cron to crash or deadlock. The code that looks like it is meant to clean up missed jobs also does not seem to work in my case.

Ctucker9233 on Jun 14, 2019

@QuentinFarizonAfrimarket Hi!
Did you try this fix? Issue: https://github.com/magento/magento2/issues/23077 Fix: https://github.com/magento/magento2/pull/23079/files
We have the same issue, with millions of rows in the changelog tables, especially catalog_product_flat_cl; maybe it is the same for you.
It is stuck in \Magento\Framework\Mview\View::update, processing millions of versions in chunks.

mattheo-geoffray on Jun 6, 2019

Hello @hostep, thank you! I agree with the conclusion of magento/architecture#171: cron management must be more resilient, fireproof, and protected against periodic or permanent failure of any single job code.

I think I found a reproducible scenario that caused the issue on my system:

The indexing job (group “index”) is configured with “no separate process”
Indexing runs out of memory and the process is killed by the system (or the process crashes violently for another reason)
As the process never had the chance to set the job status to “error”, it stays in “running”
No new indexing job runs (because one is already marked as running)
Running jobs are cleaned up after max(successLifetime, errorLifetime), which by default is 3 days (!) for the index group
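
A job left in this state can be spotted by looking for rows that have been “running” for longer than any indexing run should take. Below is a minimal sketch, not Magento code: it uses an in-memory SQLite table as a stand-in for cron_schedule, assumes the columns schedule_id, job_code, status and executed_at, and the helper name `find_stuck_jobs` and the one-hour threshold are purely illustrative.

```python
import sqlite3
from datetime import datetime, timedelta


def find_stuck_jobs(conn, now, max_runtime):
    """Return rows still marked 'running' whose executed_at is older than now - max_runtime."""
    cutoff = (now - max_runtime).strftime("%Y-%m-%d %H:%M:%S")
    cursor = conn.execute(
        "SELECT schedule_id, job_code, executed_at FROM cron_schedule "
        "WHERE status = 'running' AND executed_at < ? "
        "ORDER BY executed_at",
        (cutoff,),
    )
    return cursor.fetchall()


# Stand-in table with assumed column names (not the real schema definition).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cron_schedule ("
    "schedule_id INTEGER PRIMARY KEY, job_code TEXT, status TEXT, executed_at TEXT)"
)
conn.executemany(
    "INSERT INTO cron_schedule (job_code, status, executed_at) VALUES (?, ?, ?)",
    [
        ("indexer_update_all_views", "running", "2019-05-29 11:28:00"),  # crashed run
        ("indexer_update_all_views", "pending", None),                   # never started
        ("indexer_reindex_all_invalid", "success", "2019-05-29 14:00:00"),
    ],
)

now = datetime(2019, 5, 29, 15, 0, 0)
stuck = find_stuck_jobs(conn, now, timedelta(hours=1))
```

Against a real database the same WHERE clause could be run directly in SQL; the threshold should sit comfortably above the longest legitimate indexing run so that healthy jobs are not flagged.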

Consequences:
=> No error, or only a few errors, in cron.log or other logs
=> No reporting in the back office or CLI, apart from the indexer backlog, which keeps piling up
=> Index job marked as “running” for 3 days after a single error

Workaround:
=> Increase the memory limit
=> Set the error lifetime to something between 1 and 6 hours (should be sufficient for indexing)

Ideas:
=> Store host + PID in the database to regularly check for crashed processes (when you’re on the correct host)
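
The host + PID idea could look roughly like the sketch below. It is an assumption-laden illustration, not part of Magento: the function names are invented, the host and PID are assumed to have been stored alongside the running job, and the liveness check relies on POSIX semantics of signal 0 (it must not be used on Windows, where signal 0 has a different meaning).

```python
import os
import socket


def process_is_alive(pid):
    """POSIX only: signal 0 delivers nothing, it just checks that the PID exists."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # no such process
    except PermissionError:
        return True   # process exists but belongs to another user
    return True


def is_crashed(job_host, job_pid, local_host=None):
    """Return True or False when this host can decide, None when the job ran elsewhere."""
    if local_host is None:
        local_host = socket.gethostname()
    if job_host != local_host:
        return None  # only the host that started the job can inspect its PID
    return not process_is_alive(job_pid)


# The current process is obviously alive, so it is not reported as crashed.
self_check = is_crashed(socket.gethostname(), os.getpid())
# A job recorded on a different (hypothetical) host cannot be judged from here.
remote_check = is_crashed("some-other-host.invalid", 12345, local_host="this-host")
```

A cron run that finds a “running” row for which `is_crashed` returns True could then mark that row as failed instead of waiting for the lifetime-based cleanup. PID reuse remains a caveat: a long-dead job's PID may have been assigned to an unrelated process, so storing the process start time as well would make the check more robust.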

QuentinFarizon on May 31, 2019