magento2: [2.2.0-*] cron_schedule forever increasing in size. Lots of pending jobs never cleared

Preconditions

Magento 2.2.0-rc30 running on Ubunti 16.04
Deployed initially from zip, but updated to 2.2.0-rc30 using composer

Steps to reproduce

Nothing, just look at the cron_schedule table

Expected result

On 2.1.9 my cron_schedule table is around 180 items in size. Its size is pretty much static. A snapshot shows the vast majority of jobs are in the success state, with a couple of pending jobs about to be run in the next minute or so.

Actual result

On 2.2.0-rc30 which has been running for around 8 days (upgraded from previous rc) the size of the cron_schedule table is around 6500 items in size. The size is constantly increasing every minute. The majority of the jobs are in the pending state. Some are marked as success.

The cronjob steadily increases in the time taken to complete, at the moment it is taking around 30 seconds to complete, during which time, mysql and php are taking up heavy CPU usage.

A MYSQL query log shows magento churning through all the pending requests, but they are never marked as success. Hence the ever increasing list of jobs to run.

Snippet from the Mysql Query log below

90 Query	START TRANSACTION
90 Query	UPDATE "cron_schedule" SET "job_code" = "catalog_product_outdated_price_values_cleanup", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:48:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189337")
90 Query	COMMIT
90 Query	UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189338") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query	START TRANSACTION
90 Query	UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:33:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189338")
90 Query	COMMIT
90 Query	UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189339") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query	START TRANSACTION
90 Query	UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:34:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189339")
90 Query	COMMIT
90 Query	UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189340") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query	START TRANSACTION
90 Query	UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:35:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189340")
90 Query	COMMIT
90 Query	UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189341") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query	START TRANSACTION
90 Query	UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:36:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189341")
90 Query	COMMIT
90 Query	UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189342") AND (current.status = "pending") AND (existing.schedule_id IS NULL)

About this issue

Original URL
State: closed
Created 7 years ago
Reactions: 21
Comments: 99 (40 by maintainers)

Links to this issue

Cron Jobs in cron_schedule

Most upvoted comments

@magento-team This is a pathetic bug that is wreaking havoc on our cron. Can you at least respond to this?

+24

mattdillon100 on Jan 24, 2018

@magento-engcom-team

I have created a test module that implements a cron job that crashes during execution.

It can be downloaded from here Gw_CronCrash.zip

This is a simple cron job that just throws an invalid exception, instead of actually doing anything useful. It simulates a cron job crashing for whatever reason during execution.

Once installed, on the next 1 minute boundary, the exception is logged in var/log/magento.cron.log (if you have setup logging of cron output).

Now if you check your cron_schedule table

SELECT * FROM 'cron_schedule' WHERE 'job_code' = 'Gw_CronCrash'

You will see the first row shows the first time it tried to run the job. The job will be in the “running” state, even though the job crashed and failed a long time ago.

Every 15 minutes, a new batch of 15 jobs in the pending state are added to the cron table with status “pending”. No rows are ever removed.

The table will grow forever.

Even if you fix the problem with the code in the module, the cron job for that module is NEVER run again. 1440 rows are added to the cron table every day. Every minute, every row of the cron_schedule table is parsed by Magento. Depending on the capabilities of your machine, your CPU could be maxed out in as little as a week.

The only way out is to manually delete the first entry in the cron_schedule table to remove the “running” job. Magento, then does a nice job of cleaning up the remainder of the “pending” entries, as you would expect.

Could this issue be reopened. It is 100% reproducable with this example module.

+20

gwharton on Dec 4, 2017

@magento-engcom-team Can this issue be re-opened. I, and several others by the looks of it, are having to go into the database on a regular basis to clear out cron jobs that are stuck in the running state.

It seems to be worse on development installs, where cron jobs may be crashing whilst being run and never being marked as completed. Its not the same job every time. Seems to be random which one gets stuck.

If you don’t manually delete the stuck running job, then, on my webserver, the CPU is overwhelmed at 100% within about 3 days as the never ending list of pending jobs in the cron_schedule table increases. If you don’t spot the problem, the first you will know about it, is when your webserver is unresponsive as MySQL is overwhelmed by Magento cycling through thousands of pending cron jobs every minute.

+11

gwharton on Nov 9, 2017

I have been stuck on this same issue for several weeks. I think I finally got it figured out with the help of this post. I made a quick blog post about it so I can refer to it in the future. https://toweringmedia.com/blog/magento-2-cron-job-pending-jobs-never-cleared-solved/

toweringmedia on Feb 27, 2019

We wrote an extension to fix these bugs, speed up performance, and control the execution of tasks: https://github.com/magemojo/m2-ce-cron

ericvhileman on Mar 17, 2018

Not wanting to stray off topic but @dwirt If it’s a shared server then beware of putting your mysql password in the command line like that. It can be viewed by anyone while the process is running.

The preferred method is to create a ~/.my.cnf file and specify mysql magento-db ..... on the command line. The host, username and password will be read from the file automatically.

The contents of .my.cnf should be

[mysql]
host = <hostname>
user = <username>
password = <password>

You can then secure this file with chmod 600 to keep it safe. You can also add the equivalent section for [mysqldump] if you use that command in cron jobs for database backups etc.

gwharton on Oct 19, 2017

Ok so we’ve analyzed this issue and here is what we found:

The problem appear since you have entry in cron_schedule which has status = running and it will never be finished. Usual reason to that might be

virtual machine terminated
cron process shutdown
computer switch off etc.

Then in Magento\Cron\Model\ResourceModel\Schedule we have method trySetJobUniqueStatusAtomic which will permanently return false for this particular job_code. That will results in all new cron schedules with the same job_code to be not executed/managed. They won’t be run, they won’t be marked as missed and number of these schedules will permantently grow which lead cron:run to work slower and slower.

Solution which i see is that old schedules with status “running” should be marked as “error”. I think this lifetime could be configured in admin config.

@magento-team Let me know what do you think about this solution so we can start to think about Pull Request.

Linek on Jan 26, 2018

2.2.1 also it happens.

Cron selects all “pending” jobs: SELECT main_table.* FROM cron_scheduleASmain_table WHERE (status = 'pending')
Adds some new ones: INSERT INTO cron_schedule (job_code, status, created_at, scheduled_at) VALUES ('indexer_update_all_views', 'pending', '2018-01-09 10:17:06', '2018-01-09 10:20:00')
Each selected record updates to “running” status UPDATE cron_scheduleAScurrentLEFT JOINcron_scheduleASexistingON existing.job_code = current.job_code AND existing.status = 'running' SETcurrent.status = 'running' WHERE (current.schedule_id = '1119') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)
Updates them to “pending” status backward UPDATE cron_scheduleSETjob_code= 'my_custom_job_cide',status= 'pending',messages= NULL,created_at= '2018-01-08 09:42:03',scheduled_at= '2018-01-08 09:44:00',executed_at= NULL,finished_at = NULL WHERE (schedule_id='1127')

Repeat every minute, we have pretty snowball effect now 😉

agata-maksymiuk on Jan 9, 2018

@magento-engcom-team Please reopen. This is not resolved. reproduced in 2.2.6

SELECT count() FROM cron_schedule WHERE status=‘pending’; ±---------+ | count() | ±---------+ | 11888 | ±---------+

acurvers on Oct 15, 2018

Hey @Linek, nice one on the plugin however it’s still a core issue with Magento and should be addressed by the developers - this issue has been stagnant for an extremely long time and it’s a critical bug.

I’m sure this could be re-worked into a pull request, if it ever gets approved/looked at…

Jean-PierreGassin on Mar 7, 2018

I can confirm this. Also 11000+ records in the table. One day after I truncated the table. Also 2.2.6

johnny-longneck on Oct 23, 2018

Still a problem in 2.3.4:

MariaDB [db]> SELECT count(*) FROM cron_schedule WHERE status='pending';
+----------+
| count(*) |
+----------+
|      261 |
+----------+
1 row in set (0.007 sec)

zzvara on Apr 28, 2020

Will there be a patch / fix for 2.1.x?

robfico on Jul 9, 2018

I ran into this issue on a Magento 2.2.2 Commerce site, even though none of my jobs were getting left in a running state.

The default Magento CRON groups have a history_success_lifetime value of 10080 (7 days). With these two CRONs running every minute, if you have ~70 CRON jobs, you can easily end up with hundreds of thousands of success jobs in the DB.
When the CRON runs, all cron groups configured with use_separate_process = 1 will all run in parallel. This is not normally a problem—however if the cron_schedule table has a massive number of records and each of those CRON jobs tries to delete the individual outdated cron_schedule entries, you start running into lock wait timeouts.
We had a production server that was using crazy amounts of CPU/memory when the CRON jobs would run (which would lock up the server):
During the time when the CRON was running, if we ran SHOW PROCESSLIST in MySQL, we would see that multiple processes were trying to delete the same record from the DB at the same time: We would also see these errors during this time:

 [Zend_Db_Statement_Exception]                                                                                                                                                          
 SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction, query was: DELETE FROM `cron_schedule` WHERE (schedule_id='6594032')

During these periods, we noticed that the CRON jobs would get backed up, waiting on the previous ones to finish running:

To workaround this issue, we deceased the number of entries in cron_schedule by going into STORES > Configuration > ADVANCED > System and decreasing the “Success History Lifetime” value from “10080” to “1440”. Hopefully the #12497 PR will resolve this issue.

erikhansen on Mar 27, 2018

@jontesamuelsson No this isn’t fixed in 2.2.3.

#13775 is in the process of being merged which will reduce the server load when the cron_schedule table grows unbounded, but it does not fix the root cause of this issue, which is the cron_schedule table growing in the first place.

gwharton on Mar 3, 2018

I ran with @andrewhowdencom 's approach, but implemented one without any 3rd party dependencies (such as n98-magerun or prometheus)

I created a directory in my magento installation’s var directory called cron_cleanup to group everything together … I also am using a standalone php script to interact with the db. Depending on someone’s situation, this might not be acceptable, but it works for me. Also, the actual log that the cron entries create might not server any real purpose as-is for most people, so change them to do what you would like or remove them all together.

#Workaround while upstream bug #11002 is being resolved
*/5 * * * * systemd-cat --identifier=magento_cron_cleanup -- flock --timeout="900" /run/lock/magento_cron_cleanup.lock  /usr/bin/php /var/www/magento/var/cron_cleanup/execute.php; echo "magento_cron_cleanup_exit_code $?" > /var/www/magento/var/cron_cleanup/issue-11002.cleanup.log

#Execute the Magento scheduled job runner
* * * * * systemd-cat --identifier=magento_system_cron -- flock --timeout="900" /run/lock/magento_system_cron.lock  /usr/bin/php /var/www/magento/bin/magento cron:run; echo "magento_system_cron_exit_code $?" > /var/www/magento/var/cron_cleanup/issue-11002.magentosystemcron.log

My execute.php file is as such:

<?php

require '/var/www/magento/app/bootstrap.php';
$bootstrap = \Magento\Framework\App\Bootstrap::create(BP, $_SERVER);
$objectManager = $bootstrap->getObjectManager();
$resource = $objectManager->get('Magento\Framework\App\ResourceConnection');
$connection = $resource->getConnection();
$tableName = $resource->getTableName('cron_schedule');

$sql = "DELETE FROM {$tableName} WHERE scheduled_at < DATE_SUB(NOW(), INTERVAL 1 HOUR);";
$connection->query($sql);

*make sure the execute.php has execution permissions chmod +x /var/www/magento/cron_cleanup/execute.php

This approach is working for me so far…I’m on CE (Open Source), but on EE (Commerce) there’s also the issue of consumers to take into consideration … so if someone’s installation isn’t using RabbitMQ, then the Mysql Queue will still be pounded with consumers. Again, all the implementation details here and approach is fine for me, but maybe not for others…YMMV and change it up to work for you.

Hopefully this helps someone, and thanks @andrewhowdencom and @skukla .

ryantfowler on Jan 7, 2018

@ericvhileman I does not think that use cron service on only one node is high-available solution. Also, this extension requires to run on unix-like OS and have to access to /proc/ filesystem.

kandy on Aug 29, 2018

@mpropl You can also try the MageMojo extension. I have used on a couple of installations since recently. It seems to fix these issues as well: https://github.com/magemojo/m2-ce-cron

rickschippers on Aug 29, 2018

I can confirm that in 2.2.5, certainly in my situation, that cron jobs that crash during execution are now marked as “error” in cron_schedule instead of sitting in the running state. This means that the cron cleanup code now properly deals with these jobs and successive jobs run as planned. Initial testing shows the race condition that allowed the cron_schedule table to grow infinitely in size is now fixed. Closing this issue as fixed. Phew!

gwharton on Jul 5, 2018

Quick tip for this is described in here: https://alekseon.com/en/blog/post/magento-2-slow-and-cpu-usage-gets-high-this-might-be-the-reason/

And fix itself is here: https://github.com/Alekseon/CleanRunningJobs

Linek on Mar 6, 2018

Same issue here. Magento 2.2.2 is killing my server. 100% CPU load -> 8 Cores 2.6

Lasim on Feb 20, 2018

I’ve been battling some negative behavior with cron:run lately and I think this thread is describing the root cause of my issue. I started up a new virtual machine with Magento and it was working fine at first. But after a week or 2 the server was super slow and unresponsive to the point of not being usable. I eventually noticed there would 10 or more instances of the cron:run process running simultaneously and hammering MySQL. I had the process set to run every minute, as seems to be the default. I walked back the cron schedule to every 8 minutes and that prevented mutliple cron:run processes from running simultaneously.

I finally found this issue and it lead to counting the rows in cron_schedule, which was 208,046! I ran the query posted above and that brought it down to 252 rows.

My site has just been under light developement, no traffic.

After running the query

delete from cron_schedule where scheduled_at < date_sub(now(), interval 1 hour)

now I’ve switched all 3 standard magento cron process - cron:run, setup:cron:run, cron.php - back to running every minute, and all 3 run pretty much instantly in under 5 - 10 seconds or less.

I’m new to magento, so I can’t speak to the cause but i can say anyone that runs will be left with an unusable site, unless they heavily upgrade their hardware. I posted this question to magento.stackexchange.com and another user said they were experiencing the same issue, check out the comments. https://magento.stackexchange.com/questions/201063/should-2-of-the-standard-cron-always-be-running?noredirect=1#comment278625_201063

UPDATE: After adding this query to the crontab after reading about this, it fixed all negative behavior and after switching my 3 magento crons back to running every minute, the cron_schedule table is holding steady at around 1030 - 1050 rows with the delete query deleting about 20 rows every 15 minutes when it runs.

akellberg-zz on Nov 15, 2017

So, I had a lot of pending jobs, deleted manually all of them and everything seems to be ok for day or two, but I started to receive the error like above again. When I have checked, I had a lot of pending entries again at the end of the table, manually deleted them, for now it’s ok

AgentGod on Jul 30, 2018

My bad, the table have little bit more that 10 000, but I get error

[Magento\Framework\DB\Adapter\DeadlockException]

SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction, query was: DELETE FROM cron_schedule WHERE (status = ‘missed’) AND (job_code in (‘currency_rates_update’, ‘backend_clean_cache’, ‘visitor_clean’, ‘catalog_index_refresh_price’, ‘catalog_product_flat_indexer_store_cleanup’, ‘catalog_product_outdated_price_values_cleanup’, ‘catalog_product_frontend_actions_flush’, ‘catalog_product_attribute_value_synchronize’, ‘catalogrule_apply_all’, ‘system_backup’, ‘security_clean_admin_expired_sessions’, ‘security_clean_password_reset_request_event_records’, ‘sales_clean_quotes’, ‘sales_clean_orders’, ‘aggregate_sales_report_order_data’, ‘aggregate_sales_report_invoiced_data’, ‘aggregate_sales_report_refunded_data’, ‘aggregate_sales_report_bestsellers_data’, ‘sales_grid_order_async_insert’, ‘sales_grid_order_invoice_async_insert’, ‘sales_grid_order_shipment_async_insert’, ‘sales_grid_order_creditmemo_async_insert’, ‘sales_send_order_ emails’, ‘sales_send_order_invoice_emails’, ‘sales_send_order_shipment_emails’, ‘sales_send_order_creditmemo_emails’, ‘paypal_fetch_settlement_reports’, ‘outdated_authentication_failures_cleanup’, ‘expired_tokens_cleanup’, ‘newsletter_send_all’, ‘analytics_subscribe’, ‘analytics_update’, ‘analytics_collect_data’, ‘magento_newrelicreporting_cron’, ‘catalog_product_alert’, ‘aggregate_sales_report_coupons_data’, ‘persistent_clear_expired’, ‘captcha_delete_old_attempts’, ‘captcha_delete_expired_images’, ‘aggregate_sales_report_shipment_data’, ‘sitemap_generate’, ‘get_amazon_capture_updates’, ‘get_amazon_authorization_updates’, ‘amazon_payments_process_queued_refunds’, ‘aggregate_sales_report_tax_data’, ‘weltpixel_backend_moduleinfoupdate’, ‘yosto_arp_rule_apply_all’)) AND (created_at < ‘2018-07-22 12:20:02’)

What this means? I didn’t have such issues before updating to 2.2.5

AgentGod on Jul 29, 2018

After updating from 2.2.3 to 2.2.5 I have the same Cron issue. The table is growing and I don’t know how to proceed.

AgentGod on Jul 29, 2018

Referenced here https://github.com/magento/magento2/issues/15190 and here https://github.com/magento/magento2/issues/4978

Any ETA on the update, is it gonna be in version 2.2.6 ?

That snowball effect is crashing a lot of servers worldwide 😉

Workaround to minimize the impact : Run the cron with “flock -w 1 cron.lock php bin/magento cron:run” this cron should be run every minute (it’s very important because the heartbeat is scheduled every minute in magento like this the table will not be flooded exponentially

Tofandel on Jul 4, 2018

Hi @Jean-PierreGassin I absolutely agree that it should be fixed in the core however our fix is only workaround. I don’t think that deleting some “running” flow if it’s older than x days is perfect solution. And if it is, then what should be an x …

Linek on Mar 7, 2018

Yes indeed, we are also still seeing this happening in Magento 2.2.2, cronjobs still can end up in the state running forever, which then in turn keeps adding new pending jobs forever and you end up with > 30.000 jobs in the database, until you manually delete them from the table in the database.

@dmanners: I’ve noticed you have a backport open to fix this in Magento 2.1, but I think you should hold of with that until this gets properly fixed in Magento 2.2, I have the feeling (not scientifically proved) that the state of crons in Magento 2.1.11 is actually better then in 2.2.2 at this point. Maybe PR https://github.com/magento/magento2/pull/12497 will fix it for good, but I haven’t tested it yet.

Cron’s are still really a mess and should be fixed rather sooner than later, we have to go into servers a lot lately to fix the state of cronjobs. Even on the Magento Cloud infrastructure we keep running into issues with this.

hostep on Feb 20, 2018