magento2: [2.2.0-*] cron_schedule forever increasing in size. Lots of pending jobs never cleared
Preconditions
- Magento 2.2.0-rc30 running on Ubunti 16.04
- Deployed initially from zip, but updated to 2.2.0-rc30 using composer
Steps to reproduce
- Nothing, just look at the cron_schedule table
Expected result
- On 2.1.9 my cron_schedule table is around 180 items in size. Its size is pretty much static. A snapshot shows the vast majority of jobs are in the success state, with a couple of pending jobs about to be run in the next minute or so.
Actual result
- On 2.2.0-rc30 which has been running for around 8 days (upgraded from previous rc) the size of the cron_schedule table is around 6500 items in size. The size is constantly increasing every minute. The majority of the jobs are in the pending state. Some are marked as success.
The cronjob steadily increases in the time taken to complete, at the moment it is taking around 30 seconds to complete, during which time, mysql and php are taking up heavy CPU usage.
A MYSQL query log shows magento churning through all the pending requests, but they are never marked as success. Hence the ever increasing list of jobs to run.
Snippet from the Mysql Query log below
90 Query START TRANSACTION
90 Query UPDATE "cron_schedule" SET "job_code" = "catalog_product_outdated_price_values_cleanup", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:48:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189337")
90 Query COMMIT
90 Query UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189338") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query START TRANSACTION
90 Query UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:33:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189338")
90 Query COMMIT
90 Query UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189339") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query START TRANSACTION
90 Query UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:34:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189339")
90 Query COMMIT
90 Query UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189340") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query START TRANSACTION
90 Query UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:35:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189340")
90 Query COMMIT
90 Query UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189341") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
90 Query START TRANSACTION
90 Query UPDATE "cron_schedule" SET "job_code" = "catalog_product_frontend_actions_flush", "status" = "pending", "messages" = NULL, "created_at" = "2017-09-15 09:29:06", "scheduled_at" = "2017-09-15 09:36:00", "executed_at" = NULL, "finished_at" = NULL WHERE (schedule_id="189341")
90 Query COMMIT
90 Query UPDATE "cron_schedule" AS "current" LEFT JOIN "cron_schedule" AS "existing" ON existing.job_code = current.job_code AND existing.status = "running" SET "current"."status" = "running" WHERE (current.schedule_id = "189342") AND (current.status = "pending") AND (existing.schedule_id IS NULL)
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 21
- Comments: 99 (40 by maintainers)
@magento-team This is a pathetic bug that is wreaking havoc on our cron. Can you at least respond to this?
@magento-engcom-team
I have created a test module that implements a cron job that crashes during execution.
It can be downloaded from here Gw_CronCrash.zip
This is a simple cron job that just throws an invalid exception, instead of actually doing anything useful. It simulates a cron job crashing for whatever reason during execution.
Once installed, on the next 1 minute boundary, the exception is logged in var/log/magento.cron.log (if you have setup logging of cron output).
Now if you check your cron_schedule table
SELECT * FROM 'cron_schedule' WHERE 'job_code' = 'Gw_CronCrash'
You will see the first row shows the first time it tried to run the job. The job will be in the “running” state, even though the job crashed and failed a long time ago.
Every 15 minutes, a new batch of 15 jobs in the pending state are added to the cron table with status “pending”. No rows are ever removed.
The table will grow forever.
Even if you fix the problem with the code in the module, the cron job for that module is NEVER run again. 1440 rows are added to the cron table every day. Every minute, every row of the cron_schedule table is parsed by Magento. Depending on the capabilities of your machine, your CPU could be maxed out in as little as a week.
The only way out is to manually delete the first entry in the cron_schedule table to remove the “running” job. Magento, then does a nice job of cleaning up the remainder of the “pending” entries, as you would expect.
Could this issue be reopened. It is 100% reproducable with this example module.
@magento-engcom-team Can this issue be re-opened. I, and several others by the looks of it, are having to go into the database on a regular basis to clear out cron jobs that are stuck in the running state.
It seems to be worse on development installs, where cron jobs may be crashing whilst being run and never being marked as completed. Its not the same job every time. Seems to be random which one gets stuck.
If you don’t manually delete the stuck running job, then, on my webserver, the CPU is overwhelmed at 100% within about 3 days as the never ending list of pending jobs in the cron_schedule table increases. If you don’t spot the problem, the first you will know about it, is when your webserver is unresponsive as MySQL is overwhelmed by Magento cycling through thousands of pending cron jobs every minute.
I have been stuck on this same issue for several weeks. I think I finally got it figured out with the help of this post. I made a quick blog post about it so I can refer to it in the future. https://toweringmedia.com/blog/magento-2-cron-job-pending-jobs-never-cleared-solved/
We wrote an extension to fix these bugs, speed up performance, and control the execution of tasks: https://github.com/magemojo/m2-ce-cron
Not wanting to stray off topic but @dwirt If it’s a shared server then beware of putting your mysql password in the command line like that. It can be viewed by anyone while the process is running.
The preferred method is to create a
~/.my.cnf
file and specifymysql magento-db .....
on the command line. The host, username and password will be read from the file automatically.The contents of .my.cnf should be
You can then secure this file with
chmod 600
to keep it safe. You can also add the equivalent section for[mysqldump]
if you use that command in cron jobs for database backups etc.Ok so we’ve analyzed this issue and here is what we found:
The problem appear since you have entry in cron_schedule which has status = running and it will never be finished. Usual reason to that might be
Then in Magento\Cron\Model\ResourceModel\Schedule we have method trySetJobUniqueStatusAtomic which will permanently return false for this particular job_code. That will results in all new cron schedules with the same job_code to be not executed/managed. They won’t be run, they won’t be marked as missed and number of these schedules will permantently grow which lead cron:run to work slower and slower.
Solution which i see is that old schedules with status “running” should be marked as “error”. I think this lifetime could be configured in admin config.
@magento-team Let me know what do you think about this solution so we can start to think about Pull Request.
2.2.1 also it happens.
SELECT
main_table.* FROM
cron_scheduleAS
main_tableWHERE (
status= 'pending')
INSERT INTO
cron_schedule(
job_code,
status,
created_at,
scheduled_at) VALUES ('indexer_update_all_views', 'pending', '2018-01-09 10:17:06', '2018-01-09 10:20:00')
UPDATE
cron_scheduleAS
currentLEFT JOIN
cron_scheduleAS
existingON existing.job_code = current.job_code AND existing.status = 'running' SET
current.
status= 'running' WHERE (current.schedule_id = '1119') AND (current.status = 'pending') AND (existing.schedule_id IS NULL)
UPDATE
cron_scheduleSET
job_code= 'my_custom_job_cide',
status= 'pending',
messages= NULL,
created_at= '2018-01-08 09:42:03',
scheduled_at= '2018-01-08 09:44:00',
executed_at= NULL,
finished_at= NULL WHERE (schedule_id='1127')
Repeat every minute, we have pretty snowball effect now 😉
@magento-engcom-team Please reopen. This is not resolved. reproduced in 2.2.6
SELECT count() FROM
cron_schedule
WHEREstatus
=‘pending’; ±---------+ | count() | ±---------+ | 11888 | ±---------+Hey @Linek, nice one on the plugin however it’s still a core issue with Magento and should be addressed by the developers - this issue has been stagnant for an extremely long time and it’s a critical bug.
I’m sure this could be re-worked into a pull request, if it ever gets approved/looked at…
I can confirm this. Also 11000+ records in the table. One day after I truncated the table. Also 2.2.6
Still a problem in 2.3.4:
Will there be a patch / fix for 2.1.x?
I ran into this issue on a Magento 2.2.2 Commerce site, even though none of my jobs were getting left in a
running
state.history_success_lifetime
value of10080
(7 days). With these two CRONs running every minute, if you have ~70 CRON jobs, you can easily end up with hundreds of thousands ofsuccess
jobs in the DB.use_separate_process
= 1 will all run in parallel. This is not normally a problem—however if thecron_schedule
table has a massive number of records and each of those CRON jobs tries to delete the individual outdatedcron_schedule
entries, you start running into lock wait timeouts.SHOW PROCESSLIST
in MySQL, we would see that multiple processes were trying to delete the same record from the DB at the same time:To workaround this issue, we deceased the number of entries in
cron_schedule
by going into STORES > Configuration > ADVANCED > System and decreasing the “Success History Lifetime” value from “10080” to “1440”. Hopefully the #12497 PR will resolve this issue.@jontesamuelsson No this isn’t fixed in 2.2.3.
#13775 is in the process of being merged which will reduce the server load when the cron_schedule table grows unbounded, but it does not fix the root cause of this issue, which is the cron_schedule table growing in the first place.
I ran with @andrewhowdencom 's approach, but implemented one without any 3rd party dependencies (such as n98-magerun or prometheus)
I created a directory in my magento installation’s var directory called
cron_cleanup
to group everything together … I also am using a standalone php script to interact with the db. Depending on someone’s situation, this might not be acceptable, but it works for me. Also, the actual log that the cron entries create might not server any real purpose as-is for most people, so change them to do what you would like or remove them all together.My
execute.php
file is as such:*make sure the
execute.php
has execution permissionschmod +x /var/www/magento/cron_cleanup/execute.php
This approach is working for me so far…I’m on CE (Open Source), but on EE (Commerce) there’s also the issue of consumers to take into consideration … so if someone’s installation isn’t using RabbitMQ, then the Mysql Queue will still be pounded with consumers. Again, all the implementation details here and approach is fine for me, but maybe not for others…YMMV and change it up to work for you.
Hopefully this helps someone, and thanks @andrewhowdencom and @skukla .
@ericvhileman I does not think that use cron service on only one node is high-available solution. Also, this extension requires to run on unix-like OS and have to access to /proc/ filesystem.
@mpropl You can also try the MageMojo extension. I have used on a couple of installations since recently. It seems to fix these issues as well: https://github.com/magemojo/m2-ce-cron
I can confirm that in 2.2.5, certainly in my situation, that cron jobs that crash during execution are now marked as “error” in cron_schedule instead of sitting in the running state. This means that the cron cleanup code now properly deals with these jobs and successive jobs run as planned. Initial testing shows the race condition that allowed the cron_schedule table to grow infinitely in size is now fixed. Closing this issue as fixed. Phew!
Quick tip for this is described in here: https://alekseon.com/en/blog/post/magento-2-slow-and-cpu-usage-gets-high-this-might-be-the-reason/
And fix itself is here: https://github.com/Alekseon/CleanRunningJobs
Same issue here. Magento 2.2.2 is killing my server. 100% CPU load -> 8 Cores 2.6
I’ve been battling some negative behavior with cron:run lately and I think this thread is describing the root cause of my issue. I started up a new virtual machine with Magento and it was working fine at first. But after a week or 2 the server was super slow and unresponsive to the point of not being usable. I eventually noticed there would 10 or more instances of the cron:run process running simultaneously and hammering MySQL. I had the process set to run every minute, as seems to be the default. I walked back the cron schedule to every 8 minutes and that prevented mutliple cron:run processes from running simultaneously.
I finally found this issue and it lead to counting the rows in cron_schedule, which was 208,046! I ran the query posted above and that brought it down to 252 rows.
My site has just been under light developement, no traffic.
After running the query
now I’ve switched all 3 standard magento cron process - cron:run, setup:cron:run, cron.php - back to running every minute, and all 3 run pretty much instantly in under 5 - 10 seconds or less.
I’m new to magento, so I can’t speak to the cause but i can say anyone that runs will be left with an unusable site, unless they heavily upgrade their hardware. I posted this question to magento.stackexchange.com and another user said they were experiencing the same issue, check out the comments. https://magento.stackexchange.com/questions/201063/should-2-of-the-standard-cron-always-be-running?noredirect=1#comment278625_201063
UPDATE: After adding this query to the crontab after reading about this, it fixed all negative behavior and after switching my 3 magento crons back to running every minute, the cron_schedule table is holding steady at around 1030 - 1050 rows with the delete query deleting about 20 rows every 15 minutes when it runs.
So, I had a lot of pending jobs, deleted manually all of them and everything seems to be ok for day or two, but I started to receive the error like above again. When I have checked, I had a lot of pending entries again at the end of the table, manually deleted them, for now it’s ok
My bad, the table have little bit more that 10 000, but I get error
[Magento\Framework\DB\Adapter\DeadlockException]
SQLSTATE[40001]: Serialization failure: 1213 Deadlock found when trying to get lock; try restarting transaction, query was: DELETE FROM
cron_schedule
WHERE (status = ‘missed’) AND (job_code in (‘currency_rates_update’, ‘backend_clean_cache’, ‘visitor_clean’, ‘catalog_index_refresh_price’, ‘catalog_product_flat_indexer_store_cleanup’, ‘catalog_product_outdated_price_values_cleanup’, ‘catalog_product_frontend_actions_flush’, ‘catalog_product_attribute_value_synchronize’, ‘catalogrule_apply_all’, ‘system_backup’, ‘security_clean_admin_expired_sessions’, ‘security_clean_password_reset_request_event_records’, ‘sales_clean_quotes’, ‘sales_clean_orders’, ‘aggregate_sales_report_order_data’, ‘aggregate_sales_report_invoiced_data’, ‘aggregate_sales_report_refunded_data’, ‘aggregate_sales_report_bestsellers_data’, ‘sales_grid_order_async_insert’, ‘sales_grid_order_invoice_async_insert’, ‘sales_grid_order_shipment_async_insert’, ‘sales_grid_order_creditmemo_async_insert’, ‘sales_send_order_ emails’, ‘sales_send_order_invoice_emails’, ‘sales_send_order_shipment_emails’, ‘sales_send_order_creditmemo_emails’, ‘paypal_fetch_settlement_reports’, ‘outdated_authentication_failures_cleanup’, ‘expired_tokens_cleanup’, ‘newsletter_send_all’, ‘analytics_subscribe’, ‘analytics_update’, ‘analytics_collect_data’, ‘magento_newrelicreporting_cron’, ‘catalog_product_alert’, ‘aggregate_sales_report_coupons_data’, ‘persistent_clear_expired’, ‘captcha_delete_old_attempts’, ‘captcha_delete_expired_images’, ‘aggregate_sales_report_shipment_data’, ‘sitemap_generate’, ‘get_amazon_capture_updates’, ‘get_amazon_authorization_updates’, ‘amazon_payments_process_queued_refunds’, ‘aggregate_sales_report_tax_data’, ‘weltpixel_backend_moduleinfoupdate’, ‘yosto_arp_rule_apply_all’)) AND (created_at < ‘2018-07-22 12:20:02’)What this means? I didn’t have such issues before updating to 2.2.5
After updating from 2.2.3 to 2.2.5 I have the same Cron issue. The table is growing and I don’t know how to proceed.
Referenced here https://github.com/magento/magento2/issues/15190 and here https://github.com/magento/magento2/issues/4978
Any ETA on the update, is it gonna be in version 2.2.6 ?
That snowball effect is crashing a lot of servers worldwide 😉
Workaround to minimize the impact : Run the cron with “flock -w 1 cron.lock php bin/magento cron:run” this cron should be run every minute (it’s very important because the heartbeat is scheduled every minute in magento like this the table will not be flooded exponentially
Hi @Jean-PierreGassin I absolutely agree that it should be fixed in the core however our fix is only workaround. I don’t think that deleting some “running” flow if it’s older than x days is perfect solution. And if it is, then what should be an x …
Yes indeed, we are also still seeing this happening in Magento 2.2.2, cronjobs still can end up in the state
running
forever, which then in turn keeps adding newpending
jobs forever and you end up with > 30.000 jobs in the database, until you manually delete them from the table in the database.@dmanners: I’ve noticed you have a backport open to fix this in Magento 2.1, but I think you should hold of with that until this gets properly fixed in Magento 2.2, I have the feeling (not scientifically proved) that the state of crons in Magento 2.1.11 is actually better then in 2.2.2 at this point. Maybe PR https://github.com/magento/magento2/pull/12497 will fix it for good, but I haven’t tested it yet.
Cron’s are still really a mess and should be fixed rather sooner than later, we have to go into servers a lot lately to fix the state of cronjobs. Even on the Magento Cloud infrastructure we keep running into issues with this.