magento2: Partial Catalog Product Price Indexing not removing old price date and is extremely slow if store has many products (above 200k)

Preconditions (*)

  1. Magento 2.4.1 + PHP 7.4.10 + MySQL 8.0.22
  2. At least 200k simple products
  3. Have at least 8 catalog price rules

Steps to reproduce (*)

  1. Set indexes to update by schedule and wait for indexing to complete.
  2. Manually run catalogrule_apply_all cron job. (this is automatically run at 1am)

Expected result (*)

  1. Once the partialUpdateCatalogRuleProductPrice in Magento\CatalogRule\Model\Indexer\PartialIndex.php finishes $this->indexBuilder->reindexFull(); the next code should take little time but the opposite is true.
  2. Once the partial indexing is done rules older than yesterday should be removed from table catalogrule_product_price_replica

Actual result (*)

  1. Rules with rule_date dating back many months are present in catalogrule_product_price_replica essentially causing the next problem even worse over time
  2. The code below from partialUpdateCatalogRuleProductPrice in Magento\CatalogRule\Model\Indexer\PartialIndex.php makes MySQL hang for many hours:
        $this->connection->query(
            $this->connection->insertFromSelect(
                $selectFields,
                $this->resource->getTableName('catalogrule_product_price_replica'),
                [
                    'rule_date',
                    'customer_group_id',
                    'product_id',
                    'rule_price',
                    'website_id',
                    'latest_start_date',
                    'earliest_end_date',
                ],
                AdapterInterface::INSERT_ON_DUPLICATE
            )
        );

Store admin has to also truncate catalogrule_product_price_replica once a while to remove old data.

It is my understanding from MySQL Workbench that the code will cause many table scans and each insert will fire a trigger that will compare the new data. This statement will put a hold on catalog_product_price_cl. This will cause additional cron tasks to get to waiting state and causes additional RAM usage. A workaround is to disable using catalogrule_product_price_replica for partial indexing and perform full price index instead. This is especially true if one is using multithreaded indexing by setting MAGE_INDEXER_THREADS_COUNT.

On our store with 1 million simple products it caused cron to crash every afternoon from many cron threads waiting for catalog_product_price_cl lock.


Please provide Severity assessment for the Issue as Reporter. This information will help during Confirmation and Issue triage processes.

  • Severity: S0 - Affects critical data or functionality and leaves users without workaround.
  • Severity: S1 - Affects critical data or functionality and forces users to employ a workaround.
  • Severity: S2 - Affects non-critical data or functionality and forces users to employ a workaround.
  • Severity: S3 - Affects non-critical data or functionality and does not force users to employ a workaround.
  • Severity: S4 - Affects aesthetics, professional look and feel, “quality” or “usability”.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 4
  • Comments: 19 (8 by maintainers)

Commits related to this issue

Most upvoted comments

This issue is mentioned in two commits: 6e6fff2a87221cd71f0f9e347a1cf35b3460d13f and 6a9f4474fb3a79612799db23c2a1543fc14fbd2f

It seems that the proposed solution is pretty drastic 😄 immagine

@dhorytskyi Can you please provide us a patch for existing versions?

@sdzhepa: can we have some information why this ticket was closed? It it was closed because it was fixed, it would be appreciated if we can have some links to commits that fixed this.

Thanks! 🙂

I can confirm this, just looked at a shop running on Magento 2.3.6-p1, the table catalogrule_product_price_replica contains 78198 records, the oldest date in the rule_date column is the day before we upgraded this shop from 2.1.16 to 2.3.6-p1, so it looks like before 2.3.6 the table got cleaned up, but since the upgrade to 2.3.6, the cleanup no longer happens.

Sounds like quite an important problem, @sidolov, can we have some priority over here?

Thanks @sdzhepa that’s helpful. From what I can tell, deleting all values from the catalogrule_product_price_replica table seems safe.

Confirming I have done this previously and had no issue in practice.

Thanks @sdzhepa that’s helpful. From what I can tell, deleting all values from the catalogrule_product_price_replica table seems safe.

We have a shop (2.3.6) where the catalogrule_product_price_replica is growing at a rate of 10gb every three days. This is a potential infrastructure breaking issue and should be solved in existing versions too, not only in the next release.