elasticsuite: Extremely slow reindexing when using a MariaDb platform.

The issue only occurs when using a mariaDb platform, and I’ve found that it only happens with catalog_product_entity_int reindexing in the catalogsearch_fulltext indexer. Because it works in a loop, the effective sync speeds take over an hour to finish in our production server, meanwhile are done in under a minute in a local machine.

Preconditions

MariaDB: 10.2.26

Magento Version : 2.3.2

ElasticSuite Version : 2.8.1/2.8.3

Steps to reproduce

  1. Run $ bin/magento indexer:reindex catalogsearch_fulltext
  2. If using mariaDb 10.2.26 grab some popcorn

Expected result

  1. Normal EXPLAIN when using a MySql 5.7 system:
id select_type table partitions type possible_keys key key_len ref rows filtered Extra
1 SIMPLE entity NULL range PRIMARY PRIMARY 4 NULL 1000 100.00 Using where; Using index
1 SIMPLE t_default NULL ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 4 rekaubamaja.entity.entity_id 21 39.13 Using index condition
1 SIMPLE t_store NULL eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 magento_table.entity.entity_id,magento_table.t_default.attribute_id,const 1 100.00 NULL

3 rows (0.008 s),

Actual result

  1. Using MariaDb 10.2.* system:
id select_type table type possible_keys key key_len ref rows r_rows filtered r_filtered Extra
1 SIMPLE entity range PRIMARY PRIMARY 4 NULL 1000 1000.00 100.00 100.00 Using where; Using index
1 SIMPLE t_default range CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 NULL 8000 3017.00 0.12 0.10 Using index condition; Using where
1 SIMPLE t_store eq_ref CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID 8 magento_table.entity.entity_id,magento_table.t_default.attribute_id,const 1 0.66 100.00 100.00

3 rows (16.280 s),

Analysis

The problem lies somewhat in MariaDb failing to use the primary key index instead of the unique one in catalog_product_entity_int table, but I think the main issue is in the non optimal SQL query executed.

\Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Fulltext\Datasource\AbstractAttributeData::getAttributesRawData finds attributes using the entity_id’s, and for some reason selects them from the main table first only to join the entity tables by the entity_id value.

I’ve fixed our problem by ditching the Inner join done here and selecting all Id’s straight from the t_default (catalog_product_entity_int) table.

So instead of doing …

        $select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
            ->joinInner(
                ['t_default' => $tableName],
                new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}"),
                ['attribute_id']
            )
            ->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
            ->where('t_default.store_id=?', 0)
            ->where('t_default.attribute_id IN (?)', $attributeIds)
            ->where("entity.{$entityIdField} IN (?)", $entityIds)
            ->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);

… the query works perfectly well written as:

        $select->from(['t_default' => $tableName], [$entityIdField])
            ->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
            ->where('t_default.store_id=?', 0)
            ->where('t_default.attribute_id IN (?)', $attributeIds)
            ->where("t_default.{$entityIdField} IN (?)", $entityIds)
            ->columns(['t_default.attribute_id', 'value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);

I would suggest fixing this query 😃

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 19

Commits related to this issue

Most upvoted comments

Ok so this could probably be integrated into the core if this does help that much with performances.

I’ll prioritize this.

Regards

I recently upraded MariaDB to 10.4 (as it is supported by Magento 2.4.1) and found I had to disable an optimizer_switch or indexation would be very slow when part of the result set was empty. This impacts more than just Elasticsuite (Magento itself also suffers), just figured I’d share what I found.

Rowid_filter is new in MariaDB 10.4, but is about 100 times slower when result set is empty, causing indexation to be 10 times slower in my case (since I use a lot of global attributes, so store view values are empty) set global optimizer_switch='rowid_filter=off';

For my.cnf optimizer_switch=rowid_filter=off

EDIT: Looks like that made it into the Magento documentation!

https://devdocs.magento.com/guides/v2.4/performance-best-practices/configuration.html#indexers