elasticsuite: Extremely slow reindexing when using a MariaDb platform.
The issue only occurs when using a mariaDb platform, and I’ve found that it only happens with catalog_product_entity_int reindexing in the catalogsearch_fulltext indexer.
Because it works in a loop, the effective sync speeds take over an hour to finish in our production server, meanwhile are done in under a minute in a local machine.
Preconditions
MariaDB: 10.2.26
Magento Version : 2.3.2
ElasticSuite Version : 2.8.1/2.8.3
Steps to reproduce
- Run
$ bin/magento indexer:reindex catalogsearch_fulltext - If using mariaDb 10.2.26 grab some popcorn
Expected result
- Normal EXPLAIN when using a MySql 5.7 system:
| id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | entity | NULL | range | PRIMARY | PRIMARY | 4 | NULL | 1000 | 100.00 | Using where; Using index |
| 1 | SIMPLE | t_default | NULL | ref | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 4 | rekaubamaja.entity.entity_id | 21 | 39.13 | Using index condition |
| 1 | SIMPLE | t_store | NULL | eq_ref | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 8 | magento_table.entity.entity_id,magento_table.t_default.attribute_id,const | 1 | 100.00 | NULL |
3 rows (0.008 s),
Actual result
- Using MariaDb 10.2.* system:
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | r_rows | filtered | r_filtered | Extra |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | SIMPLE | entity | range | PRIMARY | PRIMARY | 4 | NULL | 1000 | 1000.00 | 100.00 | 100.00 | Using where; Using index |
| 1 | SIMPLE | t_default | range | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 8 | NULL | 8000 | 3017.00 | 0.12 | 0.10 | Using index condition; Using where |
| 1 | SIMPLE | t_store | eq_ref | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID,CATALOG_PRODUCT_ENTITY_INT_ATTRIBUTE_ID,CATALOG_PRODUCT_ENTITY_INT_STORE_ID,catalog_product_entity_int_entity_id_index | CATALOG_PRODUCT_ENTITY_INT_ENTITY_ID_ATTRIBUTE_ID_STORE_ID | 8 | magento_table.entity.entity_id,magento_table.t_default.attribute_id,const | 1 | 0.66 | 100.00 | 100.00 |
3 rows (16.280 s),
Analysis
The problem lies somewhat in MariaDb failing to use the primary key index instead of the unique one in catalog_product_entity_int table, but I think the main issue is in the non optimal SQL query executed.
\Smile\ElasticsuiteCatalog\Model\ResourceModel\Eav\Indexer\Fulltext\Datasource\AbstractAttributeData::getAttributesRawData finds attributes using the entity_id’s, and for some reason selects them from the main table first only to join the entity tables by the entity_id value.
I’ve fixed our problem by ditching the Inner join done here and selecting all Id’s straight from the t_default (catalog_product_entity_int) table.
So instead of doing …
$select->from(['entity' => $this->getEntityMetaData($this->getEntityTypeId())->getEntityTable()], [$entityIdField])
->joinInner(
['t_default' => $tableName],
new \Zend_Db_Expr("entity.{$linkField} = t_default.{$linkField}"),
['attribute_id']
)
->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
->where('t_default.store_id=?', 0)
->where('t_default.attribute_id IN (?)', $attributeIds)
->where("entity.{$entityIdField} IN (?)", $entityIds)
->columns(['value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
… the query works perfectly well written as:
$select->from(['t_default' => $tableName], [$entityIdField])
->joinLeft(['t_store' => $tableName], $joinStoreValuesCondition, [])
->where('t_default.store_id=?', 0)
->where('t_default.attribute_id IN (?)', $attributeIds)
->where("t_default.{$entityIdField} IN (?)", $entityIds)
->columns(['t_default.attribute_id', 'value' => new \Zend_Db_Expr('COALESCE(t_store.value, t_default.value)')]);
I would suggest fixing this query 😃
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 19
Commits related to this issue
- Fix #1621 by integrating the provided patch. — committed to romainruaud/elasticsuite by romainruaud a year ago
- Fix #1621 by integrating the provided patch. — committed to romainruaud/elasticsuite by romainruaud a year ago
- Merge pull request #2982 from romainruaud/fix_performance-indexing Fix #1621 indexing performances — committed to Smile-SA/elasticsuite by romainruaud a year ago
Ok so this could probably be integrated into the core if this does help that much with performances.
I’ll prioritize this.
Regards
I recently upraded MariaDB to 10.4 (as it is supported by Magento 2.4.1) and found I had to disable an optimizer_switch or indexation would be very slow when part of the result set was empty. This impacts more than just Elasticsuite (Magento itself also suffers), just figured I’d share what I found.
Rowid_filter is new in MariaDB 10.4, but is about 100 times slower when result set is empty, causing indexation to be 10 times slower in my case (since I use a lot of global attributes, so store view values are empty)
set global optimizer_switch='rowid_filter=off';For my.cnf
optimizer_switch=rowid_filter=offEDIT: Looks like that made it into the Magento documentation!
https://devdocs.magento.com/guides/v2.4/performance-best-practices/configuration.html#indexers