meilisearch: Problem indexing various indices

Hi

Describe the bug: I tried to index 55 indexes from customers where I work, the 55 most-searched indexes from one of our ES clusters. They have various schemas, index lengths, document sizes, etc. The problem is that meilisearch gets stuck: it stops responding, and so does the web GUI. It seems like a bug somewhere.

To Reproduce: I used the following docker-compose file for meilisearch

version: "3.9"
   
services:
  es56:
    image: getmeili/meilisearch
    ports:
      - 7700:7700
    command: ['./meilisearch', '--http-addr', '0.0.0.0:7700', '--http-payload-size-limit', '1000000000', '--no-sentry', '1']
    # command: ['./meilisearch', '--env', 'production', '--http-addr', '0.0.0.0:7700', '--http-payload-size-limit', '1000000000', '--no-sentry', '1', '--master-key', '12341234']

networks:
  default:
    external:
      name: apps

Python script used to index files

import logging
import json
import gzip
import os
from argparse import ArgumentParser
from contextlib import contextmanager

import meilisearch


@contextmanager
def open_file(filename, mode):
    if filename.endswith('.gz'):
        with gzip.GzipFile(filename, mode) as _file:
            yield _file
    else:
        with open(filename, mode) as _file:
            yield _file


def import_to_meilisearch(filesource: str, meili_server: str, indexname: str):
    client = meilisearch.Client(meili_server, '12341234')

    # An index is where the documents are stored.
    index = client.index(indexname)

    with open_file(filesource, 'rb') as _file:
        items_list = json.load(_file)

    count = 0
    chunk_size = 100
    chunk = []
    for item in items_list:
        count += 1
        chunk.append(item)

        if count % chunk_size == 0:
            logging.info(
                f'Processed {count} items of {indexname}, {len(chunk)}')

        # to make sure all prices are float, if needed.
        # for key in item.keys():
        #     if key.endswith('price'):
        #         try:
        #             item[key] = float(item[key])
        #             print(item[key])
        #         except Exception:
        #             logging.exception('error parsing')
        #             item[key] = 0.0
        #             print(item[key])

        if len(chunk) == chunk_size:
            index.add_documents(chunk)
            chunk = []

    if len(chunk) > 0:
        index.add_documents(chunk)
    logging.info(f'inserted {count} index')


def get_args():
    parser = ArgumentParser()
    parser.add_argument(
        'filesfolder', type=str, help='feed or path to import.')
    parser.add_argument(
        'meili_server', type=str, help='MeiliSearch server target')
    return parser.parse_args()


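def start(args):
    # args is the argparse.Namespace returned by get_args()
    if args.filesfolder:
        filenames = os.listdir(args.filesfolder)

        for filename in filenames:
            _prefix, indexname = filename.split('_')
            # all files are in gzip...
            # str.strip() removes characters, not a suffix, so slice it off
            if indexname.endswith('.json.gz'):
                indexname = indexname[:-len('.json.gz')]
            filename = os.path.join(args.filesfolder, filename)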

            import_to_meilisearch(filename, args.meili_server, indexname)


def main():
    logging.basicConfig(level=logging.INFO)

    args = get_args()
    start(args)


main()
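
The batching loop in the script above can be isolated into a small helper, which makes it easier to check that every document is sent exactly once. This is only a sketch and not part of the original script: `iter_chunks` and `send_in_chunks` are hypothetical names, and `send` stands in for whatever actually uploads a batch (for instance `index.add_documents` from the script).

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def iter_chunks(items: Iterable[dict], chunk_size: int = 100) -> Iterator[List[dict]]:
    """Yield consecutive lists of at most chunk_size items."""
    if chunk_size < 1:
        raise ValueError('chunk_size must be at least 1')
    iterator = iter(items)
    while True:
        chunk = list(islice(iterator, chunk_size))
        if not chunk:
            return
        yield chunk


def send_in_chunks(items: Iterable[dict],
                   send: Callable[[List[dict]], object],
                   chunk_size: int = 100) -> int:
    """Pass every chunk to send() and return the number of documents sent."""
    count = 0
    for chunk in iter_chunks(items, chunk_size):
        send(chunk)  # hypothetical uploader, e.g. index.add_documents
        count += len(chunk)
    return count
```

With this shape, `import_to_meilisearch` would reduce to loading the file and calling `send_in_chunks(items_list, index.add_documents)`.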

Of course I can’t share the files, but it can probably be reproduced with various public datasets. JSON file structure: [{"foo": "bar", ...}, {"bar": "baz", ...}]
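
Since the real files can't be shared, one way to approximate them is to generate synthetic ones in the same shape. The sketch below is an assumption-laden stand-in, not the reporter's data: `write_fake_index_files`, the `dump_` prefix, the `index<N>` names and the field names are all made up. Each file is a gzipped JSON list of flat objects, named `<prefix>_<indexname>.json.gz` as the import script expects, and each document carries an `id` field so that it has an obvious identifier.

```python
import gzip
import json
import os
import random


def write_fake_index_files(folder, index_count=55, docs_per_index=1000, seed=0):
    """Write index_count gzipped JSON files, each holding a list of flat documents."""
    rng = random.Random(seed)
    os.makedirs(folder, exist_ok=True)
    paths = []
    for i in range(index_count):
        # Vary the schema per index: a different number of extra fields each time.
        field_names = [f'field{n}' for n in range(rng.randint(2, 10))]
        docs = [
            {'id': doc_id,
             **{name: f'{name}-{rng.randint(0, 9999)}' for name in field_names}}
            for doc_id in range(docs_per_index)
        ]
        # Exactly one underscore, so filename.split('_') yields (prefix, indexname).
        path = os.path.join(folder, f'dump_index{i}.json.gz')
        with gzip.open(path, 'wt', encoding='utf-8') as _file:
            json.dump(docs, _file)
        paths.append(path)
    return paths
```

Pointing the import script at the generated folder would then exercise the same "many indexes in a row" pattern, though whether synthetic data triggers the hang is untested here.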

Expected behavior: all data is indexed successfully.

Desktop (please complete the following information):

  • OS: Ubuntu 18, Docker version 20.10.2, build 2291f61
  • docker meili latest image (0.18)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 18 (6 by maintainers)

Most upvoted comments

Should I start learning how to debug a Rust app myself? Haha

This issue is also reproducible easily with the macOS version (v0.20.0) installed with brew install meilisearch

  1. brew install meilisearch
  2. launch meilisearch ->> http://127.0.0.1:7700
  3. Load the test app : https://jsfiddle.net/fxi/zc1jqtkd/5
  4. Configure the host as http://127.0.0.1:7700 ( key field can be empty if no key is set )
  5. Click on update: after two indexes, it’s stuck forever

Same thing happens with

  • Docker service inside a DigitalOcean droplet, e.g. https://meilirocks.fxi.io (which will be stuck forever if synonyms are updated…). NOTE: If you provide an RSA key, I can give you access to this test server.
  • Empty Debian Docker image with the binary from your releases, so not compiled by me.

Thanks 😃

Here is a picture of my cat, which should bring you joy:

Using v0.21.0rc1 we’re no longer able to reproduce the issue from v0.20 👍

Thanks!

Hello !

Thanks 😃

It seems to work well using the docker image and the test script

docker run --rm -e MEILI_MASTER_KEY=thankscurquiza -p 7700:7700 getmeili/meilisearch:v0.21.0rc1 ./meilisearch


Hello @montaniasystemab @volkv @fxi @sonic182 and everyone following the issue!

The RC of v0.21.0 is out! https://github.com/meilisearch/MeiliSearch/releases

I tested it with your example @fxi (cf your comment) and I don’t have the bug anymore!

Can you confirm it’s the same for you?

Can confirm such “stucks” when batch importing data to 5+ indexes (I’m using it with Laravel Scout). Adding a delay of 5-10 seconds between processing each index helps 😃 And thanks for this awesome engine, I’m now migrating from Elastic, with Scout 9 native Meili support 😉
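
The delay workaround described above can be sketched in Python, the language of the import script, rather than in Laravel Scout. Everything here is illustrative: `import_with_pause` is a hypothetical helper, `import_one` stands in for whatever imports a single index, and the 5 second default simply mirrors the lower end of the range mentioned. It is a workaround for the affected versions, not a fix.

```python
import time
from typing import Callable, Iterable, List


def import_with_pause(index_names: Iterable[str],
                      import_one: Callable[[str], object],
                      pause_seconds: float = 5.0,
                      sleep: Callable[[float], object] = time.sleep) -> List[str]:
    """Call import_one(name) for each index, sleeping between consecutive indexes."""
    done = []
    for position, name in enumerate(index_names):
        if position > 0:
            # Pause before every index except the first one.
            sleep(pause_seconds)
        import_one(name)
        done.append(name)
    return done
```

The `sleep` parameter exists only so the pause can be replaced in tests; in normal use it can be left at its default.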