meilisearch: Problem indexing various indices
Hi
Describe the bug
I tried to index 55 indexes from the company where I work: the 55 most-searched indexes from one of our ES clusters. They have various schemas, index lengths, document sizes, etc. The problem is that Meilisearch gets stuck; it stops responding, and so does the web GUI. It seems like a bug somewhere.
To Reproduce
I used the following docker-compose file for Meilisearch:
```yaml
version: "3.9"
services:
  es56:
    image: getmeili/meilisearch
    ports:
      - 7700:7700
    command: ['./meilisearch', '--http-addr', '0.0.0.0:7700', '--http-payload-size-limit', '1000000000', '--no-sentry', '1']
    # command: ['./meilisearch', '--env', 'production', '--http-addr', '0.0.0.0:7700', '--http-payload-size-limit', '1000000000', '--no-sentry', '1', '--master-key', '12341234']
networks:
  default:
    external:
      name: apps
```
Python script used to index the files:
```python
import logging
import json
import gzip
import os
from argparse import ArgumentParser, Namespace
from contextlib import contextmanager

import meilisearch


@contextmanager
def open_file(filename, mode):
    if filename.endswith('.gz'):
        with gzip.GzipFile(filename, mode) as _file:
            yield _file
    else:
        with open(filename, mode) as _file:
            yield _file


def import_to_meilisearch(filesource: str, meili_server: str, indexname: str):
    client = meilisearch.Client(meili_server, '12341234')
    # An index is where the documents are stored.
    index = client.index(indexname)
    with open_file(filesource, 'rb') as _file:
        items_list = json.load(_file)
    count = 0
    chunk_size = 100
    chunk = []
    for item in items_list:
        count += 1
        chunk.append(item)
        if count % chunk_size == 0:
            logging.info(
                f'Processed {count} items of {indexname}, {len(chunk)}')
        # To make sure all prices are floats, if needed:
        # for key in item.keys():
        #     if key.endswith('price'):
        #         try:
        #             item[key] = float(item[key])
        #         except Exception:
        #             logging.exception('error parsing')
        #             item[key] = 0.0
        if len(chunk) == chunk_size:
            index.add_documents(chunk)
            chunk = []
    if len(chunk) > 0:
        index.add_documents(chunk)
    logging.info(f'inserted {count} documents into {indexname}')


def get_args():
    parser = ArgumentParser()
    parser.add_argument(
        'filesfolder', type=str, help='feed or path to import.')
    parser.add_argument(
        'meili_server', type=str, help='MeiliSearch server target')
    return parser.parse_args()


def start(args: Namespace):
    filenames = os.listdir(args.filesfolder)
    for filename in filenames:
        _prefix, indexname = filename.split('_')
        # All files are gzipped, named like "<prefix>_<indexname>.json.gz".
        # str.strip('.json.gz') would strip *characters* from both ends,
        # not the suffix, so cut the suffix off explicitly instead.
        indexname = indexname[:-len('.json.gz')]
        filepath = f'{args.filesfolder.strip("/")}/{filename}'
        import_to_meilisearch(filepath, args.meili_server, indexname)


def main():
    logging.basicConfig(level=logging.INFO)
    args = get_args()
    start(args)


if __name__ == '__main__':
    main()
```
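As an aside, the count/flush bookkeeping inside `import_to_meilisearch` can be factored into a small generator, which makes the batching easier to reason about. This is just a refactoring sketch, not part of the original script:

```python
def chunked(items, size):
    """Yield successive lists of at most `size` items from an iterable."""
    chunk = []
    for item in items:
        chunk.append(item)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # flush the final partial chunk, if any
        yield chunk

# The import loop then reduces to:
#   for batch in chunked(items_list, 100):
#       index.add_documents(batch)
```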
Of course I can’t share the files, but it might be reproducible with various public datasets. JSON file structure: `[{"foo": "bar", ...}, {"bar": "baz", ...}]`
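Without the private files, one way to approximate the setup is to generate synthetic gzipped JSON feeds matching the `<prefix>_<indexname>.json.gz` naming the script expects. A sketch with made-up field names:

```python
import gzip
import json
import os
import tempfile


def write_fake_feed(folder: str, prefix: str, indexname: str, n_docs: int) -> str:
    """Write a gzipped JSON array of n_docs simple documents; return its path."""
    docs = [{'id': i, 'foo': f'bar-{i}', 'price': float(i)} for i in range(n_docs)]
    path = os.path.join(folder, f'{prefix}_{indexname}.json.gz')
    with gzip.open(path, 'wt', encoding='utf-8') as fh:
        json.dump(docs, fh)
    return path


# Example: create one small feed in a temporary directory.
tmpdir = tempfile.mkdtemp()
feed = write_fake_feed(tmpdir, 'es56', 'products', 250)
```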
Expected behavior
All data is indexed successfully.
Desktop (please complete the following information):
- OS: Ubuntu 18, Docker version 20.10.2, build 2291f61
- docker meili latest image (0.18)
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (6 by maintainers)
Should I start learning how to debug a Rust app myself? haha
This issue is also easily reproducible with the macOS version (v0.20.0) installed with:
```shell
brew install meilisearch
meilisearch
# ->> http://127.0.0.1:7700
```
(The key field can be left empty if no key is set.) The same thing happens.
Thanks 😃
Here is a picture of my cat, which should bring you joy:
Using v0.21.0rc1 we’re no longer able to reproduce the issue from v0.20 👍
Thanks!
Hello !
Thanks 😃
It seems to work well using the docker image and the test script
Hello @montaniasystemab @volkv @fxi @sonic182 and everyone following the issue!
The RC of the v0.21.0 is out! https://github.com/meilisearch/MeiliSearch/releases
I tested it with your example @fxi (cf your comment) and I don’t have the bug anymore!
Can you confirm it’s the same for you?
Can confirm this kind of “stuck” state when batch importing data into 5+ indexes (I’m using it with Laravel Scout). Adding a delay of 5-10 seconds between indexes helps 😃 And thanks for this awesome engine; I’m now migrating from Elastic, with Scout 9’s native Meili support 😉
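In Python terms, the workaround above amounts to pausing between indexes so Meilisearch can drain its update queue before the next batch of documents arrives. A hedged sketch; `import_one` here is a stand-in for whatever function does the actual import:

```python
import time


def import_all_with_delay(indexnames, import_one, delay_seconds=5.0):
    """Run import_one for each index, sleeping between indexes as a workaround."""
    for i, name in enumerate(indexnames):
        import_one(name)
        if i < len(indexnames) - 1:  # no need to sleep after the last index
            time.sleep(delay_seconds)
```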