meilisearch: Migrating 40M records from Postgres to Meilisearch takes too long

I am trying to migrate 40M Postgres records (user comments) to Meilisearch. For each batch, I query 10K records from Postgres and write them to Meilisearch. The migration script is complete, but it has been running for more than 3 days and only 4.2 million records have been migrated so far; the count is increasing very slowly.
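For reference, here is a simplified sketch of the batching approach (not the exact script; the comments table, its integer id primary key, and $DATABASE_URL are placeholders):

last_id=0
while :; do
  # export the next page of 10K comments from Postgres as a JSON array
  psql "$DATABASE_URL" -At -c \
    "SELECT coalesce(json_agg(c), '[]'::json) FROM (
       SELECT id, body FROM comments WHERE id > $last_id ORDER BY id LIMIT 10000
     ) c;" > batch.json
  [ "$(cat batch.json)" = "[]" ] && break
  # push the page to Meilisearch (the response with the update id is discarded here)
  curl -s -X POST 'http://localhost:7700/indexes/comments/documents' \
    -H 'Content-Type: application/json' \
    --data-binary @batch.json > /dev/null
  last_id=$(jq '.[-1].id' batch.json)
done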

I also see the following error messages in the error log:

[2021-02-18T12:48:44Z INFO  ureq::unit] sending request POST https://api.amplitude.com/httpapi
[2021-02-18T13:14:58Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T13:43:50Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T13:48:45Z INFO  ureq::unit] sending request POST https://api.amplitude.com/httpapi
[2021-02-18T14:13:03Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T14:42:05Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T14:48:45Z INFO  ureq::unit] sending request POST https://api.amplitude.com/httpapi
[2021-02-18T15:12:06Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T15:41:40Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T15:48:46Z INFO  ureq::unit] sending request POST https://api.amplitude.com/httpapi

I also tried to check the pending updates with the following curl command:

curl 'http://localhost:7700/indexes/comments/updates' | jq

It runs forever and never returns a result from the server.

  1. What does “commit nested transaction failed” mean, and how do I fix it?
  2. Are there any best practices for writing large amounts of data to Meilisearch?
  3. Is there a limit on the amount of data Meilisearch can handle?
  4. Is there an option to disable the Amplitude analytics data?

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 26 (12 by maintainers)

Most upvoted comments

If I remember correctly, the biggest dataset we are currently using to test the new engine is ~120M documents.

I retried with batches of 1k documents. It ingested 25k docs in a few seconds 🚀 (259MB raw data, 1.8GB index size).

Hello @setop, @tmikaeld, @aaqibjavith and everyone following the issue!

The first RC of MeiliSearch v0.21.0 is out. We did our best to fix the indexation and crash issues; we managed to improve them, but not to fix them completely.

You can test this new release by downloading the binaries available in the release, or you can use it with Docker:

docker run -p 7700:7700 getmeili/meilisearch:v0.21.0rc1 ./meilisearch

We will keep improving this after the release of v0.21.0. We would rather ship a not-fully-optimized version than delay it and, with it, the release of new features. Rest assured we are doing our best to keep improving these indexation issues.

As a reminder:

  • Despite the improvements, we still recommend pushing your documents to MeiliSearch in batches rather than one by one. The default maximum payload size of MeiliSearch is 100MB and can optionally be increased, which means most datasets can be pushed with one call to the API (see the example right after this list).
  • If you still get a memory crash, the RAM of your machine might not be adequate for your dataset size. We recommend increasing the RAM of your machine.
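For example, if you exported your documents to a single JSON array file, one call could look like this (the index name and file name are just placeholders):

curl -X POST 'http://localhost:7700/indexes/comments/documents' \
  -H 'Content-Type: application/json' \
  --data-binary @comments.json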

Thanks for your patience and your help with this! ❤️

Hello @quangtam! Thank you for your feedback 😄

  • You can test the new version of MeiliSearch that we will release on Monday (v0.22.0). This version contains a fresh new indexer. In the meantime, you can test the release candidate versions (v0.22.0rc1).
  • I cannot tell you the exact payload size to recommend for your dataset since I don't know it, but I do recommend increasing it, for example to 500MB instead of the 100MB default value (see the sketch after this list). The bigger the payload, the faster the indexation; but the bigger the payload, the more RAM you need, and you risk the process being killed by the OS because of excessive RAM consumption. This kill problem is a known issue, and v0.22.0 should partly fix it 🙂
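As an illustrative sketch (the value is given in bytes here; it can be set either with the --http-payload-size-limit option or the MEILI_HTTP_PAYLOAD_SIZE_LIMIT environment variable):

# ~500 MB instead of the ~100 MB default
./meilisearch --http-payload-size-limit=524288000
# or, equivalently, via the environment:
MEILI_HTTP_PAYLOAD_SIZE_LIMIT=524288000 ./meilisearch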

Awesome @setop! I'm going to open an issue in the docs; pushing by batch is not obvious for everyone. To be honest, that's a tip I give multiple times a week, so you're not the only one 🙂 Glad you succeeded in indexing your data! Edit: I added a comment to this already existing issue: https://github.com/meilisearch/documentation/issues/875#issuecomment-831115956

Hello @aaqibjavith!

Thanks for your feedback and for trying MeiliSearch.

The current search engine in MeiliSearch is not able to handle this high a number of documents. This is a known issue, and the core team is working hard on a new search engine that might be able to handle this quantity of documents! 🚀

For the moment, we recommend pushing your documents in larger batches; don't forget to increase the payload size limit accordingly.

Rest assured we will keep you informed when the new search engine is available 😁
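In the meantime, if you want to keep an eye on indexation progress while your batches are being processed, a rough sketch is to poll the updates endpoint you used above and count the updates by status (field names may differ slightly between versions):

curl -s 'http://localhost:7700/indexes/comments/updates' \
  | jq 'group_by(.status) | map({status: .[0].status, count: length})'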