meilisearch: Migrating 40M records from Postgres to Meilisearch takes too long
I am trying to migrate 40M Postgres records (user comments) to Meilisearch. On each iteration, I query 10K records from Postgres and write them to Meilisearch. The migration script itself is complete, but it has been running for more than 3 days and only 4.2 million records have been migrated so far; the count is increasing very slowly.
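For reference, the batching approach is roughly equivalent to a loop like the following (a hypothetical sketch only: the table and column names, connection string, and the psql + curl approach are placeholders, not the actual script):

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the batch migration; names, batch size and the
# psql + curl approach are assumptions, not the script from this issue.
# It also assumes the "comments" index already exists in Meilisearch.
set -euo pipefail

MEILI_URL="http://localhost:7700"
DB_URL="postgres://user:password@localhost:5432/app"
BATCH=10000

for ((offset = 0; ; offset += BATCH)); do
  # Export one batch of comments as a single-line JSON array.
  psql "$DB_URL" -At -c \
    "SELECT COALESCE(json_agg(t), '[]'::json)
     FROM (SELECT id, author_id, body
           FROM comments ORDER BY id
           LIMIT $BATCH OFFSET $offset) t" > batch.json

  # Stop once a batch comes back empty.
  if [ "$(cat batch.json)" = "[]" ]; then
    break
  fi

  # Push the batch to Meilisearch; the response contains an updateId
  # that can be polled to check indexing progress.
  curl -s -X POST "$MEILI_URL/indexes/comments/documents" \
    -H 'Content-Type: application/json' \
    --data-binary @batch.json
  echo
done
```

Note that `OFFSET` pagination degrades as the offset grows; over 40M rows, keyset pagination on the Postgres side (`WHERE id > <last id seen>`) is usually much faster.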
I also see the following error messages in the error log:
[2021-02-18T12:48:44Z INFO ureq::unit] sending request POST https://api.amplitude.com/httpapi
[2021-02-18T13:14:58Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T13:43:50Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T13:48:45Z INFO ureq::unit] sending request POST https://api.amplitude.com/httpapi
[2021-02-18T14:13:03Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T14:42:05Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T14:48:45Z INFO ureq::unit] sending request POST https://api.amplitude.com/httpapi
[2021-02-18T15:12:06Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T15:41:40Z ERROR meilisearch_core::database] commit nested transaction failed: Input/output error (os error 5)
[2021-02-18T15:48:46Z INFO ureq::unit] sending request POST https://api.amplitude.com/httpapi
Also, I tried to get the pending updates with the following curl command:
curl 'http://localhost:7700/indexes/comments/updates' | jq
It runs forever and never returns a result from the server.
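A possible workaround (my assumption, not something confirmed in this thread) is to poll a single update by the updateId returned when a batch is pushed, rather than listing every update, and to cap the request time on the client side:

```bash
# Illustrative only: "0" stands in for an updateId returned by a document
# POST; --max-time keeps curl from hanging indefinitely.
curl --max-time 10 'http://localhost:7700/indexes/comments/updates/0' | jq
```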
- What does "commit nested transaction failed" mean, and how can it be fixed?
- Is there a best practice for writing large amounts of data to Meilisearch?
- Is there a limit on the amount of data Meilisearch can handle?
- Is there an option to disable the Amplitude analytics data?
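Edit: for the last question, it looks like analytics can be disabled at launch with the documented `--no-analytics` option (or the `MEILI_NO_ANALYTICS` environment variable); a sketch, assuming those options and noting that the exact syntax may vary by version:

```bash
# Disable telemetry so the Amplitude POSTs seen in the log above stop.
./meilisearch --no-analytics
# or via the environment:
MEILI_NO_ANALYTICS=true ./meilisearch
```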
About this issue
- State: closed
- Created 3 years ago
- Comments: 26 (12 by maintainers)
If I remember correctly, the biggest dataset we are currently using to test the new engine is ~120M documents.
I retried with 1k-document batches. It ingests 25k docs in a few seconds: 259MB raw data, 1.8GB index size.
Hello @setop, @tmikaeld, @aaqibjavith and everyone following the issue!
The first RC of MeiliSearch v0.21.0 is out. We did our best to fix the indexation and crash issues. We succeeded in improving them, but not in fixing them completely.
You can test this new release by downloading the binaries available in this release, or you can use it with Docker:
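For example (the image name is the official one; the RC tag below is only illustrative, take the exact tag from the release page):

```bash
# Illustrative tag: check the v0.21.0 RC release notes for the exact one.
docker pull getmeili/meilisearch:v0.21.0rc0
docker run -it --rm -p 7700:7700 getmeili/meilisearch:v0.21.0rc0
```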
We will keep improving this after the release of v0.21.0. We would rather ship a not-fully-optimized version than delay it and, with it, the release of new features. Rest assured we are doing our best to keep improving these indexation issues.
As a reminder:
Thanks for your patience and your help with this! ❤️
Hello @quangtam! Thank you for your feedback!
Awesome @setop, I'm going to open an issue in the docs; pushing by batch is not obvious for everyone. To be honest, that's a tip I give multiple times a week, so you're not the only one. Glad you succeeded in indexing your data! Edit: I added a comment to this already-existing issue: https://github.com/meilisearch/documentation/issues/875#issuecomment-831115956
Hello @aaqibjavith!
Thanks for your feedback and for trying MeiliSearch.
The current search engine in MeiliSearch is not able to handle this many documents. This is a known issue, and the core team is working hard on a new search engine that should be able to handle this quantity of documents!
For the moment, we recommend pushing your documents in larger batches; don't forget to increase the payload size limit accordingly.
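As an illustration, raising the limit is done at launch; the flag name is the documented `--http-payload-size-limit`, while the ~1 GiB value below is only an example and the accepted value format may differ between versions:

```bash
# Raise the HTTP payload limit (value given in bytes here, ~1 GiB)
# before pushing larger document batches.
./meilisearch --http-payload-size-limit 1073741824
```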
Rest assured we will keep you informed when the new search engine is available!