meilisearch: Duplicate documents in the search route
The engine can return multiple versions of the same document (with the same document id). This should not be possible, as a document with a given document id must replace the previous version of itself. This bug appears in version v0.29.0 but not in v0.28.1.
A strange coincidence is that we released the soft-deletion feature in this exact version. Here is a thread explaining the issue users have. We have private access to the settings one user was using when triggering this bug.
This bug could also return documents that had nothing in common with the user's query. This happened because documents were wrongly deleted and, as ids are reused, their leftover data ended up associated with a completely different document.
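To make the id-reuse failure mode concrete, here is a toy model. It is not milli's actual code or data layout: `ToyIndex`, `buggy_delete` and the single `title` field are hypothetical, and the sketch assumes the faulty deletion left index entries behind while freeing the id for reuse.

```python
class ToyIndex:
    """Hypothetical miniature index, for illustration only (not milli)."""

    def __init__(self):
        self.documents = {}   # internal id -> document
        self.word_index = {}  # word -> set of internal ids
        self.free_ids = []    # internal ids available for reuse
        self.next_id = 0

    def add(self, doc):
        # Reuse a freed internal id when one is available.
        if self.free_ids:
            internal_id = self.free_ids.pop()
        else:
            internal_id = self.next_id
            self.next_id += 1
        self.documents[internal_id] = doc
        for word in doc["title"].lower().split():
            self.word_index.setdefault(word, set()).add(internal_id)
        return internal_id

    def buggy_delete(self, internal_id):
        # Assumed bug: the document is dropped and its id is freed,
        # but the entries in word_index are never cleaned up.
        del self.documents[internal_id]
        self.free_ids.append(internal_id)

    def search(self, word):
        ids = self.word_index.get(word.lower(), set())
        return [self.documents[i] for i in sorted(ids) if i in self.documents]


index = ToyIndex()
vienna_id = index.add({"id": "a", "title": "Vienna cafe"})
index.buggy_delete(vienna_id)
index.add({"id": "b", "title": "London pub"})  # reuses the freed internal id

results = index.search("vienna")
print(results)  # the London document is returned for a "vienna" query
```

The stale `vienna` entry still points at the freed internal id, so once that id is handed to a new document, the query returns something unrelated.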
EDIT by @curquiza
- Implement changes in Milli: https://github.com/meilisearch/milli/pull/723
- Release a Milli version containing these changes
- Bump this new Milli version in Meilisearch and merge it into main
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 21 (12 by maintainers)
Commits related to this issue
- Merge #3036 3036: Bump milli to v0.35.1 r=irevoire a=Kerollmops This PR bumps milli to v0.35.1 which brings some fixes. You can see [the changelog of milli on the release page](https://github.com/me... — committed to meilisearch/meilisearch by bors[bot] 2 years ago
- Merge #3047 3047: Fix soft deleted bug settings r=ManyTheFish a=Kerollmops This PR fixes https://github.com/meilisearch/meilisearch/issues/3021 and fixes https://github.com/meilisearch/meilisearch/i... — committed to meilisearch/meilisearch by bors[bot] 2 years ago
- Merge #3047 3047: Fix soft deleted bug settings r=curquiza a=Kerollmops This PR fixes https://github.com/meilisearch/meilisearch/issues/3021 and fixes https://github.com/meilisearch/meilisearch/issu... — committed to meilisearch/meilisearch by bors[bot] 2 years ago
- Merge #723 723: Fix bug in handling of soft deleted documents when updating settings r=Kerollmops a=loiclec # Pull Request ## Related issue Fixes (partially, until merged into meilisearch) https... — committed to meilisearch/milli by bors[bot] 2 years ago
- Merge #3202 #3203 3202: Bump milli to v0.37.1 r=curquiza a=Kerollmops This PR bumps milli to v0.37.1 and fixes #3167, #3178, #3165, and #3021. 3203: Update version for the next release (v0.30.1) in... — committed to meilisearch/meilisearch by bors[bot] 2 years ago
- Merge #750 750: Fix hard-deletion of an external id that was soft-deleted and then reimported - main r=irevoire a=loiclec # Pull Request ## Related issue Fixes (when merged into meilisearch) htt... — committed to meilisearch/milli by bors[bot] 2 years ago
- Merge #3267 #3268 3267: Bump milli to v0.37.5 r=curquiza a=curquiza Fixes #3021 3268: Make Clippy happy r=curquiza a=curquiza Fix clippy to be able to merge I made `cargo clippy --fix` Co-aut... — committed to meilisearch/meilisearch by bors[bot] 2 years ago
Hi there,
I’m running into kind of the same issue.
We noticed that Meilisearch was returning wrong documents while doing geo searches (Vienna results popped up in London results). So we didn’t have duplicate entries in the search, but results completely different from what was requested via filters.
I was using v0.29.1, updated to v0.29.2, and recreated the index (only updating didn’t resolve the issue). The search documents are correct now. If we encounter any problems when doing document updates/additions/deletes and the search results seem odd, I’ll keep you posted.
Some details on the index and meili (if relevant):
- Version: v0.29.1 (where the issue popped up initially)
- Special search features in use: distinct attribute, geosearch
This issue will be fixed by https://github.com/meilisearch/milli/pull/690.
My hope is that it’s the last time we reopen this issue 🙈
@thijndehaas Thank you for all the information 😄 We have reproduced and found a fix for the bug! (PR: https://github.com/meilisearch/milli/pull/750 ). My hope is that we can release meilisearch v0.30.5 with this fix within the next couple of days. We will keep you updated about the status of the release.
I’m seeing the same issue still with 0.29.2. I did the following when upgrading from 0.29.0 to 0.29.2
This happened a few days later (upgraded 2022-11-23, issue occurred again on 2022-11-30).
Currently running a test against our test instance with mass updates/deletes of documents at random and with duplicate IDs to see if I can reproduce this.
Next test will be to delete the indexes and recreate them and see if the issue comes up again.
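For anyone running a similar reproduction test, a small check like the one below can flag duplicated primary keys in a result set. It is only a sketch: it assumes `hits` is the list of documents taken from a search response and that the primary key field is named `id`; `duplicate_ids` is a hypothetical helper, not part of any Meilisearch SDK.

```python
from collections import Counter


def duplicate_ids(hits, primary_key="id"):
    """Return the primary-key values that appear more than once in `hits`."""
    counts = Counter(hit[primary_key] for hit in hits)
    return sorted(key for key, n in counts.items() if n > 1)


# Example result set with one document id returned twice.
hits = [
    {"id": 1, "title": "first"},
    {"id": 2, "title": "second"},
    {"id": 1, "title": "first (stale copy)"},
]
print(duplicate_ids(hits))  # prints [1]
```

An empty list means every returned document id was unique for that query.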
A minute ago my colleague found another website where the problem is back. I will try to debug the corrupt documents there before I swap all data.
Update:
we are also experiencing this issue at freshline.io
we are seeing both duplicate search results with the same primary key and we’re seeing incorrect results being returned. in our case this has caused products from Customer A’s store to show up on Customer B’s store—which is obviously a big problem
we recently upgraded from 0.26 to 0.29.2, and we batch all of our meilisearch requests, so sometimes we are sending a large number of updates at once
Reopening as this issue is still present, even in the latest release, v0.30.0. The only way to reproduce it seems to be replacing many documents, e.g. 9000 tasks. Note that the update or replace operation internally removes and reindexes the documents. Therefore this issue can also be triggered by document deletion.
@thijndehaas sees duplicated documents with the same EAN, a typical issue we thought we had fixed when we closed this issue just before. You can see the discussion on this Slack thread.
It can seem like an isolated case, but another user is experiencing the same issue. @0x15f doesn’t report duplicated documents (maybe because he uses only the filtering features on this index), but he reports the Missing key in the documents database error when sending document deletion operations, a typical issue we thought we had fixed when we closed this issue just before. He is also experiencing An unexpected crash that occurred when processing the task but doesn’t have the logs; he will run v0.30.0 again and try to provide the logs. We also know that he is running Meilisearch on an xfs file system.
This is the same user who is involved in #3163; you can find a link to the corrupted data.ms file there.
Hey @qbx2,
There was a bug in our internal implementation of the soft-deletion system. This system must be invisible to the user; it speeds up the deletion and replacement of documents. A document must be replaced or updated when its document id matches one of the documents already in the database.
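As a rough illustration of the intended behavior, here is a minimal sketch of soft deletion. This is a simplified model, not milli's implementation: `SoftDeleteIndex` and all of its fields and methods are hypothetical names. Replacing a document only marks the old version as deleted, every read skips marked entries, and a later purge physically removes them.

```python
class SoftDeleteIndex:
    """Hypothetical toy model of soft deletion (not milli's real code)."""

    def __init__(self):
        self.docs = {}                  # internal id -> document
        self.external_to_internal = {}  # user-facing document id -> internal id
        self.soft_deleted = set()       # internal ids hidden from every read
        self.next_internal = 0

    def add_or_replace(self, doc):
        external_id = doc["id"]
        old = self.external_to_internal.get(external_id)
        if old is not None:
            # Same document id: hide the previous version instead of
            # rewriting the index immediately.
            self.soft_deleted.add(old)
        internal = self.next_internal
        self.next_internal += 1
        self.docs[internal] = doc
        self.external_to_internal[external_id] = internal

    def delete(self, external_id):
        internal = self.external_to_internal.pop(external_id, None)
        if internal is not None:
            self.soft_deleted.add(internal)

    def purge(self):
        # Deferred cleanup: physically remove everything marked as deleted.
        for internal in self.soft_deleted:
            del self.docs[internal]
        self.soft_deleted.clear()

    def visible(self):
        # Reads must never expose soft-deleted entries.
        return [d for i, d in sorted(self.docs.items()) if i not in self.soft_deleted]


index = SoftDeleteIndex()
index.add_or_replace({"id": "ean-1", "name": "v1"})
index.add_or_replace({"id": "ean-1", "name": "v2"})  # replaces v1
index.add_or_replace({"id": "ean-2", "name": "other"})
index.delete("ean-2")

visible = index.visible()
print(visible)  # only the latest version of ean-1 remains visible
```

In this model, duplicates would show up exactly when some read path forgets to filter on `soft_deleted`, which is why the system has to stay invisible to the user.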