meilisearch: Phrase match error in Chinese
Documents, indexed with default settings:
curl -s http://localhost:7700/indexes/products/documents | rq
[
  {
    "id": "123",
    "title": "小化妆包"
  },
  {
    "id": "456",
    "title": "Ipad 包"
  }
]
Search with the phrase match “化妆”: no result is returned.
curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"化妆\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [],
  "limit": 20,
  "nbHits": 0,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"化妆\""
}
Search with the phrase match “小化妆”: no result is returned.
curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"小化妆\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [],
  "limit": 20,
  "nbHits": 0,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"小化妆\""
}
Search with the phrase match “化妆包”: one result is returned.
curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"化妆包\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [
    {
      "id": "123",
      "title": "小化妆包"
    }
  ],
  "limit": 20,
  "nbHits": 1,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"化妆包\""
}
Search with the phrase match “小化妆包”: one result is returned.
curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"小化妆包\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [
    {
      "id": "123",
      "title": "小化妆包"
    }
  ],
  "limit": 20,
  "nbHits": 1,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"小化妆包\""
}
Expected behavior: the queries “化妆”, “小化妆”, “化妆包”, and “小化妆包” should all return the same result:
{
  "id": "123",
  "title": "小化妆包"
}
MeiliSearch version:
meilisearch-http 0.22.0
About this issue
- State: closed
- Created 3 years ago
- Comments: 19 (7 by maintainers)
Commits related to this issue
- Merge #372 372: Fix Meilisearch 1714 r=Kerollmops a=ManyTheFish The bug comes from the typo tolerance, to know how many typos are accepted we were counting bytes instead of characters in a word. On... — committed to meilisearch/milli by bors[bot] 3 years ago
- Merge #58 58: Test Meilisearch issue 1714 r=irevoire a=ManyTheFish Related to [Meilisearch#1714](https://github.com/meilisearch/MeiliSearch/issues/1714) no bug in Tokenizer Co-authored-by: many ... — committed to meilisearch/charabia by bors[bot] 3 years ago
- Fix #1714 test — committed to meilisearch/meilisearch by ManyTheFish 2 years ago
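The first commit message above attributes the bug to counting bytes instead of characters when deciding how many typos a word may contain. A minimal Python sketch of that difference follows. The thresholds in typo_budget (no typo below length 5, one typo from 5, two typos from 9) are an assumption based on Meilisearch's documented defaults, and the function names are hypothetical; this is not milli's actual code.

```python
def typo_budget(length):
    """Number of typos tolerated for a word of the given length.

    Assumed thresholds: fewer than 5 -> 0 typos, 5 to 8 -> 1 typo, 9+ -> 2 typos.
    """
    if length < 5:
        return 0
    if length < 9:
        return 1
    return 2


def char_len(word):
    # Number of Unicode code points, which is what a user thinks of as "characters".
    return len(word)


def byte_len(word):
    # Number of bytes in the UTF-8 encoding; each of these CJK characters takes 3 bytes.
    return len(word.encode("utf-8"))


if __name__ == "__main__":
    for word in ["化妆", "小化妆", "化妆包", "小化妆包"]:
        print(
            word,
            "chars:", char_len(word), "-> typos:", typo_budget(char_len(word)),
            "| bytes:", byte_len(word), "-> typos:", typo_budget(byte_len(word)),
        )
```

Under these assumptions, every query in the report is short enough to get no typo tolerance when measured in characters, but is granted one or two typos when measured in bytes.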
@ManyTheFish hey man, really appreciate your help. I’ll look into jieba and see what I can do for all of us
@ManyTheFish unfortunately, I didn’t find a way to tweak jieba except for load_dict, i.e. loading our own dictionary. https://github.com/messense/jieba-rs/issues/77
I will investigate, thank you @gemini133 for your report. 👍
yep, I think it’s related to the tokenizer too. why does “化妆” return nothing then? not sure how to work around this. confusing…
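One way to picture how a tokenizer could produce the results reported above is the toy sketch below. It is purely illustrative: the dictionary, the greedy longest-match segmenter, and the token-sequence phrase check are all assumptions made for this example, not jieba's or Meilisearch's actual behavior. It only shows that if “化妆包” is indexed as a single token, a phrase query for “化妆” has no token to match.

```python
# Hypothetical dictionary; a real segmenter such as jieba ships a much larger one.
DICTIONARY = {"化妆", "化妆包"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Greedy forward longest match; unknown characters become single tokens."""
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate first, falling back to a single character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in DICTIONARY:
                tokens.append(candidate)
                i += size
                break
    return tokens


def phrase_matches(phrase, document):
    """True when the phrase's tokens appear as a contiguous run of document tokens."""
    needle = segment(phrase)
    haystack = segment(document)
    return any(
        haystack[j:j + len(needle)] == needle
        for j in range(len(haystack) - len(needle) + 1)
    )


if __name__ == "__main__":
    title = "小化妆包"
    print("document tokens:", segment(title))
    for query in ["化妆", "小化妆", "化妆包", "小化妆包"]:
        print(query, segment(query), phrase_matches(query, title))
```

With this toy dictionary the document is segmented as “小” + “化妆包”, so only the queries “化妆包” and “小化妆包” match, which mirrors the four search results in the report.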