meilisearch: Phrase match error in Chinese

Documents, indexed with default settings:

curl -s http://localhost:7700/indexes/products/documents | rq
[
  {
    "id": "123",
    "title": "小化妆包"
  },
  {
    "id": "456",
    "title": "Ipad 包"
  }
]

Search with phrase match “化妆”: no results returned.

curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"化妆\"" }' | rq

{
  "exhaustiveNbHits": false,
  "hits": [],
  "limit": 20,
  "nbHits": 0,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"化妆\""
}

Search with phrase match “小化妆”: no results returned.

curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"小化妆\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [],
  "limit": 20,
  "nbHits": 0,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"小化妆\""
}

Search with phrase match “化妆包”: one result returned.

curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"化妆包\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [
    {
      "id": "123",
      "title": "小化妆包"
    }
  ],
  "limit": 20,
  "nbHits": 1,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"化妆包\""
}

Search with phrase match “小化妆包”: one result returned.

curl -s -X POST 'http://localhost:7700/indexes/products/search' --data '{ "q": "\"小化妆包\"" }' | rq
{
  "exhaustiveNbHits": false,
  "hits": [
    {
      "id": "123",
      "title": "小化妆包"
    }
  ],
  "limit": 20,
  "nbHits": 1,
  "offset": 0,
  "processingTimeMs": 0,
  "query": "\"小化妆包\""
}

Expected behavior: the queries “化妆”, “小化妆”, “化妆包”, and “小化妆包” should all return the same result:

{
  "id": "123",
  "title": "小化妆包"
}

MeiliSearch version:

meilisearch-http 0.22.0

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 19 (7 by maintainers)

Most upvoted comments

@ManyTheFish hey man, really appreciate your help. I’ll look into jieba and see what I can do for all of us

@ManyTheFish unfortunately, I didn’t find a way to tweak jieba other than loading our own dictionary with load_dict.

https://github.com/messense/jieba-rs/issues/77

I will investigate, thank you @gemini133 for your report. 👍

yep, I think it’s related to the tokenizer too. Why does “化妆” return nothing then? Not sure how to work around this. Confusing…
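
For anyone trying to reason about why “化妆” returns nothing: below is a minimal Python sketch of token-level phrase matching. It is not Meilisearch's code. It assumes (this is a guess, not something shown in the thread) that the segmenter treats “化妆包” as a single dictionary word, so the title “小化妆包” is split into “小” + “化妆包” and the token “化妆” never exists in the index. The segmentation table and the helper names `segment`, `phrase_matches`, and `search` are all hypothetical, picked only because they are consistent with the four results reported above.

```python
from typing import List, Sequence

# ASSUMPTION: hypothetical segmentations, chosen only because they are
# consistent with the four search results in the report. The tokenizer's
# real output for these strings was not shown in the thread.
ASSUMED_SEGMENTS = {
    "小化妆包": ["小", "化妆包"],
    "化妆包": ["化妆包"],
    "小化妆": ["小", "化妆"],
    "化妆": ["化妆"],
}


def segment(text: str) -> List[str]:
    """Stand-in for a dictionary-based segmenter (hypothetical helper)."""
    return ASSUMED_SEGMENTS[text]


def phrase_matches(doc_tokens: Sequence[str], query_tokens: Sequence[str]) -> bool:
    """True if query_tokens occurs as a contiguous run of whole tokens in doc_tokens."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        list(doc_tokens[i:i + n]) == list(query_tokens)
        for i in range(len(doc_tokens) - n + 1)
    )


def search(title: str, phrase: str) -> bool:
    """Segment both sides, then compare token by token (no substring matching)."""
    return phrase_matches(segment(title), segment(phrase))


if __name__ == "__main__":
    for phrase in ["化妆", "小化妆", "化妆包", "小化妆包"]:
        print(phrase, search("小化妆包", phrase))
```

Under that assumed segmentation, “化妆” and “小化妆” fail because “化妆” is only a substring of the indexed token “化妆包”, not a token of its own, while “化妆包” and “小化妆包” line up with whole tokens. If the real segmentation differs, the explanation would need to change accordingly.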