milvus: Wrong results returned for merged different 'topk' queries
Describe the bug
I have a Python Celery worker running inside a Docker container that spawns 4 processes, each of which searches Milvus in rapid succession using pymilvus: status, results = client.search(collection_name, top_k=2, query_records=[film_A]). With just one Docker container running, the search results come back as expected. When I start a second Docker container with a worker containing 4 more processes, I start getting searches that return Status.code == 0 but an empty results list. When I run the same query in an interactive Python shell or the milvus-em GUI, I can confirm that the vector_id does exist in the database.
Steps/Code to reproduce behavior
Expected behavior
All searches return topk ids when the queried vector exists in the database.
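The symptom described above (an OK status with an empty result list) can be caught on the client side. The following is a minimal sketch, not part of pymilvus: `search_checked` and `EmptyResultError` are hypothetical names, and the code assumes only what the report shows, namely that `client.search(...)` returns a `(status, results)` pair, that `status.code == 0` means success, and that `results` is a sequence with one row per query.

```python
class EmptyResultError(RuntimeError):
    """Raised when a search reports success but returns no hits (hypothetical helper)."""


def search_checked(client, collection_name, query_records, top_k):
    """Run a search and fail loudly if an OK status comes with missing results.

    Assumes `client.search` behaves like the call shown in the report:
    it returns (status, results), where results holds one row per query.
    """
    status, results = client.search(
        collection_name, top_k=top_k, query_records=query_records
    )
    if status.code != 0:
        raise RuntimeError(f"search failed: {status}")

    rows = list(results) if results is not None else []
    # One row is expected per query vector, and no row should be empty
    # when the queried vector is known to exist in the collection.
    if len(rows) != len(query_records) or any(len(row) == 0 for row in rows):
        raise EmptyResultError(
            f"status OK but got {len(rows)} row(s) for {len(query_records)} query vector(s)"
        )
    return rows
```

Logging the failing query vector and the `top_k` value at the point where `EmptyResultError` is raised makes it easier to correlate empty results with concurrent requests from other workers.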
Method of installation
- Docker/cpu
- Docker/gpu
- Build from source
Environment details
- Hardware/Software conditions (OS, CPU, GPU, Memory): nvidia/cuda:10.1-cudnn7-devel-ubuntu18.04
- Milvus version (master or released version): 1.0.0-cpu-d030521-1ea92e
Additional context
Not sure whether this is a Milvus issue or a pymilvus issue. A related issue has also been filed against pymilvus.
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (9 by maintainers)
Commits related to this issue
- fix #4797 Signed-off-by: yhmo <yihua.mo@zilliz.com> — committed to yhmo/milvus by yhmo 3 years ago
- fix #4797 Signed-off-by: yhmo <yihua.mo@zilliz.com> — committed to yhmo/milvus by yhmo 3 years ago
- fix wrong results of merged different 'topk' queries (#4924) * fix #4797 Signed-off-by: yhmo <yihua.mo@zilliz.com> Co-authored-by: shengjun.li <shengjun.li@zilliz.com> — committed to milvus-io/milvus by yhmo 3 years ago
PR #4924 fixes this issue. The next release, v1.1, will contain the fix.
We have found the root cause. When Milvus receives frequent or intense search requests, it combines several requests into one, executes the combined request, then splits the result into pieces and sends each piece to the corresponding client. In the common case, users search with the same topk every time, and the combine mechanism works well for identical topk values. But in this case, the topk can be changed by the code:
Milvus combines requests with different topk values, makes a mistake while splitting the result into several pieces, and returns wrong results to the client.
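The combine-and-split behavior described above can be illustrated with a small, self-contained Python sketch. This is not the Milvus implementation, and it does not reproduce the actual bug, whose details are not given here. The names `brute_force_topk` and `combined_search` are hypothetical, and a brute-force squared-L2 search stands in for the real index. The sketch runs the merged batch with the largest requested topk and then trims each row back to the topk its own request asked for, which is the step where per-request offsets and topk values must be tracked carefully.

```python
from typing import Dict, List, Sequence, Tuple

Hit = Tuple[int, float]  # (vector_id, squared L2 distance)
Request = Tuple[List[Sequence[float]], int]  # (query vectors, topk)


def brute_force_topk(database: Dict[int, Sequence[float]],
                     query: Sequence[float], k: int) -> List[Hit]:
    """Return up to k nearest hits for one query (stand-in for a real index)."""
    scored = [
        (vid, sum((a - b) ** 2 for a, b in zip(vec, query)))
        for vid, vec in database.items()
    ]
    scored.sort(key=lambda hit: hit[1])
    return scored[:k]


def combined_search(database: Dict[int, Sequence[float]],
                    requests: List[Request]) -> List[List[List[Hit]]]:
    """Execute several requests as one batch, then split results per request."""
    # Combine: search once with the largest topk so every request is covered.
    max_k = max(k for _, k in requests)
    all_queries = [q for queries, _ in requests for q in queries]
    merged = [brute_force_topk(database, q, max_k) for q in all_queries]

    # Split: each request owns a contiguous slice of rows, and each row
    # must be trimmed to that request's own topk, not the batch-wide one.
    per_request = []
    offset = 0
    for queries, k in requests:
        rows = merged[offset:offset + len(queries)]
        per_request.append([row[:k] for row in rows])
        offset += len(queries)
    return per_request
```

A simple correctness check for any such scheme is that every request receives the same rows from the combined path as it would from running alone with its own topk.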
With the pickle dump file, I can reproduce this issue. It looks like the issue is related to the Milvus search combine function. Adding a config entry in server_config.yaml can avoid it:
But the root cause still needs to be investigated.