milvus: [Bug]: Range Search Limited to 6400 Results
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.3.2
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka): kafka
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus==2.3.2
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
When using range search with the search param radius, Milvus returns at most 6400 results, even though topk/limit can be up to 16,384.
Expected Behavior
I expect to get up to 16384 results, provided those results fall within the range_filter and radius.
Steps To Reproduce
I have a collection with around 1 million records.
If I perform a search with no range filter and limit=16384, I get 16384 results:
results = c.search(
    data=[ref_embd],
    # expr="candidate_task_id != ' '",
    anns_field='img',
    param={'params': {}, 'metric_type': 'L2'},
    limit=16384,
    offset=0,
    output_fields=['id']
)[0]
ids = [result.id for result in results]
len(ids)
>> 16384
If I now add radius to params while keeping limit=16384, I only get 6400 results:
results = c.search(
    data=[ref_embd],
    # expr="candidate_task_id != ' '",
    anns_field='img',
    param={'params': {'radius': 10}, 'metric_type': 'L2'},
    limit=16384,
    offset=0,
    output_fields=['id']
)[0]
ids = [result.id for result in results]
len(ids)
>> 6400
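The two calls above differ only in the dict passed as param. A minimal sketch of that difference, using a hypothetical helper build_search_param (not part of pymilvus) that produces the same dict shapes as in the repro:

```python
from typing import Any, Dict, Optional


def build_search_param(
    metric_type: str,
    radius: Optional[float] = None,
    range_filter: Optional[float] = None,
) -> Dict[str, Any]:
    """Hypothetical helper: build the dict passed as `param=` to c.search().

    With no radius/range_filter this is a plain top-k search; setting
    radius (and optionally range_filter) turns it into a range search.
    """
    params: Dict[str, Any] = {}
    if radius is not None:
        params["radius"] = radius
    if range_filter is not None:
        params["range_filter"] = range_filter
    return {"metric_type": metric_type, "params": params}


# Same shapes as the two calls in the repro.
plain_param = build_search_param("L2")
range_param = build_search_param("L2", radius=10)
```

Everything else (data, anns_field, limit, offset, output_fields) is identical between the two searches, so the drop from 16384 to 6400 results is tied to the presence of radius alone.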
I have verified that this happens for ALL collections I have.
I have also verified that when I get 16384 records, the largest distance value is < 1.0, so a radius of 10.0 should still return all 16384 results.
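The check described above can be sketched roughly as follows. This assumes the L2 convention that a hit is in range when range_filter <= distance < radius; the function names and the sample distances are made up for illustration and are not from my collection:

```python
from typing import Iterable, List


def count_within_range(
    distances: Iterable[float], radius: float, range_filter: float = 0.0
) -> int:
    """Count distances that satisfy range_filter <= d < radius (assumed L2 rule)."""
    return sum(1 for d in distances if range_filter <= d < radius)


def range_search_is_truncated(
    plain_distances: List[float], range_hit_count: int, radius: float, limit: int
) -> bool:
    """Return True if the range search returned fewer hits than it should.

    plain_distances are the distances from the search WITHOUT radius, so the
    number of them inside the range is a lower bound on what the range
    search should return (capped at limit).
    """
    expected_at_least = min(limit, count_within_range(plain_distances, radius))
    return range_hit_count < expected_at_least


# Illustrative values only: every plain-search hit is closer than the radius,
# yet the range search came back with fewer hits.
sample_distances = [0.1, 0.4, 0.9]
truncated = range_search_is_truncated(
    sample_distances, range_hit_count=2, radius=10.0, limit=3
)
```

In my case plain_distances would be [r.distance for r in results] from the first search, range_hit_count would be 6400, and limit would be 16384.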
Another note: search is orders of magnitude slower when using range search, which I don't understand.
For example, with a smaller limit like 30, this same search without radius runs in around 120 ms, and when I add radius to params it takes 7 seconds!
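For anyone who wants to reproduce the latency comparison, here is a small timing sketch. median_latency is a hypothetical helper, not something from pymilvus; it just wraps any zero-argument callable, so the two searches can be passed in as lambdas:

```python
import statistics
import time
from typing import Callable, List


def median_latency(fn: Callable[[], object], repeats: int = 5) -> float:
    """Call fn `repeats` times and return the median wall-clock time in seconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


# Intended usage against a live collection `c` (assumed, as in the repro):
#   plain = median_latency(lambda: c.search(..., param={'params': {}, 'metric_type': 'L2'}, limit=30))
#   ranged = median_latency(lambda: c.search(..., param={'params': {'radius': 10}, 'metric_type': 'L2'}, limit=30))
#   print(f"plain={plain:.3f}s ranged={ranged:.3f}s")
```

Using the median over a few runs keeps a single cold-cache call from skewing the comparison.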
### Milvus Log
_No response_
### Anything else?
_No response_
About this issue
- State: open
- Created 7 months ago
- Comments: 25 (11 by maintainers)
Hi @pakelley
@liliu-z yep, that’s correct. The example dataset in Hakan’s code (which has 17,760 records) was “fast” 2 days ago, and yesterday became “slow” and would only return 6400 records.
/assign @congqixia could you please also take a look?
/assign @jiaoew1991 /unassign
@hakan458 Which index type are you running? Could you please provide the full Milvus logs? Please refer to this doc to export the whole Milvus logs for investigation. Also, please attach the etcd backup, which would help us understand the “slow issue”. See https://github.com/milvus-io/birdwatcher for details on how to back up etcd with birdwatcher.
/assign @hakan458