milvus: [Bug]: Search iterator results are sometimes missing
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: master-20230620-247f1170
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): 2.4.0.dev75
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
Search iterator results are sometimes missing. I did not set radius or range_filter. Could this be related to the default radius?
- search with expression:
> assert len(set(pk_list)) == 1000
E assert 993 == 1000
E +993
E -1000
- search without expression:
> assert len(set(pk_list)) == default_nb
E assert 2999 == 3000
E +2999
E -3000
Expected Behavior
The search iterator should return all results.
Steps To Reproduce
1. create a collection
2. insert 3000 entities
3. create an index with the L2 metric, then load the collection
4. use an expression to select 1000 entities
5. search iterator
6. count the total results
Milvus Log
@pytest.mark.tags(CaseLabel.L1)
@pytest.mark.parametrize("metrics", ct.float_metrics[:1])
def test_search_iterator_with_expression(self, metrics):
    """
    target: test search iterator normal
    method: 1. search iterator
            2. check the result, expect pk not repeat and meet the expr requirements
    expected: search successfully
    """
    # 1. initialize with data
    limit = 100
    dim = 128
    collection_w = self.init_collection_general(prefix, True, dim=dim, is_index=False)[0]
    collection_w.create_index(field_name, {"metric_type": metrics})
    collection_w.load()
    # 2. search iterator
    search_params = {"metric_type": metrics}
    expression = "1000.0 <= float < 2000.0"
    search_iterator = collection_w.search_iterator(vectors[:1], field_name, search_params,
                                                   limit, output_fields=['float'], expr=expression)[0]
    # 3. check the result
    page_idx = 0
    pk_list = []
    while True:
        res = search_iterator.next()
        if len(res[0]) == 0:
            log.info("search iteration finished, close")
            search_iterator.close()
            break
        for i in range(len(res[0])):
            # log.info(res[0][i])
            pk_list.append(res[0][i].id)
        page_idx += 1
    log.info(len(pk_list))
    assert len(set(pk_list)) == 1000
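The final assertion only reports how many unique primary keys came back. A small helper along the following lines could also show which keys are missing or repeated. This is a minimal sketch: `summarize_iterator_pks` is a hypothetical name, and it assumes the expected keys are known up front (for example, the keys of the inserted rows that match the expression). It is not part of the Milvus test suite.

```python
from collections import Counter


def summarize_iterator_pks(pk_list, expected_pks):
    """Compare the keys collected from an iterator with the keys expected."""
    counts = Counter(pk_list)      # how many times each key was returned
    returned = set(counts)
    expected = set(expected_pks)
    return {
        "missing": sorted(expected - returned),
        "unexpected": sorted(returned - expected),
        "duplicated": sorted(pk for pk, n in counts.items() if n > 1),
    }


# Made-up data: keys 0..9 expected, key 3 never returned, key 5 returned twice.
report = summarize_iterator_pks([0, 1, 2, 4, 5, 5, 6, 7, 8, 9], range(10))
print(report)
```

Logging such a report instead of only the count would make it easier to check whether the same keys go missing on every run.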
Anything else?
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (15 by maintainers)
I suppose the difference between the two pieces of code is that the code from @NicoYuan1986 searches a growing segment, so the final result is produced by brute-force calculation without using HNSW, while the code inside the e2e test uses HNSW as expected. I found the following log in querynode.log:
[2023/06/25 16:42:13.982 +08:00] [INFO] [segments/search.go:94] ["search growing/sealed segments without indexes"] [traceID=9340d5430619b939f618f915e907779f] [segmentIDs="[442410230948028121]"]
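For context, a brute-force search computes the distance to every stored vector, so no candidate can be skipped the way an approximate index such as HNSW may skip one. The following is a minimal pure-Python sketch of exact L2 top-k. The vectors are made up and `brute_force_l2` is a hypothetical helper, not Milvus code.

```python
import math


def brute_force_l2(query, vectors, k):
    """Return the k (pk, distance) pairs closest to query, by exhaustive scan."""
    scored = [(pk, math.dist(query, vec)) for pk, vec in vectors.items()]
    # Sort by distance, breaking ties by primary key for a stable order.
    scored.sort(key=lambda item: (item[1], item[0]))
    return scored[:k]


# Made-up 2-D vectors keyed by primary key.
vectors = {1: [0.0, 0.0], 2: [3.0, 4.0], 3: [1.0, 0.0], 4: [6.0, 8.0]}
hits = brute_force_l2([0.0, 0.0], vectors, k=len(vectors))
print(hits)
```

With `k` equal to the number of stored vectors, every primary key appears exactly once in the result.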
Hello @NicoYuan1986, I tested with your PR https://github.com/milvus-io/milvus/pull/25039 and it passed. Can you test my PR again? 😃 https://github.com/milvus-io/pymilvus/pull/1551
It’s true that HNSW can miss a small part of the result set. But the same code I mentioned gets 1000 items every time, whereas the pytest code gets 993~997 items every time. Although we could enlarge the dataset and verify that range search may leave out some items, we still cannot explain this discrepancy.
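To experiment with how a distance-paged iterator behaves independently of Milvus, a toy model may help. The sketch below assumes each page is fetched with an exact range query for items strictly farther than the last distance already returned. That assumption, the names `range_query`, `iterate_pages` and `collect_pks`, and the data are all illustrative and are not taken from the pymilvus implementation. Nothing in this thread establishes that this is what happens in the failing test.

```python
def range_query(distances, lower_exclusive, limit):
    """Exact range query: up to `limit` (pk, dist) pairs with dist > lower_exclusive."""
    hits = [(pk, d) for pk, d in distances.items() if d > lower_exclusive]
    hits.sort(key=lambda item: (item[1], item[0]))
    return hits[:limit]


def iterate_pages(distances, limit):
    """Yield pages, moving the lower bound to the last distance returned."""
    lower = float("-inf")
    while True:
        page = range_query(distances, lower, limit)
        if not page:
            return
        yield page
        lower = page[-1][1]


def collect_pks(distances, limit):
    """Flatten all pages into one list of primary keys."""
    return [pk for page in iterate_pages(distances, limit) for pk, _ in page]


# Made-up distances: pk -> distance to the query vector.
distinct = {pk: float(pk) for pk in range(10)}
tied = dict(distinct)
tied[3] = 2.0  # pk 3 now ties with pk 2, exactly at the first page boundary

print(len(collect_pks(distinct, limit=3)))
print(sorted(set(tied) - set(collect_pks(tied, limit=3))))
```

In this toy model, the exact backend returns every item when all distances are distinct, but an item tied with the last hit of a page is skipped because the next page starts strictly beyond that distance.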
NicoYuan1986 has tested various variables, and the only remaining possibility is the DataFrame used by pytest.
Wow! good job