milvus: [Bug]: The number of results returned by the search iterator does not equal the number of inserted entities with FLAT index and (COSINE or L2) metric type
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: master-20230921-206cc14d
- Deployment mode(standalone or cluster):both
- MQ type(rocksmq, pulsar or kafka): all
- SDK version(e.g. pymilvus v2.0.0rc2): 2.3.0.post1.dev13
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
The number of results returned by the search iterator does not equal the number of inserted entities when using a FLAT index with the COSINE metric type.
Expected Behavior
The number of results returned by the search iterator equals the number of inserted entities when using a FLAT index with the COSINE metric type.
Steps To Reproduce
from pymilvus import CollectionSchema, FieldSchema
from pymilvus import Collection
from pymilvus import connections
from pymilvus import DataType
from pymilvus import Partition
from pymilvus import utility
import time
import numpy as np
import random
import pandas as pd
import pyarrow.parquet as pq
import pyarrow as pa
connections.connect(host="***", port="19530")
dim = 768
int64_field = FieldSchema(name="int64", dtype=DataType.INT64, is_primary=True)
float_field = FieldSchema(name="float", dtype=DataType.FLOAT)
bool_field = FieldSchema(name="bool", dtype=DataType.BOOL)
float_vector = FieldSchema(name="float_vector", dtype=DataType.FLOAT_VECTOR, dim=dim)
schema = CollectionSchema(fields=[int64_field, float_field, bool_field, float_vector])
collection = Collection("test_search_iterator_6", schema=schema)
nb = 1000000
vectors = [[random.random() for _ in range(dim)] for _ in range(nb)]
insert_batch_num = 10000
for i in range(nb // insert_batch_num):
    # Primary keys and scalar values for this batch cover [start, end).
    start, end = i * insert_batch_num, (i + 1) * insert_batch_num
    batch_vectors = vectors[start:end]
    print("get %d vectors" % len(batch_vectors))
    res = collection.insert([
        list(range(start, end)),
        [np.float32(j) for j in range(start, end)],
        [np.bool_(j) for j in range(start, end)],
        batch_vectors,
    ])
    print("inserted %d %d " % (i, insert_batch_num))
index_param = {"index_type": "FLAT", "metric_type": "COSINE"}
collection.create_index("float_vector", index_param, index_name="index_name")
collection.load()
default_search_params = {"metric_type": "COSINE"}
collection.flush()
time.sleep(30)
print(collection.num_entities)
batch_size = 10000
search_iterator = collection.search_iterator(vectors[:1], "float_vector", default_search_params, batch_size=batch_size)
page_idx = 0
distance_struct_array = []
while True:
    res = search_iterator.next()
    if len(res) == 0:
        print("search iteration finished, close")
        search_iterator.close()
        break
    print(len(res))
    page_idx += 1
    print(f"page{page_idx}-------------------------")
    for i in range(len(res)):
        distance_struct_array.append({'id': res[i].id, 'distance': res[i].distance})
print(len(distance_struct_array))
print(distance_struct_array[:100])
assert len(distance_struct_array)==nb
collection.drop()
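When the final assertion fails, it helps to know whether entities are missing, duplicated, or both. The following is a minimal sketch, not part of the original report: `summarize_ids` is a hypothetical helper, and it assumes the primary keys are exactly `0 .. nb-1`, as they are in the script above.

```python
from collections import Counter


def summarize_ids(hits, expected_count):
    """Summarize which primary keys the iterator returned.

    `hits` is a list of dicts with an 'id' key, like distance_struct_array.
    Assumes the inserted primary keys are exactly 0 .. expected_count - 1.
    """
    counts = Counter(hit["id"] for hit in hits)
    # Keys that appear in more than one returned hit.
    duplicates = sorted(pk for pk, n in counts.items() if n > 1)
    # Keys that were inserted but never returned.
    missing = sorted(set(range(expected_count)) - set(counts))
    return {
        "returned": len(hits),
        "unique": len(counts),
        "duplicates": duplicates,
        "missing": missing,
    }
```

Calling `summarize_ids(distance_struct_array, nb)` just before the assertion shows whether the count mismatch comes from dropped entities or from repeated ones.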
Milvus Log
No response
Anything else?
No response
About this issue
- State: closed
- Created 9 months ago
- Comments: 29 (29 by maintainers)
Commits related to this issue
- fix precision for segcore reduce(#27325) Signed-off-by: MrPresent-Han <chun.han@zilliz.com> — committed to MrPresent-Han/milvus by MrPresent-Han 8 months ago
- fix precision for segcore reduce(#27325) (#28062) Signed-off-by: MrPresent-Han <chun.han@zilliz.com> — committed to milvus-io/milvus by MrPresent-Han 8 months ago
- fix: fix precision for search reduce(#27325) Signed-off-by: MrPresent-Han <chun.han@zilliz.com> — committed to MrPresent-Han/milvus by MrPresent-Han 7 months ago
- fix: fix reduce precision for search(#27325) Signed-off-by: MrPresent-Han <chun.han@zilliz.com> — committed to MrPresent-Han/milvus by MrPresent-Han 7 months ago
- fix: fix reduce precision for search(#27325) (#29031) related: #27325 Signed-off-by: MrPresent-Han <chun.han@zilliz.com> — committed to milvus-io/milvus by MrPresent-Han 7 months ago
- fix: fix precision for search reduce(#27325) (#29032) related: #27325 pr: https://github.com/milvus-io/milvus/pull/29031 Signed-off-by: MrPresent-Han <chun.han@zilliz.com> — committed to milvus-io/milvus by MrPresent-Han 7 months ago
I found that the results returned by the SDK are not strictly sorted by distance, so there may be a flaw in the reduce logic.
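The ordering observation can be checked directly on the collected results. This is a minimal sketch rather than code from the issue: `find_order_violations` is a hypothetical helper, and it assumes that COSINE scores should be non-increasing (larger means more similar) while L2 distances should be non-decreasing.

```python
def find_order_violations(hits, descending=True):
    """Return the indices i where hits[i] is out of order relative to hits[i - 1].

    `hits` is a list of dicts with a 'distance' key, like distance_struct_array.
    Use descending=True for COSINE (assumed: larger score is closer) and
    descending=False for L2 (assumed: smaller distance is closer).
    Equal neighbouring distances are not counted as violations.
    """
    violations = []
    for i in range(1, len(hits)):
        prev = hits[i - 1]["distance"]
        cur = hits[i]["distance"]
        out_of_order = cur > prev if descending else cur < prev
        if out_of_order:
            violations.append(i)
    return violations
```

For the script in this issue, `find_order_violations(distance_struct_array, descending=True)` would list the positions where a later hit scores higher than the one before it. An empty list means the combined pages are sorted.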