milvus: [Bug]: Searching and deleting vectors is not stable
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.0.1
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.0.1
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: i5-6500 / 16 GB
Current Behavior
Hello! I ran into a problem that did not occur in version 1.1.1.
Number of vectors: 10000, dimension: 512, index: FLAT, nlist: 2048, nprobe: 16.
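With a FLAT index, search is exhaustive: every stored vector is scored against the query, so the IVF parameters nlist and nprobe have no effect. A minimal pure-Python sketch of exhaustive top-k inner-product (IP) search, assuming toy 3-dim vectors in place of the 512-dim ones; `flat_ip_search` is a hypothetical helper, not a pymilvus function.

```python
from typing import List, Tuple


def flat_ip_search(
    stored: List[Tuple[int, List[float]]], query: List[float], limit: int = 1
) -> List[Tuple[int, float]]:
    """Exhaustive (FLAT-style) inner-product search over (id, vector) pairs.

    Every stored vector is scored, so there is no nlist/nprobe to tune.
    Returns (id, score) pairs, highest inner product first.
    """
    scored = [(pk, sum(a * b for a, b in zip(vec, query))) for pk, vec in stored]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:limit]


if __name__ == "__main__":
    # Toy vectors standing in for the 512-dim vectors in the report.
    stored = [(1, [1.0, 0.0, 0.0]), (2, [0.0, 1.0, 0.0]), (3, [0.6, 0.8, 0.0])]
    print(flat_ip_search(stored, [0.0, 1.0, 0.0], limit=2))  # [(2, 1.0), (3, 0.8)]
```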
When I need to update a vector, I delete it by its id and re-insert the new vector under the same id. A subsequent search does not return the vector that was just added.
The same happens if I delete a vector and immediately search for it: the deleted vector is still found.
Immediately after each CRUD operation I read the num_entities property, since I noticed that it calls flush under the hood, but this does not always help either.
What am I doing wrong?
Expected Behavior
After any CRUD operation, the search returns the correct results.
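One way to make the expected behavior concrete is a simplified visibility rule: a delete hides every row with that primary key inserted before the delete, however many such rows exist, while rows inserted afterwards stay visible. The sketch below is a toy in-memory model under that assumption (the `Row` class and `visible_rows` function are hypothetical, not part of pymilvus), replaying the update flow from this report.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Row:
    pk: int
    vector: List[float]
    insert_ts: int  # logical timestamp of the insert (assumed model)


def visible_rows(rows: List[Row], deletes: Dict[int, List[int]]) -> List[Row]:
    """Return rows not hidden by any delete of their pk issued after their insert."""
    return [
        row
        for row in rows
        if not any(del_ts > row.insert_ts for del_ts in deletes.get(row.pk, []))
    ]


if __name__ == "__main__":
    rows = [Row(7, [1.0, 0.0], insert_ts=1)]  # original vector
    deletes = {7: [2]}                         # delete pk 7
    rows.append(Row(7, [0.0, 1.0], insert_ts=3))  # re-insert pk 7
    print([r.vector for r in visible_rows(rows, deletes)])  # [[0.0, 1.0]]
    deletes[7].append(4)                       # delete pk 7 again
    print(visible_rows(rows, deletes))         # []
```

Under this rule, a search after the update should only ever see the new vector, and a search after the second delete should see neither.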
Steps To Reproduce
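```python
from pymilvus import connections, Collection, CollectionSchema, DataType, FieldSchema

# assumed local standalone deployment on the default port
connections.connect(host="localhost", port="19530")

# create a collection
person_id = FieldSchema(name="person_id", dtype=DataType.INT64, is_primary=True)
person_vector = FieldSchema(name="person_vector", dtype=DataType.FLOAT_VECTOR, dim=512)
schema = CollectionSchema(fields=[person_id, person_vector], description="Persons Vectors")
collection = Collection(name="test_collection", schema=schema, consistency_level="Strong")

# insert 1000 fictitious vectors
...
collection.num_entities  # per the report, this calls flush under the hood

# create index (FLAT has no build parameters, so nlist is ignored)
index_params = {"index_type": "FLAT", "metric_type": "IP", "params": {"nlist": 2048}}
collection.create_index(field_name="person_vector", index_params=index_params)

# load the collection into memory for search
collection.load()

# assumed search parameters (the original code used self.search_param)
search_param = {"metric_type": "IP", "params": {"nprobe": 16}}

vector_current = [...]  # a list holding one 512-dim query vector
result = collection.search(data=vector_current, anns_field="person_vector",
                           param=search_param, limit=1, expr=None)
# returns the correct vector and its id

# delete vector by id (id_ is the primary key of the vector being updated)
collection.delete(f"person_id in [{id_}]")
vector_new = [...]  # a list holding one 512-dim vector
# re-insert the new vector under the same primary key
collection.insert([[id_], vector_new])
collection.num_entities
result = collection.search(data=vector_new, anns_field="person_vector",
                           param=search_param, limit=1, expr=None)
# returns incorrect results
```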
Anything else?
No response
About this issue
- State: closed
- Created 2 years ago
- Comments: 28 (20 by maintainers)
If a segment contains duplicate primary keys, the delete log matches only the first of them, so the bitset generated during the query is incorrect.
https://github.com/milvus-io/milvus/blob/ca129d4308cc7221bb900b3722dea9b256e514f9/internal/core/src/segcore/ScalarIndex.cpp#L67
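The effect described in the comment can be illustrated with a hypothetical simplification in Python (not the actual segcore C++ code): each row carries a primary key and an insert timestamp, each delete-log entry carries a primary key and a delete timestamp, and a `True` bit means the row is filtered out of search. The timestamp comparison and the function names are assumptions made for illustration. When a pk is deleted, re-inserted, and deleted again, stopping at the first matching row leaves the re-inserted row unmasked, which matches the "deleted vector is still found" symptom.

```python
from typing import List, Tuple

# (pk, insert_ts) for each row of one segment, in insertion order.
Rows = List[Tuple[int, int]]
# (pk, delete_ts) entries from the delete log.
DeleteLog = List[Tuple[int, int]]


def bitset_first_match(rows: Rows, delete_log: DeleteLog) -> List[bool]:
    """Buggy variant: each delete entry only looks at the first row with its pk."""
    bitset = [False] * len(rows)
    for del_pk, del_ts in delete_log:
        for offset, (pk, insert_ts) in enumerate(rows):
            if pk == del_pk:
                if insert_ts < del_ts:
                    bitset[offset] = True
                break  # stops at the first duplicate, later duplicates are never checked
    return bitset


def bitset_all_matches(rows: Rows, delete_log: DeleteLog) -> List[bool]:
    """Corrected variant: each delete entry masks every older row with its pk."""
    bitset = [False] * len(rows)
    for del_pk, del_ts in delete_log:
        for offset, (pk, insert_ts) in enumerate(rows):
            if pk == del_pk and insert_ts < del_ts:
                bitset[offset] = True
    return bitset


if __name__ == "__main__":
    # pk 7 inserted at ts 1, deleted at ts 2, re-inserted at ts 3, deleted again at ts 4.
    rows = [(7, 1), (7, 3)]
    delete_log = [(7, 2), (7, 4)]
    print(bitset_first_match(rows, delete_log))  # [True, False]: row 1 still searchable
    print(bitset_all_matches(rows, delete_log))  # [True, True]
```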