milvus: [Bug]: Searching and deleting vectors is not stable

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.0.1
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.0.1
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: i5-6500 / 16 GB

Current Behavior

Hello! I have run into a problem that did not occur in version 1.1.1.

- Number of vectors: 10000
- Dimension: 512
- Index: FLAT
- nlist: 2048
- nprobe: 16

When I need to update a vector, I delete it by its id and then insert the new vector under the same id. A subsequent search does not return the vector that was just inserted.

The same applies to deletion: if I delete a vector and immediately search for it, it is still found.

Immediately after each CRUD operation I read the num_entities property, because I noticed that it calls flush under the hood, but this does not always help either.

What am I doing wrong?

Expected Behavior

After any CRUD operation, the search returns the correct results.

Steps To Reproduce

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema

# assumes a connection to the standalone server is already open

# create a collection
person_id = FieldSchema(name="person_id", dtype=DataType.INT64, is_primary=True)
person_vector = FieldSchema(name="person_vector", dtype=DataType.FLOAT_VECTOR, dim=512)
schema = CollectionSchema(fields=[person_id, person_vector], description="Persons Vectors")

collection = Collection(name='test_collection', schema=schema, consistency_level="Strong")

# insert 1000 dummy vectors
...
collection.num_entities

# create index
index_params = {"index_type": "FLAT", "metric_type": "IP", "params": {"nlist": 2048}}
collection.create_index(field_name="person_vector", index_params=index_params)

# load the collection into memory for searching
collection.load()

# search parameters (nprobe taken from the settings listed above)
search_param = {"metric_type": "IP", "params": {"nprobe": 16}}

vector_current = [...]
result = collection.search(data=vector_current, anns_field="person_vector", param=search_param, limit=1, expr=None)
# returns the correct vector and its id

# delete vector by id
collection.delete(f"person_id in [{id_}]")

vector_new = [...]

# insert the new vector under the same id
collection.insert([[id_], vector_new])
collection.num_entities

result = collection.search(data=vector_new, anns_field="person_vector", param=search_param, limit=1, expr=None)
# returns incorrect results
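
The delete-then-insert update described above can be wrapped in one small helper, which makes the reproduction easier to repeat. This is only a sketch: `replace_vector` is a hypothetical name, the `person_id` field name and the column order come from the schema in this report, and `collection` is assumed to be any object exposing `delete(expr)` and `insert(data)` the way a pymilvus `Collection` does.

```python
from typing import Any, Sequence


def replace_vector(collection: Any, id_: int, vector: Sequence[float]) -> None:
    """Delete the entity with primary key id_, then insert `vector` under the same key.

    Hypothetical helper: `collection` only needs delete(expr) and insert(data).
    """
    # Remove the old entity by primary key.
    collection.delete(f"person_id in [{id_}]")
    # Column-based insert: one list per field, in schema order (ids, then vectors).
    collection.insert([[id_], [list(vector)]])
```

The helper does not work around the behavior reported here; it only fixes the order and shape of the two calls so that every run issues the same requests.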

Anything else?

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 28 (20 by maintainers)

Most upvoted comments

/assign @xige-16 could you help investigate this issue?

If a segment contains a duplicate pk, the delete log matches only the first occurrence, so the bitset generated during the query is incorrect.

https://github.com/milvus-io/milvus/blob/ca129d4308cc7221bb900b3722dea9b256e514f9/internal/core/src/segcore/ScalarIndex.cpp#L67
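
The effect described in the comment above can be illustrated with a toy model in plain Python. This is not Milvus code: the function names are hypothetical, a segment is reduced to a list of primary keys, and timestamps are ignored entirely. It only shows how marking the first matching row leaves a duplicate row visible in the deletion bitset.

```python
from typing import List


def deleted_bitset_first_match(pks: List[int], deleted: List[int]) -> List[bool]:
    """Mark only the first row whose pk matches each deleted pk (the reported behavior)."""
    bitset = [False] * len(pks)
    for pk in deleted:
        if pk in pks:
            bitset[pks.index(pk)] = True  # list.index returns the first occurrence only
    return bitset


def deleted_bitset_all_matches(pks: List[int], deleted: List[int]) -> List[bool]:
    """Mark every row whose pk appears in the delete list (shown for contrast)."""
    targets = set(deleted)
    return [pk in targets for pk in pks]


segment_pks = [1, 2, 2, 3]  # pk 2 is stored twice in the same segment
print(deleted_bitset_first_match(segment_pks, [2]))  # [False, True, False, False]
print(deleted_bitset_all_matches(segment_pks, [2]))  # [False, True, True, False]
```

In the first result the second row with pk 2 is not masked, so a search can still return it after the delete. A real implementation also has to compare insert and delete timestamps, which this sketch leaves out.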