milvus: [Bug]: Suspected data loss during batch ingestion.

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: milvusdb/milvus-nightly, tag nightly-20230205-ae305a5
- Deployment mode: cluster
- MQ type: Pulsar
- SDK version: Java SDK 2.2.0
- OS (Ubuntu or CentOS): CentOS
- CPU/Memory: Based on sizing tools
- GPU: 
- Others:
  - Collection with 3 fields: id, a 384-dimensional float vector, and a varchar field
  - Around 455 partitions in the collection
  - Data coord: 1 core / 2 GB
  - Data node: 4 instances, 2 cores / 16 GB each
  - Number of shards: 4
  - Proxy: 2 cores / 8 GB

Current Behavior

We are doing a bulk ingestion of data into our Milvus 2.2.x cluster, and our data nodes, data coord, query coord, and proxy keep crashing. After the ingestion, the entity count shown in Attu is 7.8M, whereas the expected count is 21.8M.
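To put a size on the gap, the short Java sketch below computes how many rows are unaccounted for, using only the two counts stated above (21.8M expected, 7.8M reported in Attu). The class and method names are hypothetical helpers for illustration; it does not talk to Milvus. With these numbers, 14.0M rows (about 64% of the expected total) are missing.

```java
// Hypothetical helper: quantifies the gap between the expected and reported entity counts.
public class IngestionGap {
    // Rows that were expected but are not reflected in the reported count.
    static long missingRows(long expected, long reported) {
        return expected - reported;
    }

    // Fraction of the expected rows that are missing (0..1 when reported <= expected).
    static double missingFraction(long expected, long reported) {
        return (double) missingRows(expected, reported) / expected;
    }

    public static void main(String[] args) {
        long expected = 21_800_000L; // expected count from the report
        long reported = 7_800_000L;  // count shown in Attu after ingestion
        System.out.printf("missing rows: %d (%.1f%% of expected)%n",
                missingRows(expected, reported),
                100.0 * missingFraction(expected, reported));
    }
}
```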

Expected Behavior

No response

Steps To Reproduce

1. Create a collection.
2. Create 455 partitions in the collection.
3. Repeatedly insert 1 to 5 rows at a time into randomly chosen partitions.
4. The data node crashed occasionally with 16 GB of memory; after increasing it to 20 GB, it no longer crashed.
5. In total, 24M rows were inserted, but num_entities returns only 7M, even after calling flush().
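A minimal sketch of the workload in steps 2 and 3, assuming step 3 means many small inserts (1 to 5 rows each) into a randomly chosen partition. It keeps a client-side tally of rows per partition so that the total can be compared with num_entities after flush(). The Milvus SDK calls are deliberately left out and marked by a comment; the partition naming scheme (`part_<i>`), batch count, and seed are hypothetical and not taken from the report.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Random;

// Hypothetical reproduction planner: it does not call Milvus. It plans the insert
// batches and records how many rows each partition should end up with.
public class PartitionIngestPlan {
    static final int NUM_PARTITIONS = 455; // from step 2

    // Hypothetical naming scheme; the report does not say how partitions were named.
    static String partitionName(int i) {
        return "part_" + i;
    }

    // Plans `batches` inserts; each picks a random partition and 1..5 rows.
    // Returns the expected row count per partition.
    static Map<String, Long> plan(int batches, long seed) {
        Random rnd = new Random(seed);
        Map<String, Long> expected = new HashMap<>();
        for (int b = 0; b < batches; b++) {
            String partition = partitionName(rnd.nextInt(NUM_PARTITIONS));
            int rows = 1 + rnd.nextInt(5); // 1..5 rows, as in step 3
            // A real driver would insert `rows` entities into `partition` here.
            expected.merge(partition, (long) rows, Long::sum);
        }
        return expected;
    }

    // Sum over all partitions: the num_entities value the collection should report.
    static long total(Map<String, Long> perPartition) {
        long sum = 0;
        for (long v : perPartition.values()) {
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        Map<String, Long> expected = plan(10_000, 42L);
        System.out.println("partitions touched: " + expected.size());
        System.out.println("expected num_entities: " + total(expected));
    }
}
```

Keeping the tally per partition, rather than a single counter, makes it possible to check afterwards whether the missing rows are concentrated in particular partitions or spread evenly across all of them.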

Milvus Log

milvus-log.tar.gz

Anything else?

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 21 (17 by maintainers)

Most upvoted comments

@MrPresent-Han, @xiaofan-luan Please find the etcd backup, taken with birdwatcher: bw_etcd_ALL.230212-223040.bak.gz