milvus: [Bug]: With TTL used, we got unusual disk occupation

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version:2.2.3
- Deployment mode(standalone or cluster):cluster
- MQ type(rocksmq, pulsar or kafka):    
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS): 
- CPU/Memory: 
- GPU: 
- Others:

Current Behavior

I have started Milvus 2.2.3 twice with the same profile. The first time I started the cluster with TTL enabled, and disk usage was unusually high: 4 million vectors occupied 100G of disk space on each MinIO node (we have four). The second time, without TTL, 50 million vectors used only 15G of disk space on MinIO. Is something wrong with using TTL to delete data automatically? One more thing: we have also used Milvus 2.2.8, 2.2.10, and 2.2.11 with TTL, and they all showed unusual disk usage, with 30 million vectors using almost 1 TB of disk space. Maybe there really is a problem with TTL?
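For scale, here is a quick back-of-the-envelope calculation on the figures above. It is only a sketch: it assumes decimal gigabytes and takes both numbers as reported per MinIO node, which the report does not state explicitly for the run without TTL. The helper name `bytes_per_vector` is made up for illustration.

```python
GB = 10**9  # assumption: decimal gigabytes; the report only says "G"


def bytes_per_vector(disk_bytes: float, vectors: int) -> float:
    """Average on-disk footprint of one vector, in bytes."""
    return disk_bytes / vectors


# Figures taken from the report above.
with_ttl = bytes_per_vector(100 * GB, 4_000_000)     # run with TTL
without_ttl = bytes_per_vector(15 * GB, 50_000_000)  # run without TTL
ratio = with_ttl / without_ttl

print(f"with TTL:    {with_ttl:.0f} bytes per vector")
print(f"without TTL: {without_ttl:.0f} bytes per vector")
print(f"ratio:       {ratio:.1f}x")
```

Under those assumptions the run with TTL stores roughly 25 KB per vector against about 300 bytes without it, a gap of around 83x.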

Expected Behavior

No response

Steps To Reproduce

No response

Milvus Log

No response

Anything else?

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 28 (22 by maintainers)

Most upvoted comments

This is a bug.

Compacted segments didn’t update the binlog’s timestampFrom and timestampTo fields (default 0). As a result, compaction_trigger always thinks a compacted segment has too much expired data and needs a single compaction. So segments are compacted endlessly until the collection TTL is reached. During this time, a lot of segment copies are generated.

Preparing fix PR
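The loop described above can be illustrated with a small toy model. This is not the Milvus source (which is written in Go); every name here (`Binlog`, `Segment`, `needs_single_compaction`, `compact`, `count_rewrites`) and the 0.2 threshold are hypothetical, chosen only to show why a `timestampTo` left at 0 keeps re-triggering compaction.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Binlog:
    entries: int
    timestamp_from: int = 0  # stays 0 if nobody fills it in
    timestamp_to: int = 0


@dataclass
class Segment:
    binlogs: List[Binlog]


def needs_single_compaction(seg: Segment, now: int, ttl: int,
                            threshold: float = 0.2) -> bool:
    """Hypothetical trigger: compact when enough rows look expired."""
    expire_before = now - ttl
    total = sum(b.entries for b in seg.binlogs)
    expired = sum(b.entries for b in seg.binlogs
                  if b.timestamp_to < expire_before)
    return total > 0 and expired / total >= threshold


def compact(seg: Segment, fixed: bool) -> Segment:
    """Rewrite a segment into one binlog; no row is really expired here."""
    rows = sum(b.entries for b in seg.binlogs)
    if fixed:
        # Fixed behaviour: carry the time range over from the inputs.
        ts_from = min(b.timestamp_from for b in seg.binlogs)
        ts_to = max(b.timestamp_to for b in seg.binlogs)
    else:
        # Buggy behaviour: the time range is left at its default of 0.
        ts_from = ts_to = 0
    return Segment([Binlog(rows, ts_from, ts_to)])


def count_rewrites(fixed: bool, now: int = 10_000, ttl: int = 3_600,
                   rounds: int = 5) -> int:
    """Count how often the trigger re-compacts an already compacted segment."""
    fresh = Segment([Binlog(1000, 9_000, 9_500), Binlog(1000, 9_500, 9_900)])
    seg = compact(fresh, fixed)  # an ordinary first compaction
    rewrites = 0
    for _ in range(rounds):
        if not needs_single_compaction(seg, now, ttl):
            break
        seg = compact(seg, fixed)  # each pass writes another segment copy
        rewrites += 1
    return rewrites


print("buggy:", count_rewrites(fixed=False))
print("fixed:", count_rewrites(fixed=True))
```

In this model the buggy variant rewrites the segment on every trigger round, because a `timestamp_to` of 0 always falls before the expiry cutoff, while the variant that carries the timestamps over is never re-compacted.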

With TTL, after 8 hours we checked the MinIO disk, and its usage had risen sharply.

@QAQwaitme do you mean that after you inserted 4 million vectors, disk usage was normal at first, and then, with nothing else being done, it rose sharply to 100G by 8 hours later?

Or did it occupy 100G right from the beginning, when you inserted the 4 million vectors?

It occupied 100G after I had inserted 4 million vectors. To reduce the pressure on Milvus, I inserted the 4 million vectors slowly over 8 hours, and then it happened. For now, I have inserted 100 million vectors in 48 hours, and it only uses 30G on each MinIO node.

never do that

good catch. We will try to reproduce it in house. @binbinlv could you please help on this?

/assign @binbinlv /unassign