milvus: [Bug]: With TTL used, we got unusual disk occupation
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: 2.2.3
- Deployment mode(standalone or cluster): cluster
- MQ type(rocksmq, pulsar or kafka):
- SDK version(e.g. pymilvus v2.0.0rc2):
- OS(Ubuntu or CentOS):
- CPU/Memory:
- GPU:
- Others:
Current Behavior
With the same profile, I started Milvus 2.2.3 twice. The first time, I started the cluster with TTL, and unusual disk occupation appeared: 4 million vectors occupied 100 GB of disk space on each MinIO node (we have four of them). The second time, without TTL, 50 million vectors used only 15 GB of disk space on one MinIO node. Is something wrong with using TTL to delete data automatically? One more thing: we have also used Milvus 2.2.8, 2.2.10, and 2.2.11 with TTL, and they all showed unusual disk occupation; 30 million vectors used almost 1 TB of disk space. Maybe there really is a problem with TTL?
Expected Behavior
No response
Steps To Reproduce
No response
Milvus Log
No response
Anything else?
No response
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 28 (22 by maintainers)
This is a bug.
Compacted segments didn't update the binlog's timestampFrom and timestampTo fields (they stayed at the default value 0). Because of this, compaction_trigger always thinks a compacted segment has too much expired data and needs a single compaction, so segments are compacted endlessly until the collection TTL is reached. During this time, a lot of segment copies are generated.
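A minimal sketch of the diagnosis above, in Go. The type and function names here are illustrative, not the actual Milvus source: it only shows why a binlog whose timestampTo was left at the zero value always looks fully expired to a TTL-based check, so the trigger keeps scheduling the same segment for compaction.

```go
package main

import (
	"fmt"
	"time"
)

// binlog is a hypothetical simplification of a segment's binlog metadata.
type binlog struct {
	TimestampFrom uint64 // physical time (s) of the oldest row in the binlog
	TimestampTo   uint64 // physical time (s) of the newest row; the bug left this at 0 after compaction
}

// expiredRatioTooHigh mimics the trigger's decision: if even the newest
// row in the binlog is already past the collection TTL, the whole binlog
// is treated as expired data, and the segment is scheduled for another
// single compaction.
func expiredRatioTooHigh(b binlog, now uint64, ttlSeconds uint64) bool {
	// With TimestampTo == 0, now-b.TimestampTo is enormous, so this
	// condition is always true and the segment is re-compacted endlessly.
	return now-b.TimestampTo >= ttlSeconds
}

func main() {
	now := uint64(time.Now().Unix())
	ttl := uint64(3600) // e.g. a 1-hour collection TTL

	fresh := binlog{TimestampFrom: now - 60, TimestampTo: now - 10}
	buggy := binlog{} // compacted segment: timestamps left at the zero value

	fmt.Println(expiredRatioTooHigh(fresh, now, ttl)) // recent data: not expired
	fmt.Println(expiredRatioTooHigh(buggy, now, ttl)) // always "expired", compacted again
}
```

Each pass of this needless compaction rewrites the segment's binlogs to object storage, which is consistent with the reporter seeing disk usage balloon on every MinIO node until the TTL finally elapsed.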
Preparing fix PR
It occupied 100 GB after I had inserted 4 million vectors. To reduce the pressure on Milvus, I inserted those 4 million vectors slowly, over 8 hours, and then it happened. Now I have inserted 100 million vectors in 48 hours, and it only used 30 GB on each MinIO node.
never do that
Good catch. We will try to reproduce it in-house. @binbinlv could you please help on this?
/assign @binbinlv /unassign