milvus: [Bug]: Size of `rdb_data` folder keeps increasing

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.1.1
- Deployment mode(standalone or cluster): standalone
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus 2.1.1
- OS(Ubuntu or CentOS): Ubuntu
- CPU/Memory: PVC for standalone pod is 50GB
- GPU: 
- Others:

Current Behavior

The size of the rdb_data folder keeps increasing, despite the following rocksmq configuration:

rocksmq:
    retentionTimeInMinutes: 60 ## 1 hour
    retentionSizeInMB: 500
    rocksmqPageSize: "2147483648" ## 2 GB
    lrucacheratio: 0.06 ## rocksdb cache memory ratio

It ultimately hits 50GB, the size of the PVC for the standalone pod, causing the standalone pod to go into CrashLoopBackOff.

Expected Behavior

Given the rocksmq configuration above, I would expect the rdb_data folder to be cleared every hour.

Steps To Reproduce

Deploy Milvus standalone to K8s via the Helm chart, with the following values.yaml:

https://gist.github.com/devennavani/3629603122333e8a245a1f564d691838

Milvus Log

No response

Anything else?

milvus PV usage:

[Screenshot: milvus PV usage, 2022-09-29 12:52 PM]
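For anyone tracking the same symptom, here is a minimal sketch for measuring the folder's size over time; the path is an assumption based on a typical Milvus standalone layout and may differ in your deployment:

```python
import os

def dir_size_bytes(path: str) -> int:
    """Sum file sizes under `path`, tolerating files deleted mid-walk."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # rocksmq may compact/delete files while we walk
    return total

# Assumed default path for standalone; adjust for your volume mount.
print(dir_size_bytes("/var/lib/milvus/rdb_data") / 1024 ** 2, "MiB")
```

Running this periodically (e.g. from a cron job) makes it easy to see whether retention is actually reclaiming space each hour.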

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 1
  • Comments: 34 (20 by maintainers)

Most upvoted comments

@LoveEachDay Thanks for the response! I’m looking forward to the fix 😃

Is this only an issue with using rocksmq, or is it also an issue with using Pulsar and Kafka (in Milvus standalone)?

This is only an issue with rocksmq (in milvus standalone).

Hate to be a pest, but @aoiasd, could you confirm this issue has been correctly assigned, and if so, what is its priority? The current workaround is disruptive enough in production that we’re considering Kafka, and I’d really hate to have to bring zookeeper into our stack 😆

Sorry, we had a week’s holiday because of the festival, and now it is over. I will work on this immediately.

I am using milvusdb/milvus:v2.2.4. yhmo in the Slack group helped me out: I deleted everything in rdb_data after flushing the collections, but it’s filling back up (usage of /: 35.8% of 74.79GB). I will update docker compose with the settings below and see how it goes:

rocksmq:
    retentionTimeInMinutes: 60 ## 1 hour
    retentionSizeInMB: 256
    rocksmqPageSize: "2147483648" ## 2 GB
    lrucacheratio: 0.06 ## rocksdb cache memory ratio
    

Same question as above: with rocksmqPageSize set to 2 GB, every channel will keep up to 2 GB of data, and by default there can be up to 256 channels (min(256, shard_num * collection_num)), which means rocksmq may keep up to 512 GB by default. You could set rocksmqPageSize to 256 MB or smaller to use less disk space.
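The worst-case arithmetic in that comment can be sketched as follows (the channel cap and page size come from the thread; the helper name and the example shard/collection counts are made up for illustration):

```python
GB = 1024 ** 3

def worst_case_rdb_bytes(page_size_bytes: int, shard_num: int, collection_num: int) -> int:
    """Estimate worst-case rocksmq disk use: each channel may retain up to
    one rocksmqPageSize of data, and channels are capped at
    min(256, shard_num * collection_num), per the comment above."""
    channels = min(256, shard_num * collection_num)
    return channels * page_size_bytes

# Enough collections to hit the 256-channel cap with 2 GB pages:
print(worst_case_rdb_bytes(2 * GB, shard_num=2, collection_num=200) / GB)  # 512.0

# The same deployment with rocksmqPageSize lowered to 256 MB:
print(worst_case_rdb_bytes(256 * 1024 ** 2, shard_num=2, collection_num=200) / GB)  # 64.0
```

This makes it clear why a 50 GB PVC can fill up long before retention kicks in when the page size is 2 GB.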

We have changed this parameter’s default to 256M after v2.2.5. We have also found that page size is hard to relate to actual disk use, so we will try to find a better parameter to replace it in a future release.

Actually, changing the default channel number to 16 in 2.2.8 may also help.