milvus: [Bug]: Pod CPU and memory grow linearly.

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: 2.2.6
- Deployment mode(standalone or cluster): standalone 
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): 
- OS(Ubuntu or CentOS): Alibaba Cloud Linux 3 (Soaring Falcon)
- Kernel Version : 5.10.134-12.2.al8.x86_64
- CPU/Memory: 4c8g
- GPU: None

Current Behavior

I changed the Milvus config and reset the data, but the problem persists.

extraConfigFiles:
  user.yaml: |+
    rocksmq:
      # The path where the message is stored in rocksmq
      lrucacheratio: 0.06 # rocksdb cache memory ratio
      rocksmqPageSize: 16777216 # default is 256 MB, 256 * 1024 * 1024 bytes, The size of each page of messages in rocksmq
      retentionTimeInMinutes: 1440 # default is 5 days, 5 * 24 * 60 minutes, The retention time of the message in rocksmq.
      retentionSizeInMB: 1024 # default is 8 GB, 8 * 1024 MB, The retention size of the message in rocksmq.
      compactionInterval: 86400 # 1 day, trigger rocksdb compaction every day to remove deleted data
    rootCoord:
      # changing this value will make the cluster unavailable
      dmlChannelNum: 4
    dataCoord:
      segment:
        maxSize: 128 # Maximum size of a segment in MB
        diskSegmentMaxSize: 256 # Maximum size of a segment in MB for collections that have a disk index

Expected Behavior

No response

Steps To Reproduce

Just deploy the standalone server with Helm

Milvus Log

https://wormhole.app/onvd4#OJRkw87z5RA7pAWlu2VmbQ

Anything else?

No response

About this issue

  • State: closed
  • Created a year ago
  • Comments: 24 (15 by maintainers)

Most upvoted comments

OK, I know what you mean now. We do know how to find the right collection for a PDF (by md5sum).

The conversation with you is very helpful! Thank you very much.

Sure, if you know which PDF to look up, things become much easier! Simply put all the data into MySQL or PostgreSQL and retrieve the vectors from the traditional database. In Milvus we can also query with an expression, but if that's the only operation you're going to use, I would say a traditional database should work better. A vector database is designed for ANN search: finding similarity among all embeddings.

People usually want a vector DB because they don't know which PDF is the most similar. If you split the data into multiple collections, you don't know which collection to search.

If you take a look at how llama-index or langchain do it, you will probably have a better idea of what I'm saying. If you put PDFs into different collections, which collection are you going to pick for the search?

If you already know which PDF you want to search, then I guess pgvector or any other vector storage works for you. You can even retrieve all the embeddings from a traditional database and do a brute-force search in memory. That shouldn't be a big deal.
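To illustrate the brute-force approach described above, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the `chunks` table, its `pdf_md5`, `chunk_id` and `embedding` columns, the float32 BLOB encoding, and the helper names are hypothetical, and SQLite stands in for whatever traditional database you already run.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a BLOB of float32 values."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): turn a float32 BLOB back into a list of floats."""
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search_pdf(conn, pdf_md5, query, top_k=3):
    """Load every embedding of one PDF and rank them by cosine similarity."""
    rows = conn.execute(
        "SELECT chunk_id, embedding FROM chunks WHERE pdf_md5 = ?", (pdf_md5,)
    ).fetchall()
    scored = [(cosine(query, unpack(blob)), chunk_id) for chunk_id, blob in rows]
    scored.sort(reverse=True)  # highest similarity first
    return [(chunk_id, score) for score, chunk_id in scored[:top_k]]


# Hypothetical schema: one row per text chunk, keyed by the PDF's md5sum.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (pdf_md5 TEXT, chunk_id INTEGER, embedding BLOB)")
conn.executemany(
    "INSERT INTO chunks VALUES (?, ?, ?)",
    [
        ("md5-a", 0, pack([1.0, 0.0])),
        ("md5-a", 1, pack([0.0, 1.0])),
        ("md5-b", 0, pack([1.0, 0.0])),
    ],
)

results = search_pdf(conn, "md5-a", [0.9, 0.1], top_k=1)
print(results)
```

The scan is linear in the number of chunks of a single PDF, so it only makes sense when each PDF contributes a small number of embeddings.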

If you are building a multi-tenant solution with 10k+ tenants, I think logical partitioning is what you are looking for, and we are actually working on it: https://github.com/milvus-io/milvus/issues/23553

Thousands of collections might bring trouble. Like what?

Do you want to sync up quickly offline? It might be a little easier to explain that way. Each collection has its own message stream and updates its timetick every 100 ms, which adds overhead to the whole system if you have many collections. Also, I don't really recommend creating very small collections: Milvus only supports building an index on segments with more than 1024 entities, so you'd better just use a FLAT index if you only need to search around a thousand entities.
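As a rough back-of-the-envelope sketch of the two points above (not Milvus code): the function names are hypothetical, and the model assumes exactly one timetick update per collection per 100 ms interval, which is a simplification of the real internals.

```python
def timetick_updates_per_second(num_collections, interval_ms=100):
    """Timetick updates per second, assuming one update per collection
    per interval (simplified model)."""
    return num_collections * 1000 // interval_ms


def can_build_index(num_entities, threshold=1024):
    """True if a segment exceeds the entity count quoted in the comment above
    (an index is built only when num entities > 1024)."""
    return num_entities > threshold


for n in (1, 1_000, 10_000):
    print(f"{n} collections: {timetick_updates_per_second(n)} timetick updates/s")

for entities in (500, 1024, 5_000):
    advice = "index can be built" if can_build_index(entities) else "use FLAT"
    print(f"{entities} entities: {advice}")
```

Under this model the background timetick traffic grows linearly with the number of collections, even when none of them is being written to or searched.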