milvus: [Bug]: High CPU usage on idle

Is there an existing issue for this?

  • I have searched the existing issues

Environment

- Milvus version: v2.2.9
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.2.9
- OS(Ubuntu or CentOS): Debian
- CPU/Memory: Intel Xeon E5-2686 v4 (8) @ 2.299GHz / 32GB
- GPU: N/A

Current Behavior

Hi Milvus team!

After upgrading Milvus from v2.2.6 to v2.2.9 I noticed significant performance issues. Searches across collections became roughly 10-15 times slower. I also noticed that Milvus keeps the CPU heavily loaded almost all the time: on my 8-core machine, the standalone process uses 400%-500% CPU even when there are no requests at all (no inserts, no searches, etc.). Occasionally the CPU usage drops and performance becomes as fast as it was before, but this is rare: roughly 1 minute of good performance for every 10 minutes of bad performance while the CPU is loaded.

I started digging and found many lines like these in the Milvus logs:

milvus-standalone  | [2023/06/11 10:12:20.758 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1190] ["alloc elapse"=0] ["write elapse"=0] ["updatePage elapse"=0] ["produce total elapse"=1190]
milvus-standalone  | [2023/06/11 10:12:20.775 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1238] ["alloc elapse"=0] ["write elapse"=17] ["updatePage elapse"=0] ["produce total elapse"=1255]
milvus-standalone  | [2023/06/11 10:12:20.776 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1255] ["alloc elapse"=0] ["write elapse"=1] ["updatePage elapse"=0] ["produce total elapse"=1256]
milvus-standalone  | [2023/06/11 10:12:20.776 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1254] ["alloc elapse"=0] ["write elapse"=1] ["updatePage elapse"=0] ["produce total elapse"=1255]
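
The interesting part of these warnings is that almost all of the time is spent waiting on the lock ("get lock elapse"), not writing. A small throwaway script (pure Python, written for this report) can aggregate the timings across a log file:

```python
import re

# Matches the timing fields of the "rocksmq produce too slowly" warnings
# shown above (all values are in milliseconds).
PATTERN = re.compile(
    r'"get lock elapse"=(?P<lock>\d+).*?"produce total elapse"=(?P<total>\d+)'
)


def summarize(log_lines):
    """Return (warning count, total lock-wait ms, total produce ms)."""
    count = lock_ms = total_ms = 0
    for line in log_lines:
        m = PATTERN.search(line)
        if m:
            count += 1
            lock_ms += int(m.group("lock"))
            total_ms += int(m.group("total"))
    return count, lock_ms, total_ms
```

For the four lines above it reports 4 warnings with 4937 ms of lock waiting out of 4956 ms total, i.e. the produce path is almost entirely blocked on lock contention.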

So I tried switching the MQ to Pulsar: I commented out the rocksmq-related lines in milvus.yaml and added a Pulsar service to docker-compose.yml:

  pulsar:
    container_name: milvus-pulsar
    image: apachepulsar/pulsar:2.8.2
    command: bin/pulsar standalone --no-functions-worker --no-stream-storage
    ports:
      - "6650:6650"
      - "8080:8080"

That didn’t work either: even though the Milvus logs looked fine, when I tried to connect via the Python SDK it said that Milvus was not ready yet.
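
A quick way to separate "Milvus is still starting" from a networking problem is to poll the standalone HTTP health endpoint on port 9091 before connecting with the SDK. Here is a minimal sketch using only the Python standard library (the /healthz path is the one milvus-standalone exposes; adjust host and port to your deployment):

```python
from urllib.request import urlopen
from urllib.error import URLError


def milvus_healthy(host: str, port: int = 9091, timeout: float = 2.0) -> bool:
    """Return True when the Milvus health endpoint answers with HTTP 200."""
    try:
        with urlopen(f"http://{host}:{port}/healthz", timeout=timeout) as resp:
            return resp.status == 200
    except (URLError, OSError):
        # Connection refused, DNS failure, timeout, or error response.
        return False
```

Once this returns True, the pymilvus connection attempt should no longer report that Milvus is not ready.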

Seeking your help! Switching the broker might work around it, but it would be better to understand what is wrong with the default rocksmq.

P.S. I can’t roll back to v2.2.6: when I try, Milvus panics on startup, apparently because the rocksmq data written by v2.2.9 uses ZSTD compression that the older build’s RocksDB does not support:

milvus-standalone | [2023/06/10 17:21:09.906 +00:00] [INFO] [server/rocksmq_impl.go:154] ["Start rocksmq "] ["max proc"=8] [parallism=1] ["lru cache"=2020363714]
milvus-standalone | panic: Not implemented: Unsupported compression method for this build: ZSTD
milvus-standalone |
milvus-standalone | goroutine 1 [running]:
milvus-standalone | github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc0006efe58, 0x1, {0x0, 0x0})
milvus-standalone | /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:214 +0xd95
milvus-standalone | github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0xc0005bf9e0, {0xc000124060?, 0x3, 0x3}, 0xc00013e7e0)
milvus-standalone | /go/src/github.com/milvus-io/milvus/cmd/milvus/run.go:112 +0x66e
milvus-standalone | github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc000124060?, 0x3, 0x3})
milvus-standalone | /go/src/github.com/milvus-io/milvus/cmd/milvus/milvus.go:60 +0x21e
milvus-standalone | main.main()
milvus-standalone | /go/src/github.com/milvus-io/milvus/cmd/main.go:26 +0x2e

My milvus.yaml is almost identical to the default one at milvus-io/milvus/v2.2.9/configs/milvus.yaml; the only difference is the authorizationEnabled option.

Here is my docker-compose.yml:

  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
      - /etc/milvus.yaml:/milvus/configs/milvus.yaml
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    restart: always

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
      - /etc/milvus.yaml:/milvus/configs/milvus.yaml
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    restart: always

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.2.9
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
      - /etc/milvus.yaml:/milvus/configs/milvus.yaml
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    logging:
      options:
        max-size: "100m"
    restart: always

  attu:
    container_name: milvus-attu
    image: zilliz/attu:latest
    depends_on:
      - "standalone"
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8082:3000"
    restart: always

networks:
  default:
    name: milvus

Expected Behavior

I expect low CPU utilization on idle and fast responses.

Steps To Reproduce

No response

Milvus Log

log.txt

Anything else?

This issue looks similar: https://github.com/milvus-io/milvus/issues/22571

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 53 (29 by maintainers)

Most upvoted comments

You can do so; this won’t change any data. Meanwhile, you can wait for the next Milvus release, in which we will fix the issue.