milvus: [Bug]: High CPU usage on idle
Is there an existing issue for this?
- I have searched the existing issues
Environment
- Milvus version: v2.2.9
- Deployment mode(standalone or cluster): standalone
- MQ type(rocksmq, pulsar or kafka): rocksmq
- SDK version(e.g. pymilvus v2.0.0rc2): pymilvus v2.2.9
- OS(Ubuntu or CentOS): Debian
- CPU/Memory: Intel Xeon E5-2686 v4 (8) @ 2.299GHz / 32GB
- GPU: N/A
Current Behavior
Hi Milvus team!
After I upgraded Milvus from v2.2.6 to v2.2.9, I noticed significant performance issues: searching collections became roughly 10-15 times slower. Milvus is using a lot of CPU almost all the time (on my 8-core machine, the standalone Milvus process takes 400%-500% of it), even when there are no requests at all (no insertions, no searches, etc.). Occasionally the CPU usage drops and performance becomes as fast as it used to be, but this happens rarely: roughly 1 minute of good performance for every 10 minutes of bad performance caused by the loaded CPU.
I started digging and found a lot of lines like these in the Milvus logs:
milvus-standalone | [2023/06/11 10:12:20.758 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1190] ["alloc elapse"=0] ["write elapse"=0] ["updatePage elapse"=0] ["produce total elapse"=1190]
milvus-standalone | [2023/06/11 10:12:20.775 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1238] ["alloc elapse"=0] ["write elapse"=17] ["updatePage elapse"=0] ["produce total elapse"=1255]
milvus-standalone | [2023/06/11 10:12:20.776 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1255] ["alloc elapse"=0] ["write elapse"=1] ["updatePage elapse"=0] ["produce total elapse"=1256]
milvus-standalone | [2023/06/11 10:12:20.776 +00:00] [WARN] [server/rocksmq_impl.go:638] ["rocksmq produce too slowly"] [topic=by-dev-datacoord-timetick-channel] ["get lock elapse"=1254] ["alloc elapse"=0] ["write elapse"=1] ["updatePage elapse"=0] ["produce total elapse"=1255]
So I tried changing the MQ to Pulsar: I commented out the rocksmq-related lines in milvus.yaml and added a pulsar service to docker-compose.yml:
pulsar:
  container_name: milvus-pulsar
  image: apachepulsar/pulsar:2.8.2
  command: bin/pulsar standalone --no-functions-worker --no-stream-storage
  ports:
    - "6650:6650"
    - "8080:8080"
I had no success: even though everything looked fine in the Milvus logs, when I tried to connect via the Python SDK it told me that Milvus was not ready yet.
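One thing I am not sure about is whether I also needed to point the pulsar section of milvus.yaml at the new container. My assumption is that it would look roughly like this (hostname and port are my guess, taken from the compose service above):

```yaml
pulsar:
  address: pulsar  # compose service name from the snippet above (my assumption)
  port: 6650
```

I would also guess that the standalone service needs pulsar added to its depends_on list so it does not start before the broker, but please correct me if standalone mode picks the broker differently.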
Seeking your help! Maybe changing the broker will help, but it would be better to understand what is wrong with the default rocksmq.
P.S. I can't roll back, because when I try I get an error on Milvus startup:
milvus-standalone | [2023/06/10 17:21:09.906 +00:00] [INFO] [server/rocksmq_impl.go:154] ["Start rocksmq "] ["max proc"=8] [parallism=1] ["lru cache"=2020363714]
milvus-standalone | panic: Not implemented: Unsupported compression method for this build: ZSTD
milvus-standalone |
milvus-standalone | goroutine 1 [running]:
milvus-standalone | github.com/milvus-io/milvus/cmd/roles.(*MilvusRoles).Run(0xc0006efe58, 0x1, {0x0, 0x0})
milvus-standalone |     /go/src/github.com/milvus-io/milvus/cmd/roles/roles.go:214 +0xd95
milvus-standalone | github.com/milvus-io/milvus/cmd/milvus.(*run).execute(0xc0005bf9e0, {0xc000124060?, 0x3, 0x3}, 0xc00013e7e0)
milvus-standalone |     /go/src/github.com/milvus-io/milvus/cmd/milvus/run.go:112 +0x66e
milvus-standalone | github.com/milvus-io/milvus/cmd/milvus.RunMilvus({0xc000124060?, 0x3, 0x3})
milvus-standalone |     /go/src/github.com/milvus-io/milvus/cmd/milvus/milvus.go:60 +0x21e
milvus-standalone | main.main()
milvus-standalone |     /go/src/github.com/milvus-io/milvus/cmd/main.go:26 +0x2e
My milvus.yaml is almost identical to the one you provide at milvus-io/milvus/v2.2.9/configs/milvus.yaml; the only difference is the authorizationEnabled option.
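Concretely, the only block I changed should be the authorization switch, which as far as I remember sits under common.security in the v2.2.9 defaults (quoting from memory, so the exact nesting may be slightly off):

```yaml
common:
  security:
    authorizationEnabled: true  # the shipped default is false; this is my only change
```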
Here is my docker-compose.yml:
services:
  etcd:
    container_name: milvus-etcd
    image: quay.io/coreos/etcd:v3.5.5
    environment:
      - ETCD_AUTO_COMPACTION_MODE=revision
      - ETCD_AUTO_COMPACTION_RETENTION=1000
      - ETCD_QUOTA_BACKEND_BYTES=4294967296
      - ETCD_SNAPSHOT_COUNT=50000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/etcd:/etcd
      - /etc/milvus.yaml:/milvus/configs/milvus.yaml
    command: etcd -advertise-client-urls=http://127.0.0.1:2379 -listen-client-urls http://0.0.0.0:2379 --data-dir /etcd
    restart: always

  minio:
    container_name: milvus-minio
    image: minio/minio:RELEASE.2023-03-20T20-16-18Z
    environment:
      MINIO_ACCESS_KEY: minioadmin
      MINIO_SECRET_KEY: minioadmin
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/minio:/minio_data
      - /etc/milvus.yaml:/milvus/configs/milvus.yaml
    command: minio server /minio_data
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:9000/minio/health/live"]
      interval: 30s
      timeout: 20s
      retries: 3
    restart: always

  standalone:
    container_name: milvus-standalone
    image: milvusdb/milvus:v2.2.9
    command: ["milvus", "run", "standalone"]
    environment:
      ETCD_ENDPOINTS: etcd:2379
      MINIO_ADDRESS: minio:9000
    volumes:
      - ${DOCKER_VOLUME_DIRECTORY:-.}/volumes/milvus:/var/lib/milvus
      - /etc/milvus.yaml:/milvus/configs/milvus.yaml
    ports:
      - "19530:19530"
      - "9091:9091"
    depends_on:
      - "etcd"
      - "minio"
    logging:
      options:
        max-size: "100m"
    restart: always

  attu:
    container_name: milvus-attu
    image: zilliz/attu:latest
    depends_on:
      - "standalone"
    environment:
      MILVUS_URL: standalone:19530
    ports:
      - "8082:3000"
    restart: always

networks:
  default:
    name: milvus
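One more thing I notice: my standalone service has no healthcheck, so from the outside it is hard to tell when Milvus is actually ready (which may be related to the "not ready yet" error above). I assume something like this, placed under the standalone service and mirroring the MinIO healthcheck, would work (the /healthz endpoint on port 9091 is my assumption):

```yaml
healthcheck:
  test: ["CMD", "curl", "-f", "http://localhost:9091/healthz"]  # /healthz on the metrics port: my assumption
  interval: 30s
  timeout: 20s
  retries: 3
```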
Expected Behavior
I expect low CPU utilization on idle and fast responses.
Steps To Reproduce
No response
Milvus Log
Anything else?
This one looks similar: https://github.com/milvus-io/milvus/issues/22571
About this issue
- State: closed
- Created a year ago
- Comments: 53 (29 by maintainers)
You can do so; this won't change any data. Meanwhile, you can wait for the next Milvus release, where we will fix the issue.