risingwave: bug: database memory surges when running a lot of small batch queries
Describe the bug
After starting the cluster through docker-compose, I create a simple table (only four simple columns) in the database and then write data row by row. Memory grows rapidly: by the time about 300,000 rows have been written, memory usage has grown by roughly 10 GB, and at that point a query on the table (select count(*) from t) gets no response.
create table t (id int,v1 int,v2 int,s1 varchar);
# Template for the single-row insert issued on every loop iteration.
sql_template = "insert into t values({},{},{},'test_value');"
for i in range(1, 1000000):
    sql = sql_template.format(i, i + 1, i + 2)  # fill in id, v1, v2
    cursor = conn.cursor()
    cursor.execute(sql)
To Reproduce
- docker compose -f docker-compose.yml up -d
- psql -h localhost -p 4566 -d dev -U root
- create table t (id int,v1 int,v2 int,s1 varchar);
- insert into t values({},{},{},'test_value'); (the {} placeholders are filled in by the Python loop above; see the self-contained sketch below)
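For completeness, a minimal, self-contained sketch of the insert loop. This is only a sketch: it assumes the psycopg2 driver (the original report does not name one) and reuses the connection parameters from the psql step above.

import psycopg2  # assumed driver; any DB-API client should behave similarly

# Connection parameters taken from the psql step above.
conn = psycopg2.connect(host="localhost", port=4566, dbname="dev", user="root")
conn.autocommit = True

cur = conn.cursor()
cur.execute("create table t (id int, v1 int, v2 int, s1 varchar);")

# Issue one small insert per iteration, as in the original report.
for i in range(1, 1000000):
    cur.execute("insert into t values({},{},{},'test_value');".format(i, i + 1, i + 2))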
This is the docker-compose.yml:
version: "3"
services:
  compactor-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - compactor-node
      - "--host"
      - "0.0.0.0:6660"
      - "--state-store"
      - "hummock+minio://hummockadmin:hummockadmin@minio-0:9308/hummock001"
      - "--meta-address"
      - "http://meta-node-0:5690"
    expose:
      - "6660"
    ports: []
    depends_on:
      - meta-node-0
      - minio-0
    environment:
      RUST_BACKTRACE: "1"
    container_name: compactor-0
  compute-node-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - compute-node
      - "--host"
      - "0.0.0.0:5688"
      - "--client-address"
      - "compute-node-0:5688"
      - "--state-store"
      - "hummock+minio://hummockadmin:hummockadmin@minio-0:9308/hummock001"
      - "--meta-address"
      - "http://meta-node-0:5690"
    expose:
      - "5688"
    ports: []
    depends_on:
      - meta-node-0
      - minio-0
    environment:
      RUST_BACKTRACE: "1"
    container_name: compute-node-0
  etcd-0:
    image: "quay.io/coreos/etcd:latest"
    command:
      - /usr/local/bin/etcd
      - "--listen-client-urls"
      - "http://0.0.0.0:2388"
      - "--advertise-client-urls"
      - "http://etcd-0:2388"
      - "--listen-peer-urls"
      - "http://0.0.0.0:2389"
      - "--initial-advertise-peer-urls"
      - "http://etcd-0:2389"
      - "--listen-metrics-urls"
      - "http://0.0.0.0:2379"
      - "--name"
      - risedev-meta
      - "--max-txn-ops"
      - "999999"
      - "--auto-compaction-mode"
      - periodic
      - "--auto-compaction-retention"
      - 1m
      - "--snapshot-count"
      - "10000"
    expose:
      - "2388"
    ports:
      - "2388:2388"
      - "2389:2389"
    depends_on: []
    volumes:
      - "/Users/wangdexin/mnt/etcd/etcd-0:/etcd-data"
    environment: {}
    container_name: etcd-0
  frontend-node-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - frontend-node
      - "--host"
      - "0.0.0.0:4566"
      - "--meta-addr"
      - "http://meta-node-0:5690"
    expose:
      - "4566"
    ports:
      - "4566:4566"
    depends_on:
      - meta-node-0
    environment:
      RUST_BACKTRACE: "1"
    container_name: frontend-node-0
  meta-node-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - meta-node
      - "--listen-addr"
      - "0.0.0.0:5690"
      - "--host"
      - meta-node-0
      - "--backend"
      - etcd
      - "--etcd-endpoints"
      - "etcd-0:2388"
    expose:
      - "5690"
      - "1250"
      - "5691"
    ports: []
    depends_on: []
    environment:
      RUST_BACKTRACE: "1"
    container_name: meta-node-0
  minio-0:
    image: "quay.io/minio/minio:latest"
    command:
      - server
      - "--address"
      - "0.0.0.0:9308"
      - "--console-address"
      - "0.0.0.0:9400"
      - /data
    expose:
      - "9308"
      - "9400"
    ports:
      - "9308:9308"
      - "9400:9400"
    depends_on: []
    volumes:
      - "/Users/wangdexin/mnt/minio/minio-0:/data"
    entrypoint: "\n/bin/sh -c '\nset -e\nmkdir -p \"/data/hummock001\"\n/usr/bin/docker-entrypoint.sh \"$$0\" \"$$@\"\n'"
    environment:
      MINIO_CI_CD: "1"
      MINIO_ROOT_PASSWORD: hummockadmin
      MINIO_ROOT_USER: hummockadmin
    container_name: minio-0
  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      KAFKA_OPTS: "-Dzookeeper.4lw.commands.whitelist=ruok"
    expose:
      - "2181"
    ports:
      - "2181:2181"
    healthcheck:
      {
        test: echo ruok | nc localhost 2181 || exit -1,
        interval: 1s,
        timeout: 5s,
        retries: 3,
        start_period: 120s
      }
  kafka:
    image: wurstmeister/kafka
    environment:
      #KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      #KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://127.0.0.1:9092
      #KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      #KAFKA_COMPRESSION_TYPE: lz4
      #KAFKA_LOG_RETENTION_MS: 31104000000
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://127.0.0.1:9091
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://127.0.0.1:9091
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    depends_on:
      zookeeper: { condition: service_healthy }
    expose:
      - "9092"
    ports:
      - "9091:9091"
    healthcheck:
      {
        test: nc -z localhost 9092,
        interval: 5s,
        start_period: 120s
      }
volumes:
  etcd-0:
    external: false
  minio-0:
    external: false
name: risingwave-compose
Expected behavior
Memory grows quickly, and once more than 300,000 rows have been inserted, select count(*) from t; does not respond.
Additional context
This happens whether or not I create a materialized view on the table.
Oh, I pasted the yml with indentation, but the indentation was gone after the issue was submitted. You can click edit and copy the indented yml from there.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (12 by maintainers)
BTW, although we can make some improvements to avoid soaring memory, it is not recommended to issue 300,000 SQL statements, because that creates 300,000 batch tasks.
It is better to concatenate these insert values into a single SQL statement.
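For illustration, a hedged sketch of what that batching might look like with psycopg2's execute_values helper, which expands many rows into a few multi-row INSERT statements. The driver, connection parameters, and page size below are illustrative assumptions, not a recommendation from the maintainers.

import psycopg2
from psycopg2.extras import execute_values

conn = psycopg2.connect(host="localhost", port=4566, dbname="dev", user="root")
conn.autocommit = True
cur = conn.cursor()

rows = [(i, i + 1, i + 2, "test_value") for i in range(1, 1000000)]
# execute_values expands each page of rows into one multi-row VALUES clause,
# so the server sees a handful of large inserts instead of ~1,000,000 tiny ones.
execute_values(cur, "insert into t values %s", rows, page_size=10000)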
cc @BowenXiao1999 @ZENOTME Could you please help to take a look?
Sorry, I don’t use git very well. thanks
FE side should be fixed by #5822 and #5827. Closing the issue.
Before / After (screenshots)
The memory issue on CN seems to be solved by #5770
It seems this didn't happen with version 0.1.9, which I was using at the time.
It seems that per-task metrics (regardless of whether Prometheus is enabled) are mainly responsible for the large memory usage on the compute node.
Successfully reproduced with risedev d. It looks like Prometheus is taking up a large portion of the memory; the whole adds up to about 10 GB.