risingwave: bug: database memory surges when running a lot of small batch queries

Describe the bug

After starting a cluster through docker-compose, I created a simple table (only four very simple fields) in the database and started writing data, and found that memory grows rapidly. By the time 300,000 rows have been written, memory usage has grown by about 10 GB, and at that point a query on the table (select count(*) from t;) gets no response.

create table t (id int,v1 int,v2 int,s1 varchar);

sql_template = "insert into t values ({}, {}, {}, 'test_value');"
cursor = conn.cursor()
for i in range(1, 1000000):
    sql = sql_template.format(i, i + 1, i + 2)  # one single-row insert per iteration
    cursor.execute(sql)

To Reproduce

  1. docker compose -f docker-compose.yml up -d
  2. psql -h localhost -p 4566 -d dev -U root
  3. create table t (id int,v1 int,v2 int,s1 varchar);
  4. insert into t values({},{},{},'test_value');

This is the docker-compose.yml:

version: "3"
services:
  compactor-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - compactor-node
      - "--host"
      - "0.0.0.0:6660"
      - "--state-store"
      - "hummock+minio://hummockadmin:hummockadmin@minio-0:9308/hummock001"
      - "--meta-address"
      - "http://meta-node-0:5690"
    expose:
      - "6660"
    ports: []
    depends_on:
      - meta-node-0
      - minio-0
    environment:
      RUST_BACKTRACE: "1"
    container_name: compactor-0
    
  compute-node-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - compute-node
      - "--host"
      - "0.0.0.0:5688"
      - "--client-address"
      - "compute-node-0:5688"
      - "--state-store"
      - "hummock+minio://hummockadmin:hummockadmin@minio-0:9308/hummock001"
      - "--meta-address"
      - "http://meta-node-0:5690"
    expose:
      - "5688"
    ports: []
    depends_on:
      - meta-node-0
      - minio-0
    environment:
      RUST_BACKTRACE: "1"
    container_name: compute-node-0
  etcd-0:
    image: "quay.io/coreos/etcd:latest"
    command:
      - /usr/local/bin/etcd
      - "--listen-client-urls"
      - "http://0.0.0.0:2388"
      - "--advertise-client-urls"
      - "http://etcd-0:2388"
      - "--listen-peer-urls"
      - "http://0.0.0.0:2389"
      - "--initial-advertise-peer-urls"
      - "http://etcd-0:2389"
      - "--listen-metrics-urls"
      - "http://0.0.0.0:2379"
      - "--name"
      - risedev-meta
      - "--max-txn-ops"
      - "999999"
      - "--auto-compaction-mode"
      - periodic
      - "--auto-compaction-retention"
      - 1m
      - "--snapshot-count"
      - "10000"
    expose:
      - "2388"
    ports:
      - "2388:2388"
      - "2389:2389"
    depends_on: []
    volumes:
      - "/Users/wangdexin/mnt/etcd/etcd-0:/etcd-data"
    environment: {}
    container_name: etcd-0
  frontend-node-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - frontend-node
      - "--host"
      - "0.0.0.0:4566"
      - "--meta-addr"
      - "http://meta-node-0:5690"
    expose:
      - "4566"
    ports:
      - "4566:4566"
    depends_on:
      - meta-node-0
    environment:
      RUST_BACKTRACE: "1"
    container_name: frontend-node-0
    
  
  meta-node-0:
    image: "ghcr.io/risingwavelabs/risingwave:v0.1.12"
    command:
      - meta-node
      - "--listen-addr"
      - "0.0.0.0:5690"
      - "--host"
      - meta-node-0
      - "--backend"
      - etcd
      - "--etcd-endpoints"
      - "etcd-0:2388"
    expose:
      - "5690"
      - "1250"
      - "5691"
    ports: []
    depends_on: []
    environment:
      RUST_BACKTRACE: "1"
    container_name: meta-node-0
  
  minio-0:
    image: "quay.io/minio/minio:latest"
    command:
      - server
      - "--address"
      - "0.0.0.0:9308"
      - "--console-address"
      - "0.0.0.0:9400"
      - /data
    expose:
      - "9308"
      - "9400"
    ports:
      - "9308:9308"
      - "9400:9400"
    depends_on: []
    volumes:
      - "/Users/wangdexin/mnt/minio/minio-0:/data"
    entrypoint: "\n/bin/sh -c '\nset -e\nmkdir -p \"/data/hummock001\"\n/usr/bin/docker-entrypoint.sh \"$$0\" \"$$@\"\n'"
    environment:
      MINIO_CI_CD: "1"
      MINIO_ROOT_PASSWORD: hummockadmin
      MINIO_ROOT_USER: hummockadmin
    container_name: minio-0

  zookeeper:
    image: confluentinc/cp-zookeeper:7.0.1
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
      KAFKA_OPTS: "-Dzookeeper.4lw.commands.whitelist=ruok"
    expose:
      - "2181"
    ports:
      - "2181:2181"
    healthcheck:
      {
        test: echo ruok | nc localhost 2181 || exit -1,
        interval: 1s,
        timeout: 5s,
        retries: 3,
        start_period: 120s
      }

  kafka:
    image: wurstmeister/kafka
    environment:
      #KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      #KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://127.0.0.1:9092
      #KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      #KAFKA_COMPRESSION_TYPE: lz4
      #KAFKA_LOG_RETENTION_MS: 31104000000
      KAFKA_ZOOKEEPER_CONNECT: zookeeper:2181
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT
      KAFKA_LISTENERS: INSIDE://:9092,OUTSIDE://127.0.0.1:9091
      KAFKA_ADVERTISED_LISTENERS: INSIDE://:9092,OUTSIDE://127.0.0.1:9091
      KAFKA_INTER_BROKER_LISTENER_NAME: INSIDE
    depends_on:
      zookeeper: { condition: service_healthy }
    expose:
      - "9092"
    ports:
      - "9091:9091"
    healthcheck:
      {
        test: nc -z localhost 9092,
        interval: 5s,
        start_period: 120s
      }



volumes:
  etcd-0:
    external: false
  minio-0:
    external: false

name: risingwave-compose

Expected behavior

Memory should stay bounded during the inserts. Instead, it grows quickly, and after more than 300,000 rows are inserted, the query select count(*) from t; does not respond.

Additional context

This happens whether or not I create a materialized view on the table.


About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 20 (12 by maintainers)

Most upvoted comments

BTW, though we can make some improvements to avoid soaring memory, it is not recommended to issue 300,000 SQL statements, because that creates 300,000 batch tasks.

It's better to concatenate these insert values into one SQL statement.
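A minimal sketch of that suggestion, assuming a psycopg2-style connection like the one in the original script (the `build_batched_insert` helper and batch size are illustrative, not from the report): build one multi-row INSERT per batch so the frontend schedules a handful of batch tasks instead of hundreds of thousands.

```python
# Hypothetical helper: combine many rows into one multi-row INSERT statement.
def build_batched_insert(table, rows):
    """Build one INSERT from an iterable of (id, v1, v2, s1) tuples."""
    values = ", ".join(
        "({}, {}, {}, '{}')".format(i, v1, v2, s1) for i, v1, v2, s1 in rows
    )
    return "insert into {} values {};".format(table, values)

rows = [(i, i + 1, i + 2, "test_value") for i in range(1, 10001)]
batch_size = 1000  # illustrative; tune to keep each statement a reasonable size
statements = [
    build_batched_insert("t", rows[start:start + batch_size])
    for start in range(0, len(rows), batch_size)
]
# 10 statements instead of 10,000 single-row inserts:
# for sql in statements:
#     cursor.execute(sql)
```

Each statement then carries `batch_size` rows, so the number of batch tasks drops by that factor.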

cc @BowenXiao1999 @ZENOTME Could you please help to take a look?

BTW, could you put the yml in a code snippet so that we can copy & paste it more conveniently?

Yes, I have pasted the yml file in the middle of the Describe the bug section.

I mean there's no indentation, so whoever needs to reuse it has to format it themselves 🤣.

Oh, I pasted it with indentation, but the indentation was lost when the submission completed. You can click edit and copy the indented yml from there.

I’ve edited the comment for you.

Sorry, I'm not very good with git. Thanks!

FE side should be fixed by #5822 and #5827. Closing the issue.

Before

image

After

image

The memory issue on CN seems to be solved by #5770

v0.1.9

image

5770

image

Also, the original version didn't seem to have such extreme memory growth.

Which version is that?

I’ve yet to investigate the frontend node. I guess our per-task metric is not suitable in this scenario. @ZENOTME any ideas for improvement?

It seems that version 0.1.9, which I used at the time, did not have this problem.

It seems that per-task metrics (regardless of whether Prometheus is enabled) are mainly responsible for the large memory usage on the compute node.

image

Successfully reproduced with risedev d. Looks like prometheus is taking up a large portion of memory.

image

Which component has reached 10G memory usage? Compute node or compactor?

All components together add up to about 10 GB.