ClickHouse: Preallocation into hash table slows down query for small table

Describe the situation Degraded performance of window functions since 22.8

How to reproduce

echo "select * from (select number, count(), dense_rank() over (order by number asc) rank from numbers(100000) group by number) where rank < 100" | clickhouse-benchmark -c 10 --max_threads=8

versions <= 22.7

localhost:9000, queries 3976, QPS: 442.505, RPS: 57972517.711, MiB/s: 442.295, result RPS: 43807.948, result MiB/s: 1.003.
...
95.000%		0.028 sec.
...

versions >= 22.8

localhost:9000, queries 202, QPS: 204.984, RPS: 26854915.343, MiB/s: 204.887, result RPS: 20293.387, result MiB/s: 0.464.
...
95.000%		0.056 sec.
...

Additional context

<= 22.7 explain

┌─explain───────────────────────────────────────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))                                               │
│   Filter ((WHERE + (Projection + Before ORDER BY)))                                       │
│     Window (Window step for window 'ORDER BY number ASC')                                 │
│       MergingSorted (Merge sorted streams for window 'ORDER BY number ASC')               │
│         MergeSorting (Merge sorted blocks for window 'ORDER BY number ASC')               │
│           PartialSorting (Sort each block for window 'ORDER BY number ASC')               │
│             Expression (Before window functions)                                          │
│               Aggregating                                                                 │
│                 Expression (Before GROUP BY)                                              │
│                   SettingQuotaAndLimits (Set limits and quota after reading from storage) │
│                     ReadFromStorage (SystemNumbers)                                       │
└───────────────────────────────────────────────────────────────────────────────────────────┘

= 22.8 explain

┌─explain───────────────────────────────────────────────────┐
│ Expression ((Projection + Before ORDER BY))               │
│   Filter ((WHERE + (Projection + Before ORDER BY)))       │
│     Window (Window step for window 'ORDER BY number ASC') │
│       Sorting (Sorting for window 'ORDER BY number ASC')  │
│         Expression (Before window functions)              │
│           Aggregating                                     │
│             Expression (Before GROUP BY)                  │
│               ReadFromStorage (SystemNumbers)             │
└───────────────────────────────────────────────────────────┘

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 21 (10 by maintainers)

Most upvoted comments

@nickitat I’ve provided a similar one

docker run --rm -it -d --name c228 clickhouse/clickhouse-server:22.8
docker run --rm -it -d --name c219 clickhouse/clickhouse-server:21.9

docker exec -it c228 bash -c 'echo "select number, count() from numbers(100000) group by number format Null" | clickhouse-benchmark -c 10 --max_threads=8'
docker exec -it c219 bash -c 'echo "select number, count() from numbers(100000) group by number format Null" | clickhouse-benchmark -c 10 --max_threads=8'

Please note that the number of threads here is 8 (not single threaded scenario) and the QPS is different x5 times (~100 vs ~500 on my local)

aggr over numbers is single threaded regardless of max_threads value. it is why we have numbers and numbers_mt. but you right that in multithreaded case we could also see a slowdown for small tables. I also addressed this issue in the linked pr.

@nickitat I’ll help locate the issue, got a reproducible local setup


docker run --rm -it -d --name c228 clickhouse/clickhouse-server:22.8
docker run --rm -it -d --name c219 clickhouse/clickhouse-server:21.9

docker exec -it c228 bash -c 'echo "select number, count() from numbers(100000) group by number format Null" | clickhouse-benchmark -c 10 --max_threads=8'
docker exec -it c219 bash -c 'echo "select number, count() from numbers(100000) group by number format Null" | clickhouse-benchmark -c 10 --max_threads=8'

(QPS is 100 in 22.8 and 500 in 21.9 on my local)