ClickHouse: SummingMergeTree with ORDER BY tuple() does not collapse rows
Describe what’s wrong
It is possible to create ReplacingMergeTree table without primary and sorting key using the ORDER BY tuple() clause.
In this case, ClickHouse stores data in the order of inserting.
It works for sequence of single row inserts. But if insert data using INSERT ... SELECT all row collapses into one.
I’m sure that this is not the way it supposed to work.
If you want to save data order when inserting data by INSERT … SELECT queries, set max_insert_threads = 1.
I understand that rows order is not guaranteed in case of INSERT ... SELECT
but i’m sure that it must not cause loss of data.
SummingMergeTree or AggregatingMergeTree works properly - all rows get inserted into table.
Does it reproduce on recent release?
Yes. clickhouse/clickhouse-server:22.8-alpine docker image
How to reproduce
CREATE TEMPORARY TABLE tmp_table ("id" Int64,"name" String) ;
insert into tmp_table values (1,'a');
insert into tmp_table values (2,'b');
insert into tmp_table values (3,'c');
CREATE TABLE test8 ("id" Int64,"name" String) ENGINE = ReplacingMergeTree ORDER BY tuple();
INSERT INTO test8 SELECT * from tmp_table;
select * from test8;
Expected behavior
1,a 2,b 3,c
Actual Result
3,c
Additional context
Everything works as expected with SummingMergeTree or AggregatingMergeTree
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 16 (12 by maintainers)
Also I suppose it makes sense for materialized views with
partition bykey to calculate some aggregate for each partition@canhld94
I expect
You may check it with
It’s not about sense, it’s about code generalization and documentation consistency. I knew about this unexpected behaviour for years, and I don’t care. But I prefer to document or fix all unexpected behaviours.
Hint: ORDER BY is really important for CH performance. It defines how data will be ordered and which columns will be included into PRIMARY INDEX. It improves insert/select performance drastically, because the right order improves compression rate and allows to write and read less data. https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/