ClickHouse: SummingMergeTree with ORDER BY tuple() does not collapse rows

Describe what’s wrong

It is possible to create ReplacingMergeTree table without primary and sorting key using the ORDER BY tuple() clause. In this case, ClickHouse stores data in the order of inserting. It works for sequence of single row inserts. But if insert data using INSERT ... SELECT all row collapses into one. I’m sure that this is not the way it supposed to work.

If you want to save data order when inserting data by INSERT … SELECT queries, set max_insert_threads = 1.

I understand that rows order is not guaranteed in case of INSERT ... SELECT but i’m sure that it must not cause loss of data.

SummingMergeTree or AggregatingMergeTree works properly - all rows get inserted into table.

Does it reproduce on recent release?

Yes. clickhouse/clickhouse-server:22.8-alpine docker image

How to reproduce

CREATE TEMPORARY TABLE tmp_table  ("id" Int64,"name" String) ;
insert into tmp_table values (1,'a');
insert into tmp_table values (2,'b');
insert into tmp_table values (3,'c');

CREATE TABLE test8 ("id" Int64,"name" String) ENGINE = ReplacingMergeTree  ORDER BY tuple();
INSERT INTO test8 SELECT * from tmp_table;

select * from test8;

Expected behavior

1,a 2,b 3,c

Actual Result

3,c

Additional context

Everything works as expected with SummingMergeTree or AggregatingMergeTree

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 16 (12 by maintainers)

Most upvoted comments

Nevertheless, I feel it’s doesn’t make any sense to have a SummingMergeTree with ORDER BY tuple()

Also I suppose it makes sense for materialized views with partition by key to calculate some aggregate for each partition

@canhld94

I expect

 ┌─id─┬─name─┬─x─┐
 │  6 │ с    │ 3 │ name=any(name) a or b or c
 └────┴──────┴───┘

You may check it with

CREATE TABLE test8 (artificial_key Int8, "id" Int64,"name" String, x Int64) E
NGINE = SummingMergeTree  ORDER BY artificial_key 
as select * from values ((1, 1,'a', 1), (1, 2,'b',1), (1, 3,'c',1));

Nevertheless, I feel it’s doesn’t make any sense to have a SummingMergeTree with ORDER BY tuple(). There’s also another option to prevent creating tables like that.

It’s not about sense, it’s about code generalization and documentation consistency. I knew about this unexpected behaviour for years, and I don’t care. But I prefer to document or fix all unexpected behaviours.

Hint: ORDER BY is really important for CH performance. It defines how data will be ordered and which columns will be included into PRIMARY INDEX. It improves insert/select performance drastically, because the right order improves compression rate and allows to write and read less data. https://kb.altinity.com/engines/mergetree-table-engine-family/pick-keys/