ClickHouse: Inconsistent query results
You have to provide the following information whenever possible.
We are intermittently getting inconsistent query results when running a query with a distributed subquery. After receiving user reports, I wrote a script that runs the query 1000 times against each node in our cluster and stores the results in a dict, keyed by the query's return value, with the number of times that result was seen as the value.
Host clickhouse-green-counters-0 {109334: 1000}
Host clickhouse-green-counters-1 {108676: 27, 109248: 5, 109334: 968}
Host clickhouse-green-counters-2 {109334: 1000}
Host clickhouse-green-counters-3 {108243: 14, 109100: 2, 109334: 984}
Host clickhouse-green-counters-4 {109334: 1000}
Host clickhouse-green-counters-5 {109334: 1000}
Host clickhouse-green-counters-6 {109334: 1000}
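For reference, a minimal sketch of the harness described above, assuming clickhouse-driver, default credentials, and the node hostnames shown; the real script may differ.
from collections import Counter

from clickhouse_driver import Client

HOSTS = ["clickhouse-green-counters-%d" % i for i in range(7)]  # assumed node names

QUERY = """
SELECT sum(att_count) AS total
FROM
(
    SELECT
        sourceID,
        count(DISTINCT attribute) AS att_count
    FROM my_table_name
    WHERE (hour = '2022-10-01 07:00:00') AND (workspaceID = 'obfuscated')
    GROUP BY sourceID
)
"""

for host in HOSTS:
    client = Client(host=host)
    counts = Counter()
    for _ in range(1000):
        total = client.execute(QUERY)[0][0]  # single row, single column
        counts[total] += 1
    print("Host", host, dict(counts))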
When run a second time, all results are correct. I ran it 6 times, and all results were consistent after the first run. However, after waiting overnight I ran it again and once more got incorrect results. I verified that replica status was OK for all nodes; the largest absolute delay was 1.
We then constructed a comparable query and ran it on a separate cluster, where we were able to reproduce the same behavior.
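The exact replica check used is not shown in the report; a check along these lines, querying system.replicas (which exposes is_readonly and absolute_delay), is one way to confirm replica health. Hostnames are assumed.
from clickhouse_driver import Client

for host in ["clickhouse-green-counters-%d" % i for i in range(7)]:  # assumed node names
    client = Client(host=host)
    rows = client.execute(
        "SELECT database, table, is_readonly, absolute_delay FROM system.replicas"
    )
    for database, table, is_readonly, absolute_delay in rows:
        print(host, database, table, "readonly" if is_readonly else "ok", absolute_delay)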
If possible, change “enabled” to true in the “send_crash_reports” section in config.xml: it does not crash or error, only produces wrong data.
How to reproduce
- Which ClickHouse server version to use: ClickHouse server version 22.8.1.
- Which interface to use, if matters: Python 3.6 with clickhouse-driver.
- Non-default settings, if any:
<yandex>
    <merge_tree>
        <use_minimalistic_part_header_in_zookeeper>1</use_minimalistic_part_header_in_zookeeper>
        <enable_mixed_granularity_parts>1</enable_mixed_granularity_parts>
        <number_of_free_entries_in_pool_to_lower_max_size_of_merge>12</number_of_free_entries_in_pool_to_lower_max_size_of_merge>
    </merge_tree>
    <compression>
        <case>
            <min_part_size>10000000000</min_part_size>
            <min_part_size_ratio>0.01</min_part_size_ratio>
            <method>zstd</method>
        </case>
    </compression>
</yandex>
- CREATE TABLE statements for all tables involved:
CREATE TABLE default.my_table_name
(
`workspaceID` String,
`sourceID` String,
`hour` DateTime,
`attribute` String
)
ENGINE = Distributed('my_cluster', '', 'my_table_name', rand())
- Sample data for all these tables, use clickhouse-obfuscator:
┌─workspaceID─┬─sourceID─┬────────────────hour─┬─attribute───┐
│ obfuscated  │ aaaaaaaa │ 2022-10-01 07:00:00 │ attribute1  │
│ obfuscated  │ aaaaaaaa │ 2022-10-01 07:00:00 │ attribute2  │
│ obfuscated  │ aaaaaaaa │ 2022-10-01 07:00:00 │ attribute3  │
│ obfuscated  │ aaaaaaaa │ 2022-10-01 07:00:00 │ attribute4  │
│ obfuscated  │ aaaaaaaa │ 2022-10-01 07:00:00 │ attribute5  │
│ obfuscated  │ bbbbbbbb │ 2022-10-01 07:00:00 │ attribute6  │
│ obfuscated  │ bbbbbbbb │ 2022-10-01 07:00:00 │ attribute7  │
│ obfuscated  │ bbbbbbbb │ 2022-10-01 07:00:00 │ attribute8  │
│ obfuscated  │ bbbbbbbb │ 2022-10-01 07:00:00 │ attribute9  │
│ obfuscated  │ bbbbbbbb │ 2022-10-01 07:00:00 │ attribute10 │
└─────────────┴──────────┴─────────────────────┴─────────────┘
- Queries to run that lead to unexpected result:
SELECT
sum(att_count) AS total
FROM
(
SELECT
sourceID,
count(DISTINCT attribute) AS att_count
FROM my_table_name
WHERE (hour = '2022-10-01 07:00:00') AND (workspaceID = 'obfuscated')
GROUP BY sourceID
);
Expected behavior
The same SQL returns the same number each time.
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 20 (20 by maintainers)
try collect_hash_table_stats_during_aggregation=0
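A minimal sketch of applying that suggestion per query via clickhouse-driver's settings argument; the hostname is assumed, the query and setting name come from the report and the comment above.
from clickhouse_driver import Client

client = Client(host="clickhouse-green-counters-1")  # assumed hostname
total = client.execute(
    """
    SELECT sum(att_count) AS total
    FROM
    (
        SELECT sourceID, count(DISTINCT attribute) AS att_count
        FROM my_table_name
        WHERE (hour = '2022-10-01 07:00:00') AND (workspaceID = 'obfuscated')
        GROUP BY sourceID
    )
    """,
    settings={"collect_hash_table_stats_during_aggregation": 0},
)[0][0]
print(total)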