ClickHouse: Index analysis doesn't work with toHour and Timezone condition

Describe what’s wrong Filter condition doesn’t work correctly when functions like toHour are involved, I have another examples involved toDayOfWeek function as well where produced results are incorrect but I’ll report in separate ticket about it.

Does it reproduce on recent release? Yes, verified as part of v22.3-lts

How to reproduce

  • Which ClickHouse server version to use 21.12.4.1
  • Which interface to use, if matters: clickhouse-client
  • Queries to run that lead to unexpected result and
Table definition
clickhouse-db-02.server.internal :) show create db.articles_ext_data;

SHOW CREATE TABLE db.articles_ext_data

Query id: 383c56fa-21e0-4dae-bc78-9eb9adfd03b2

[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.831837 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Debug> executeQuery: (from 127.0.0.1:43080) show create db.articles_ext_data;
[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.831935 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Trace> ContextAccess (default): Access granted: SHOW COLUMNS ON db.articles_ext_data
┌─statement──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ CREATE TABLE db.articles_ext_data
(
    `internal_id` String,
    `timestamp` Nullable(DateTime('UTC')),
    `url` Nullable(String),
    `data_provider` String,
    `document_length` UInt32,
    `domain_name` String,
    `is_near_duplicate` UInt8,
    `publish_date` DateTime('UTC'),
    `lang` Nullable(String),
    `frames.label` Array(String),
    `frames.score` Array(Float64),
    `frames.version` Array(UInt32),
    `frames.role` Array(Array(String)),
    `frames.value` Array(Array(String)),
    `frames.entity_id` Array(Array(UInt32)),
    `frames.salience_score` Array(Array(Float64)),
    `tags.id` Array(UInt32),
    `frames.num_mentions` Array(UInt32),
    `tags.name` Array(String),
    `tags.score` Array(Float64),
    `tags.tagger` Array(String),
    `tags.checksum` Array(String),
    `tags.type` Array(String),
    `kpis.entity_id` Array(UInt32),
    `kpis.salience_score` Array(Float64),
    `kpis.num_mentions` Array(UInt32)
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/replicated/db/articles_ext_data', 'clickhouse-db-02.server.internal')
PARTITION BY toYYYYMMDD(publish_date)
PRIMARY KEY cityHash64(internal_id)
ORDER BY cityHash64(internal_id)
SAMPLE BY cityHash64(internal_id)
SETTINGS index_granularity = 8192 │
└────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.833999 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Information> executeQuery: Read 1 rows, 1.22 KiB in 0.002124662 sec., 470 rows/sec., 575.46 KiB/sec.
[clickhouse-db-02.server.internal] 2022.06.23 08:19:47.834029 [ 65324 ] {383c56fa-21e0-4dae-bc78-9eb9adfd03b2} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

1 rows in set. Elapsed: 0.003 sec.
# Working Use Case
SELECT
    toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel
FROM db.articles_ext_data
WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem'))
HAVING toHour_UTC = 5
ORDER BY toHour_UTC DESC
LIMIT 10

Query id: 9032228d-a5ae-465b-985d-6cb5d8369ec8

[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.827400 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> executeQuery: (from 127.0.0.1:43080) SELECT toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC, toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel FROM db.articles_ext_data WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem')) HAVING toHour_UTC = 5 ORDER BY toHour_UTC DESC LIMIT 10;
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.828408 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Trace> ContextAccess (default): Access granted: SELECT(publish_date) ON db.articles_ext_data
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.829203 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.829851 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Key condition: unknown, unknown, and, unknown, and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.841597 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): MinMax index condition: (column 0 in [1625097600, +Inf)), (column 0 in (-Inf, 1633046398]), and, (toHour(toTimezone(column 0)) in [5, 5]), and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.843542 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Selected 117/3088 parts by partition key, 117 parts by primary key, 5858/5858 marks by primary key, 5858 marks to read from 117 ranges
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.845501 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Reading approx. 6585652 rows with 32 streams
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.853734 [ 53567 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> MergingSortedTransform: Merge sorted 1 blocks, 10 rows in 0.007999958 sec., 1250.0065625344535 rows/sec., 15.63 KiB/sec
┌─toHour_UTC─┬─toHour_Israel─┐
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
│          5 │             8 │
└────────────┴───────────────┘
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.854394 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Information> executeQuery: Read 6585652 rows, 25.12 MiB in 0.026937251 sec., 244481220 rows/sec., 932.62 MiB/sec.
[clickhouse-db-02.server.internal] 2022.06.23 08:18:11.854421 [ 65324 ] {9032228d-a5ae-465b-985d-6cb5d8369ec8} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.

10 rows in set. Elapsed: 0.028 sec. Processed 6.59 million rows, 26.34 MB (235.49 million rows/s., 941.98 MB/s.)

------------------------------------------------------------------------------------------------------------------------------------------------------------------------

# Not Working Use Case
SELECT
    toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel
FROM db.articles_ext_data
WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem'))
HAVING toHour_Israel = 8
ORDER BY toHour_Israel DESC
LIMIT 10

Query id: c1c83e59-af83-40cc-b93e-d2a774186fa1

[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.523436 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> executeQuery: (from 127.0.0.1:43080) SELECT toHour(toTimeZone(publish_date, 'UTC')) AS toHour_UTC, toHour(toTimeZone(publish_date, 'Asia/Jerusalem')) AS toHour_Israel FROM db.articles_ext_data WHERE (publish_date >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (publish_date < toTimeZone(toDateTime('2021-09-30 23:59:59'), 'Asia/Jerusalem')) HAVING toHour_Israel = 8 ORDER BY toHour_Israel DESC LIMIT 10
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.524450 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Trace> ContextAccess (default): Access granted: SELECT(publish_date) ON db.articles_ext_data
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.525283 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Trace> InterpreterSelectQuery: FetchColumns -> Complete
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.526006 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Key condition: unknown, unknown, and, unknown, and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.537831 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): MinMax index condition: (column 0 in [1625097600, +Inf)), (column 0 in (-Inf, 1633046398]), and, (toHour(toTimezone(column 0)) in [8, 8]), and
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.537893 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> db.articles_ext_data (bf32f1f5-ce1c-44e3-bf32-f1f5ce1c24e3) (SelectExecutor): Selected 0/3088 parts by partition key, 0 parts by primary key, 0/0 marks by primary key, 0 marks to read from 0 ranges
[clickhouse-db-02.server.internal] 2022.06.23 08:18:34.538562 [ 65324 ] {c1c83e59-af83-40cc-b93e-d2a774186fa1} <Debug> MemoryTracker: Peak memory usage (for query): 0.00 B.
Ok.

0 rows in set. Elapsed: 0.016 sec.

Expected behavior The expected results should be identical to the first query response regardless of condition toHour_UTC = 5 or toHour_Israel = 8 applies

Additional context On small tables with syntenic data, I was unable to reproduce the issue, I also tried to copy data from one table to another with the same structure and after data migration completion the issue reproduces consistently. Might be related to https://github.com/ClickHouse/ClickHouse/issues/10977

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 18 (9 by maintainers)

Most upvoted comments

no issue in 21.8.13.6

toHour() monotonicity is broken from the very beginning. The result why it works is because we don’t support toTimeZone(t, 'Asia/Jerusalem') in index analysis in 21.8

repro:

drop table if exists test_tz_hour;
CREATE TABLE test_tz_hour( t DateTime,  x String ) ENGINE = MergeTree PARTITION BY toYYYYMMDD(t) ORDER BY x;

insert into  test_tz_hour select  toDateTime('2021-06-01 00:00:00') + number * 600, 'x' from numbers(1e4);

select count(), toHour_UTC, toHour_Israel from (
SELECT
    toHour(toTimeZone(t, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(t, 'Asia/Jerusalem')) AS toHour_Israel
FROM test_tz_hour
WHERE (t >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (t < toTimeZone(toDateTime('2021-07-31 23:59:59'), 'Asia/Jerusalem'))
HAVING toHour_UTC = 5
ORDER BY toHour_UTC DESC)
group by toHour_UTC, toHour_Israel

┌─count()─┬─toHour_UTC─┬─toHour_Israel─┐
│     186 │          5 │             8 │
└─────────┴────────────┴───────────────┘

select count(), toHour_UTC, toHour_Israel from (
SELECT
    toHour(toTimeZone(t, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(t, 'Asia/Jerusalem')) AS toHour_Israel
FROM test_tz_hour
WHERE (t >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (t < toTimeZone(toDateTime('2021-07-31 23:59:59'), 'Asia/Jerusalem'))
HAVING (toHour_Israel) = 8
ORDER BY toHour_UTC DESC)
group by toHour_UTC, toHour_Israel

0 rows in set. Elapsed: 0.007 sec.
  • HAVING materialize(toHour_Israel) = 8 / HAVING identity(toHour_Israel) = 8
select count(), toHour_UTC, toHour_Israel from (
SELECT
    toHour(toTimeZone(t, 'UTC')) AS toHour_UTC,
    toHour(toTimeZone(t, 'Asia/Jerusalem')) AS toHour_Israel
FROM test_tz_hour
WHERE (t >= toTimeZone(toDateTime('2021-07-01 00:00:00'), 'Asia/Jerusalem')) AND (t < toTimeZone(toDateTime('2021-07-31 23:59:59'), 'Asia/Jerusalem'))
HAVING materialize(toHour_Israel) = 8
ORDER BY toHour_UTC DESC)
group by toHour_UTC, toHour_Israel
┌─count()─┬─toHour_UTC─┬─toHour_Israel─┐
│     186 │          5 │             8 │
└─────────┴────────────┴───────────────┘