dbt-clickhouse: Problem with the distributed_table materialization

Hello everyone, I am not 100% sure if this is a bug or because I did not correctly use the connector.

No same issue with table and view materializations.

Describe the bug

Here my table definition event.sql: it takes a parquet file from s3, then creates the table with the configurations.

{{ config(
    cluster="default",
    schema="db_test_dbt",
    materialized="distributed_table",
    sharding_key="rand()",
    unique_key="(event_id)",
    full_refresh=true,
    engine="ReplicatedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') ",
    settings={"allow_nullable_key": 1},
    order_by="(event_id)"
    )
}}

SELECT 
    * 
FROM 
    s3('path/to/parquet/file/on/s3/event/*', 'Parquet')

😊The first dbt run works well, the table is correctly created on my clickhouse cluster.

😭When I do dbt run again, it does not work any more:

Database Error in model event (models/event.sql)
    Code: 62.
    DB::Exception: Empty query. Stack trace:

The file target/run/dbt_project_name/models/event.sql is empty; The file target/compiled/dbt_project_name/models/event.sql contains the select * from s3 query;

Steps to reproduce

dbt run --select models/event.sql: Completed successfully.😊
dbt run --select models/event.sql again: ERROR!😭
dbt run --select models/event.sql again: SAME ERROR!😭
dbt run --select models/event.sql again: SAME ERROR!😭
drop the tables manually, and retry dbt run --select models/event.sql: Completed successfully. 😊

Configuration

Environment

dbt version: 1.4.6.
dbt-clickhouse version: 1.4.7 and same problem for previous versions.
clickhouse-driver version (if using native)
clickhouse-connect version (if using http): 1.4.7

ClickHouse server

ClickHouse Server version: 23.3.1
multiple shards * 2 replicates

Is it a bug? Or something under development? Or something wrongly set in my config clause?

Thanks in advance.

About this issue

Original URL
State: closed
Created a year ago
Comments: 15 (15 by maintainers)

Most upvoted comments

PR merged into 1.4.8 release.

genzgd on Aug 22, 2023

@zli06160 Yes, I’ve noticed this also when trying pytest with cluster setting. thus I changed this block to

  {%- if adapter.get_clickhouse_cluster_name() is not none and obj_types == 'TABLES' and 'Replicated' in engine_clause() %}
    {% do run_query("SYSTEM SYNC REPLICA " + on_cluster_clause() + target_relation.schema + '.' + target_relation.identifier) %}
  {%- endif %}

It seems like the assumption for the purpose of using a ClickHouse cluster within this repo is to replicate data (Replicated table engines).

gfunc on Aug 18, 2023

@zli06160 Can you possibly check to see if changing the macro as in the PR https://github.com/ClickHouse/dbt-clickhouse/pull/180 fixes your situation, since we don’t have tests? Some additional validation would be nice before we merge and release the fix.

genzgd on Aug 17, 2023

raised a merge request for this specific case. In the meanwhile creating test cases for distributed materializations.

gfunc on Aug 17, 2023

I will try to reproduce this with the NYC taxi data example. Looks like sth is broken before the actual SQL run.

gfunc on Aug 17, 2023

can you please provide dbt.log? The error was returned by ClickHouse, guess we need to know what exactly the query looks like.

gfunc on Aug 16, 2023

This is a new, experimental feature so I suspect it’s a bug. Any thoughts @gladkikhtutu?

genzgd on Aug 13, 2023