dbt-clickhouse: Problem with the distributed_table materialization

Hello everyone, I am not 100% sure if this is a bug or because I did not correctly use the connector.

  • No same issue with table and view materializations.

Describe the bug

Here my table definition event.sql: it takes a parquet file from s3, then creates the table with the configurations.

{{ config(
    cluster="default",
    schema="db_test_dbt",
    materialized="distributed_table",
    sharding_key="rand()",
    unique_key="(event_id)",
    full_refresh=true,
    engine="ReplicatedReplacingMergeTree('/clickhouse/tables/{uuid}/{shard}', '{replica}') ",
    settings={"allow_nullable_key": 1},
    order_by="(event_id)"
    )
}}

SELECT 
    * 
FROM 
    s3('path/to/parquet/file/on/s3/event/*', 'Parquet')

😊The first dbt run works well, the table is correctly created on my clickhouse cluster.

😭When I do dbt run again, it does not work any more:

Database Error in model event (models/event.sql)
    Code: 62.
    DB::Exception: Empty query. Stack trace:

The file target/run/dbt_project_name/models/event.sql is empty; The file target/compiled/dbt_project_name/models/event.sql contains the select * from s3 query;

Steps to reproduce

  1. dbt run --select models/event.sql: Completed successfully.😊
  2. dbt run --select models/event.sql again: ERROR!😭
  3. dbt run --select models/event.sql again: SAME ERROR!😭
  4. dbt run --select models/event.sql again: SAME ERROR!😭
  5. drop the tables manually, and retry dbt run --select models/event.sql: Completed successfully. 😊

Configuration

Environment

  • dbt version: 1.4.6.
  • dbt-clickhouse version: 1.4.7 and same problem for previous versions.
  • clickhouse-driver version (if using native)
  • clickhouse-connect version (if using http): 1.4.7

ClickHouse server

  • ClickHouse Server version: 23.3.1
  • multiple shards * 2 replicates

Is it a bug? Or something under development? Or something wrongly set in my config clause?

Thanks in advance.

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (15 by maintainers)

Most upvoted comments

PR merged into 1.4.8 release.

@zli06160 Yes, I’ve noticed this also when trying pytest with cluster setting. thus I changed this block to

  {%- if adapter.get_clickhouse_cluster_name() is not none and obj_types == 'TABLES' and 'Replicated' in engine_clause() %}
    {% do run_query("SYSTEM SYNC REPLICA " + on_cluster_clause() + target_relation.schema + '.' + target_relation.identifier) %}
  {%- endif %}

It seems like the assumption for the purpose of using a ClickHouse cluster within this repo is to replicate data (Replicated table engines).

@zli06160 Can you possibly check to see if changing the macro as in the PR https://github.com/ClickHouse/dbt-clickhouse/pull/180 fixes your situation, since we don’t have tests? Some additional validation would be nice before we merge and release the fix.

raised a merge request for this specific case. In the meanwhile creating test cases for distributed materializations.

I will try to reproduce this with the NYC taxi data example. Looks like sth is broken before the actual SQL run.

can you please provide dbt.log? The error was returned by ClickHouse, guess we need to know what exactly the query looks like.

This is a new, experimental feature so I suspect it’s a bug. Any thoughts @gladkikhtutu?