ClickHouse: MaterializedPostgreSQL: possible segfault when creating a table with an incorrect structure
Describe what’s wrong
The ClickHouse service crashes after I try to create a table using the MaterializedPostgreSQL engine.
I'm running version 23.9.1.1761 on a single-node Ubuntu Server 22.04.
How to reproduce
Create a replication from a PostgreSQL source database to a ClickHouse database as the destination, using the MaterializedPostgreSQL engine with the following command:
CREATE TABLE slang_app.activities
(
id UInt64 CODEC(ZSTD),
created_at DateTime CODEC(ZSTD),
updated_at DateTime CODEC(ZSTD),
topic_id Int CODEC(ZSTD),
data String CODEC(ZSTD),
correct_answer String CODEC(ZSTD),
answer_map String CODEC(ZSTD),
primary_sense_ids String CODEC(ZSTD),
secondary_sense_ids String CODEC(ZSTD),
sentence_ids String CODEC(ZSTD),
sense_difficulty String CODEC(ZSTD),
sentence_difficulty String CODEC(ZSTD),
activity_hash String CODEC(ZSTD),
bucket Int CODEC(ZSTD),
key_term Bool CODEC(ZSTD),
cefr_level String CODEC(ZSTD),
sense_ids String CODEC(ZSTD),
active Bool CODEC(ZSTD),
activity_language String CODEC(ZSTD),
primary_grammar_pattern_ids UInt64 CODEC(ZSTD),
secondary_grammar_pattern_ids UInt64 CODEC(ZSTD),
interaction_info_id Int CODEC(ZSTD),
material_ids String CODEC(ZSTD)
)
ENGINE = MaterializedPostgreSQL('db_host:5432', 'db_name', 'activities', 'db_user', 'db_password')
PRIMARY KEY (id)
SETTINGS materialized_postgresql_tables_list = 'activities';
This is the source schema on the PostgreSQL side (psql \d+ output); a type-matched ClickHouse definition is sketched after the table:
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
id | bigint | | not null | nextval('activities_id_seq'::regclass) | plain | |
created_at | timestamp without time zone | | not null | | plain | |
updated_at | timestamp without time zone | | not null | | plain | |
topic_id | integer | | not null | | plain | |
data | jsonb | | not null | | extended | |
correct_answer | jsonb | | not null | | extended | |
answer_map | jsonb | | | | extended | |
primary_sense_ids | integer[] | | not null | | extended | |
secondary_sense_ids | integer[] | | | | extended | |
sentence_ids | integer[] | | | | extended | |
sense_difficulty | double precision | | | | plain | |
sentence_difficulty | double precision | | | | plain | |
activity_hash | character varying | | not null | | extended | |
bucket | integer | | not null | | plain | |
key_term | boolean | | not null | false | plain | |
cefr_level | cefr_level | | | | plain | |
sense_ids | integer[] | | not null | '{}'::integer[] | extended | |
active | boolean | | | true | plain | |
activity_language | character varying | | | | extended | |
primary_grammar_pattern_ids | bigint[] | | not null | '{}'::bigint[] | extended | |
secondary_grammar_pattern_ids | bigint[] | | not null | '{}'::bigint[] | extended | |
interaction_info_id | integer | | | | plain | |
material_ids | integer[] | | not null | '{}'::integer[] | extended | |
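For comparison, below is a sketch of a definition whose column types line up with the PostgreSQL schema above. This is a hypothetical mapping, assuming the commonly documented PostgreSQL-to-ClickHouse type correspondences (integer → Int32, bigint → Int64, double precision → Float64, integer[] → Array(Int32), nullable columns wrapped in Nullable(...)); jsonb and the custom cefr_level enum are kept as String, and the ZSTD codecs are omitted for brevity. It only illustrates where the reported definition diverges from the source types and is not confirmed to avoid the crash:
CREATE TABLE slang_app.activities
(
    id Int64,
    created_at DateTime,
    updated_at DateTime,
    topic_id Int32,
    data String,
    correct_answer String,
    answer_map Nullable(String),
    primary_sense_ids Array(Int32),
    secondary_sense_ids Array(Int32),
    sentence_ids Array(Int32),
    sense_difficulty Nullable(Float64),
    sentence_difficulty Nullable(Float64),
    activity_hash String,
    bucket Int32,
    key_term Bool,
    cefr_level Nullable(String),
    sense_ids Array(Int32),
    active Nullable(Bool),
    activity_language Nullable(String),
    primary_grammar_pattern_ids Array(Int64),
    secondary_grammar_pattern_ids Array(Int64),
    interaction_info_id Nullable(Int32),
    material_ids Array(Int32)
)
ENGINE = MaterializedPostgreSQL('db_host:5432', 'db_name', 'activities', 'db_user', 'db_password')
PRIMARY KEY (id)
SETTINGS materialized_postgresql_tables_list = 'activities';
Note that PostgreSQL array columns cannot be wrapped in Nullable(...) on the ClickHouse side, since ClickHouse does not support Nullable(Array); a plain Array type is used for them here.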
Expected behavior
The table should be created and replication should start. I have this same setup for several other tables from this same source database, and it works, but not for this one specifically.
Error message and/or stacktrace
As soon as I run the command, I instantly get:
Query id: 02b01da9-cb3e-4b76-8376-23d0322755a8
Exception on client:
Code: 32. DB::Exception: Attempt to read after eof: while receiving packet from localhost:9000. (ATTEMPT_TO_READ_AFTER_EOF)
There were no stack traces in the logs, only these entries:
clickhouse-server.log
2023.10.09 13:58:13.500286 [ 379896 ] {b84057e6-6d65-4117-8f97-dc3edc54f5fb} <Debug> StorageMaterializedPostgreSQL(slangapp.activities): Creating clickhouse table for postgresql table slang_app.activities (381720fd-a661-4253-a906-9fa03feb67b1)
2023.10.09 13:58:13.506057 [ 379896 ] {b84057e6-6d65-4117-8f97-dc3edc54f5fb} <Debug> slang_app.`381720fd-a661-4253-a906-9fa03feb67b1_nested` (e32a86e4-063f-46c5-b706-85d7b1689e8e): Loading data parts
2023.10.09 13:58:13.506289 [ 379896 ] {b84057e6-6d65-4117-8f97-dc3edc54f5fb} <Debug> slang_app.`381720fd-a661-4253-a906-9fa03feb67b1_nested` (e32a86e4-063f-46c5-b706-85d7b1689e8e): There are no data parts
2023.10.09 13:58:45.329586 [ 385637 ] {} <Trace> LoadingDeps: Table slang_app.`381720fd-a661-4253-a906-9fa03feb67b1_nested` has no dependencies (level 0)
2023.10.09 13:58:47.800505 [ 385894 ] {} <Debug> slang_app.`381720fd-a661-4253-a906-9fa03feb67b1_nested` (e32a86e4-063f-46c5-b706-85d7b1689e8e): Loading data parts
2023.10.09 13:58:47.800633 [ 385894 ] {} <Debug> slang_app.`381720fd-a661-4253-a906-9fa03feb67b1_nested` (e32a86e4-063f-46c5-b706-85d7b1689e8e): There are no data parts
clickhouse-server.err.log
2023.10.09 13:58:13.951739 [ 374044 ] {} <Fatal> Application: Child process was terminated by signal 11.
2023.10.09 13:58:44.511311 [ 385637 ] {} <Warning> Application: Integrity check of the executable skipped because the reference checksum could not be read.
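Signal 11 here is a segmentation fault. As a side note, one way to confirm such a crash from SQL is the system.crash_log table, assuming the crash log is enabled in the server configuration (it is present in the default config.xml):
-- Show the most recent recorded crash (signal number and the query that triggered it)
SELECT event_time, signal, thread_id, query_id
FROM system.crash_log
ORDER BY event_time DESC
LIMIT 1;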
About this issue
- State: closed
- Created 9 months ago
- Comments: 26 (12 by maintainers)
A segmentation fault is never expected to happen 😃 But yes, the schema change could be the reason. Maybe @kssenii has some ideas.
Reproduced, and I confirm that this segfault with an incorrect structure is fixed in https://github.com/ClickHouse/ClickHouse/pull/57567.
As for the segfault when inserting an incorrect data type, I have a strong impression that it is the same issue as https://github.com/ClickHouse/ClickHouse/issues/54297 and that it is fixed in https://github.com/ClickHouse/ClickHouse/pull/57567 (not merged yet). I will write a test to verify that in a moment.
It’s because of https://clickhouse.com/docs/en/operations/optimizing-performance/sampling-query-profiler
You can disable stopping on those signals:
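The snippet itself was not preserved in the archive; a minimal sketch of the idea, assuming the session is running under gdb and that the sampling query profiler delivers SIGUSR1/SIGUSR2 (the signals ClickHouse's query profiler uses):
# gdb commands: ignore the profiler's signals instead of pausing on them
handle SIGUSR1 nostop noprint pass
handle SIGUSR2 nostop noprint pass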
Sure, let me try it!