ClickHouse: Segmentation fault when a query to a distributed table is merging data from shards

Affected versions: 19.3.3.26, 19.9.3.31. The query groups data by a String column; the initiator crashes while merging results from the shards.

2019.08.23 12:43:31.685382 [ 50 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> ParallelAggregatingBlockInputStream: Aggregating
2019.08.23 12:43:31.686828 [ 63 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> Aggregator: Aggregation method: key_string
2019.08.23 12:43:31.686847 [ 62 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> Aggregator: Aggregation method: key_string
2019.08.23 12:43:31.693262 [ 50 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 157588 to 10357 rows (from 2.131 MiB) in 0.008 sec. (20135750.336 rows/sec., 272.275 MiB/sec.)
2019.08.23 12:43:31.693288 [ 50 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 229175 to 14401 rows (from 3.102 MiB) in 0.008 sec. (29282753.656 rows/sec., 396.396 MiB/sec.)
2019.08.23 12:43:31.693296 [ 50 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> ParallelAggregatingBlockInputStream: Aggregated. 0 to 0 rows (from 0.000 MiB) in 0.008 sec. (0.000 rows/sec., 0.000 MiB/sec.)
2019.08.23 12:43:31.693304 [ 50 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> ParallelAggregatingBlockInputStream: Total aggregated. 386763 rows (from 5.233 MiB) in 0.008 sec. (49418503.991 rows/sec., 668.671 MiB/sec.)
2019.08.23 12:43:31.693309 [ 50 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> Aggregator: Merging aggregated data
...
2019.08.23 12:43:32.385399 [ 59 ] {e1f857f8-785f-4cbd-a536-32a2b799ff4c} <Trace> Aggregator: Merging partially aggregated blocks (bucket = -1).
2019.08.23 12:43:32.389026 [ 65 ] {} <Error> BaseDaemon: ########################################
2019.08.23 12:43:32.389063 [ 65 ] {} <Error> BaseDaemon: (version 19.13.3.26 (official build)) (from thread 59) Received signal Segmentation fault (11).
2019.08.23 12:43:32.389078 [ 65 ] {} <Error> BaseDaemon: Address: 0x26 Access: read. Address not mapped to object.
2019.08.23 12:43:32.417367 [ 65 ] {} <Error> BaseDaemon: 0. clickhouse-server(StackTrace::StackTrace(ucontext_t const&)+0x31) [0x7f6b631]
1. clickhouse-server() [0x3da132e]
2. /lib/x86_64-linux-gnu/libpthread.so.0(+0x110c0) [0x7f9e1a21d0c0]
3. clickhouse-server(CityHash_v1_0_2::CityHash64(char const*, unsigned long)+0x2a) [0x7f6deaa]
4. clickhouse-server(void DB::Aggregator::mergeStreamsImplCase<false, DB::AggregationMethodString<HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, AllocatorWithHint<true, AllocatorHints::DefaultHint, 67108864ul> > >, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, AllocatorWithHint<true, AllocatorHints::DefaultHint, 67108864ul> > >(DB::Block&, DB::Arena*, DB::AggregationMethodString<HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, AllocatorWithHint<true, AllocatorHints::DefaultHint, 67108864ul> > >&, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, AllocatorWithHint<true, AllocatorHints::DefaultHint, 67108864ul> >&, char*) const+0x20e) [0x7434b5e]
5. clickhouse-server(DB::Aggregator::mergeBlocks(std::__cxx11::list<DB::Block, std::allocator<DB::Block> >&, bool)+0x10eb) [0x73a6aeb]
6. clickhouse-server(DB::MergingAggregatedMemoryEfficientBlockInputStream::mergeThread(std::shared_ptr<DB::ThreadGroupStatus>)+0x24c) [0x734390c]
7. clickhouse-server() [0x734429d]
8. clickhouse-server(ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::_List_iterator<ThreadFromGlobalPool>)+0x1a7) [0x3c785e7]
9. clickhouse-server(ThreadFromGlobalPool::ThreadFromGlobalPool<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}&&)::{lambda()#1}::operator()() const+0x3e) [0x3c78bce]
10. clickhouse-server(ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>)+0x1a6) [0x3c760f6]
11. clickhouse-server() [0xba3e1a0]
12. /lib/x86_64-linux-gnu/libpthread.so.0(+0x7494) [0x7f9e1a213494]
13. /lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f9e19a4dacf]

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 15 (13 by maintainers)

Most upvoted comments

> we should have a cluster with both localhost and remote nodes;

test_cluster_two_shards is enough

> we should have LowCardinality on the initiator and different String and LowCardinality(String) on remote nodes;

yes

> two-level group by should be activated;

not necessary

> distributed_aggregation_memory_efficient should be set.

yes

Here is what I got:

create table data (key String) Engine=Memory();
create table dist (key LowCardinality(String)) engine=Distributed(test_cluster_two_shards, currentDatabase(), data);
insert into data values ('foo');
set distributed_aggregation_memory_efficient=1;
select * from dist group by key;
2020.05.30 05:58:23.367133 [ 36 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Trace> Aggregator: Aggregating
2020.05.30 05:58:23.367455 [ 36 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Trace> Aggregator: Aggregation method: key_string
2020.05.30 05:58:23.367654 [ 36 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Trace> Aggregator: Aggregated. 1 to 1 rows (from 0.000 MiB) in 0.000 sec. (3957.669 rows/sec., 0.045 MiB/sec.)
2020.05.30 05:58:23.367914 [ 36 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Trace> Aggregator: Merging aggregated data
2020.05.30 05:58:23.368707 [ 28 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Information> executeQuery: Read 1 rows, 12.00 B in 0.003 sec., 325 rows/sec., 3.81 KiB/sec.
2020.05.30 05:58:23.368748 [ 32 ] {7d5ca935-0738-470c-a4a0-edd0b0114af7} <Trace> Aggregator: Merging partially aggregated blocks (bucket = -1).
2020.05.30 05:58:23.369045 [ 28 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Debug> MemoryTracker: Peak memory usage (for query): 5.55 KiB.
2020.05.30 05:58:23.369600 [ 28 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Trace> virtual DB::MergingAndConvertingBlockInputStream::~MergingAndConvertingBlockInputStream(): Waiting for threads to finish
2020.05.30 05:58:23.369831 [ 28 ] {2446f1e1-bbce-484d-8b9a-68efb7d4ec6a} <Information> TCPHandler: Processed in 0.005 sec.
2020.05.30 05:58:23.370575 [ 37 ] {} <Fatal> BaseDaemon: ########################################
2020.05.30 05:58:23.370918 [ 37 ] {} <Fatal> BaseDaemon: (version 19.14.9.12 (official build)) (from thread 32) Received signal Segmentation fault (11).
2020.05.30 05:58:23.371139 [ 37 ] {} <Fatal> BaseDaemon: Address: 0x1 Access: read. Address not mapped to object.
2020.05.30 05:58:23.371417 [ 37 ] {} <Fatal> BaseDaemon: Stack trace: 0x55555e8148ca 0x55555cd4a15e 0x55555ccbbbd3 0x55555cc56cac 0x55555cc5763d 0x55555940d0ce 0x55555940d6de 0x55555940ab7c 0x55555f078d20 0x7ffff7bbd6db 0x7ffff74da88f
2020.05.30 05:58:23.406432 [ 37 ] {} <Fatal> BaseDaemon: 3. 0x55555e8148ca CityHash_v1_0_2::CityHash64(char const*, unsigned long) /usr/bin/clickhouse
2020.05.30 05:58:23.406997 [ 37 ] {} <Fatal> BaseDaemon: 4. 0x55555cd4a15e void DB::Aggregator::mergeStreamsImplCase<false, DB::AggregationMethodString<HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, Allocator<true, true> > >, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, Allocator<true, true> > >(DB::Block&, DB::Arena*, DB::AggregationMethodString<HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, Allocator<true, true> > >&, HashMapTable<StringRef, HashMapCellWithSavedHash<StringRef, char*, StringRefHash64, HashTableNoState>, StringRefHash64, HashTableGrower<8ul>, Allocator<true, true> >&, char*) const /usr/bin/clickhouse
2020.05.30 05:58:23.407371 [ 37 ] {} <Fatal> BaseDaemon: 5. 0x55555ccbbbd3 DB::Aggregator::mergeBlocks(std::__cxx11::list<DB::Block, std::allocator<DB::Block> >&, bool) /usr/bin/clickhouse
2020.05.30 05:58:23.407495 [ 37 ] {} <Fatal> BaseDaemon: 6. 0x55555cc56cac DB::MergingAggregatedMemoryEfficientBlockInputStream::mergeThread(std::shared_ptr<DB::ThreadGroupStatus>) /usr/bin/clickhouse
2020.05.30 05:58:23.407529 [ 37 ] {} <Fatal> BaseDaemon: 7. 0x55555cc5763d ? /usr/bin/clickhouse
2020.05.30 05:58:23.407565 [ 37 ] {} <Fatal> BaseDaemon: 8. 0x55555940d0ce ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::_List_iterator<ThreadFromGlobalPool>) /usr/bin/clickhouse
2020.05.30 05:58:23.407629 [ 37 ] {} <Fatal> BaseDaemon: 9. 0x55555940d6de ThreadFromGlobalPool::ThreadFromGlobalPool<ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}>(ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::function<void ()>, int, std::optional<unsigned long>)::{lambda()#3}&&)::{lambda()#1}::operator()() const /usr/bin/clickhouse
2020.05.30 05:58:23.407674 [ 37 ] {} <Fatal> BaseDaemon: 10. 0x55555940ab7c ThreadPoolImpl<std::thread>::worker(std::_List_iterator<std::thread>) /usr/bin/clickhouse
2020.05.30 05:58:23.407716 [ 37 ] {} <Fatal> BaseDaemon: 11. 0x55555f078d20 ? /usr/bin/clickhouse
2020.05.30 05:58:23.407765 [ 37 ] {} <Fatal> BaseDaemon: 12. 0x7ffff7bbd6db start_thread /lib/x86_64-linux-gnu/libpthread-2.27.so
2020.05.30 05:58:23.407806 [ 37 ] {} <Fatal> BaseDaemon: 13. 0x7ffff74da88f __clone /lib/x86_64-linux-gnu/libc-2.27.so
2020.05.30 05:58:23.425761 [ 4 ] {} <Trace> SystemLog (system.trace_log): Flushing system log
2020.05.30 05:58:23.425912 [ 4 ] {} <Debug> SystemLog (system.trace_log): Creating new table system.trace_log for TraceLog
2020.05.30 05:58:23.426733 [ 4 ] {} <Information> BackgroundProcessingPool: Create BackgroundProcessingPool with 16 threads
2020.05.30 05:58:23.427539 [ 4 ] {} <Debug> system.trace_log: Loading data parts
2020.05.30 05:58:23.427753 [ 4 ] {} <Debug> system.trace_log: Loaded data parts (0 items)
2020.05.30 05:58:23.441144 [ 4 ] {} <Trace> system.trace_log: Renaming temporary part tmp_insert_202005_1_1_0 to 202005_1_1_0.

I think there may be incorrect types for LowCardinality, i.e. a missing getDictionaryType().get() call, but this still needs verification; I have only looked at it briefly so far.
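A minimal sketch of the suspected mismatch (a hypothetical, simplified model for illustration, not ClickHouse's actual column layout): a LowCardinality column stores small integer indices into a dictionary of distinct values, while a plain String column is read as pointer/length pairs. If merging code that expects plain String data instead receives dictionary indices, the "pointers" it hands to the hash function are tiny integers, which would be consistent with the unmapped fault addresses 0x26 and 0x1 seen in the crash logs.

```python
# Hypothetical model; names and layout are illustrative, not ClickHouse internals.
indices = [1, 1, 0]        # LowCardinality: per-row index into the dictionary
dictionary = ["", "foo"]   # distinct values seen in the column

# Correct path: resolve indices through the dictionary before hashing/merging.
decoded = [dictionary[i] for i in indices]

# Buggy path: treat each raw index as if it were a string pointer.
# Dereferencing such an "address" (0x0, 0x1, ...) is what CityHash64 would
# attempt in the stack trace above, hence "Address: 0x1 ... not mapped".
bogus_addresses = [hex(i) for i in indices]

print(decoded)          # ['foo', 'foo', '']
print(bogus_addresses)  # ['0x1', '0x1', '0x0']
```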