ClickHouse: LXC: Segmentation fault, sporadically
Sometimes, once in 2-3 hours, Clickhouse restarts with segmentation fault. Database has almost no load, approx. 1000 inserts per minute.
Does it reproduce on recent release? Yes
How to reproduce I don’t know, it happens sporadically
-
Which ClickHouse server version to use 21.2.5.5
-
Which interface to use, if matters ClickHouse Python Driver (clickhouse-driver 0.2.0)
Expected behavior No segfaults
Error message and/or stacktrace
2021.03.10 13:36:18.523605 [ 19126 ] {} <Fatal> BaseDaemon: ########################################
2021.03.10 13:36:18.523690 [ 19126 ] {} <Fatal> BaseDaemon: (version 21.2.5.5 (official build), build id: BFF9D4E9C079763C6DFA1C04E8A4D1903D0FFE69) (from thread 19020) (no query) Received signal Segmentation fault (11)
2021.03.10 13:36:18.523713 [ 19126 ] {} <Fatal> BaseDaemon: Address: NULL pointer. Access: read. Unknown si_code.
2021.03.10 13:36:18.523732 [ 19126 ] {} <Fatal> BaseDaemon: Stack trace: 0xe85d688 0xe859455 0xe85c5a8 0xe85d152 0x84f700f 0x84faaa3 0x7fac958dd4a4 0x7fac9561fd0f
2021.03.10 13:36:18.523811 [ 19126 ] {} <Fatal> BaseDaemon: 2. std::__1::__tree<std::__1::__value_type<Poco::Timestamp, std::__1::shared_ptr<DB::BackgroundSchedulePoolTaskInfo> >, std::__1::__map_value_compare<Poco::Timestamp, std::__1::__value_type<Poco::Timestamp, std::__1::shared_ptr<DB::BackgroundSchedulePoolTaskInfo> >, std::__1::less<Poco::Timestamp>, true>, std::__1::allocator<std::__1::__value_type<Poco::Timestamp, std::__1::shared_ptr<DB::BackgroundSchedulePoolTaskInfo> > > >::erase(std::__1::__tree_const_iterator<std::__1::__value_type<Poco::Timestamp, std::__1::shared_ptr<DB::BackgroundSchedulePoolTaskInfo> >, std::__1::__tree_node<std::__1::__value_type<Poco::Timestamp, std::__1::shared_ptr<DB::BackgroundSchedulePoolTaskInfo> >, void*>*, long>) @ 0xe85d688 in /usr/bin/clickhouse
2021.03.10 13:36:18.523853 [ 19126 ] {} <Fatal> BaseDaemon: 3. DB::BackgroundSchedulePoolTaskInfo::scheduleImpl(std::__1::lock_guard<std::__1::mutex>&) @ 0xe859455 in /usr/bin/clickhouse
2021.03.10 13:36:18.523871 [ 19126 ] {} <Fatal> BaseDaemon: 4. DB::BackgroundSchedulePool::delayExecutionThreadFunction() @ 0xe85c5a8 in /usr/bin/clickhouse
2021.03.10 13:36:18.523886 [ 19126 ] {} <Fatal> BaseDaemon: 5. ? @ 0xe85d152 in /usr/bin/clickhouse
2021.03.10 13:36:18.523904 [ 19126 ] {} <Fatal> BaseDaemon: 6. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x84f700f in /usr/bin/clickhouse
2021.03.10 13:36:18.523919 [ 19126 ] {} <Fatal> BaseDaemon: 7. ? @ 0x84faaa3 in /usr/bin/clickhouse
2021.03.10 13:36:18.523942 [ 19126 ] {} <Fatal> BaseDaemon: 8. start_thread @ 0x74a4 in /lib/x86_64-linux-gnu/libpthread-2.24.so
2021.03.10 13:36:18.523963 [ 19126 ] {} <Fatal> BaseDaemon: 9. clone @ 0xe8d0f in /lib/x86_64-linux-gnu/libc-2.24.so
2021.03.10 13:36:18.615505 [ 19126 ] {} <Fatal> BaseDaemon: Checksum of the binary: 94B88F83D23FC43FAEFCE896EDC6A812, integrity check passed.
2021.03.10 13:36:28.554129 [ 18967 ] {} <Fatal> Application: Child process was terminated by signal 11.
2021.03.10 13:36:58.997144 [ 19522 ] {} <Error> bool DB::(anonymous namespace)::checkPermissionsImpl(): Code: 412, e.displayText() = DB::Exception: Can't receive Netlink response: error -2, Stack trace (when copying this message, always include the lines below):
0. ? @ 0x84eadb2 in /usr/bin/clickhouse
1. ? @ 0x84eaf08 in /usr/bin/clickhouse
2. DB::TaskStatsInfoGetter::TaskStatsInfoGetter() @ 0x84ea837 in /usr/bin/clickhouse
3. ? @ 0x84ea686 in /usr/bin/clickhouse
4. DB::TaskStatsInfoGetter::checkPermissions() @ 0x84ea626 in /usr/bin/clickhouse
5. DB::TasksStatsCounters::create(unsigned long) @ 0x84e2e00 in /usr/bin/clickhouse
6. DB::ThreadStatus::initPerformanceCounters() @ 0xee5a96b in /usr/bin/clickhouse
7. DB::ThreadStatus::setupState(std::__1::shared_ptr<DB::ThreadGroupStatus> const&) @ 0xee5a648 in /usr/bin/clickhouse
8. DB::CurrentThread::initializeQuery() @ 0xee5c602 in /usr/bin/clickhouse
9. DB::BackgroundSchedulePool::attachToThreadGroup() @ 0xe85bfcd in /usr/bin/clickhouse
10. DB::BackgroundSchedulePool::threadFunction() @ 0xe85c0b2 in /usr/bin/clickhouse
11. ? @ 0xe85ceb2 in /usr/bin/clickhouse
12. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x84f700f in /usr/bin/clickhouse
13. ? @ 0x84faaa3 in /usr/bin/clickhouse
14. start_thread @ 0x74a4 in /lib/x86_64-linux-gnu/libpthread-2.24.so
15. clone @ 0xe8d0f in /lib/x86_64-linux-gnu/libc-2.24.so
(version 21.2.5.5 (official build))
Additional context CH runs in LXC container on host Proxmox VE 6.3-3 with 512GB RAM. Container has limit 128GB RAM. Here are some settings from the config file:
<max_server_memory_usage>125000000000</max_server_memory_usage>
<max_server_memory_usage_to_ram_ratio>0.9</max_server_memory_usage_to_ram_ratio>
<total_memory_profiler_step>4194304</total_memory_profiler_step>
Here is output of “free” command:
# free
total used free shared buff/cache available
Mem: 134217728 396256 132932412 9376 889060 133821472
Swap: 8388604 0 8388604
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 18 (6 by maintainers)
Fixed in #32916.