ClickHouse: Hedged requests hang in rare cases
Found hung queries in processlist:
Row 1:
──────
is_initial_query: 1
user: default
query_id: 76d2f16e-5dc9-42cb-895e-2dc8edbe1c08
address: ::1
port: 36112
initial_user: default
initial_query_id: 76d2f16e-5dc9-42cb-895e-2dc8edbe1c08
initial_address: ::1
initial_port: 36112
interface: 1
os_user:
client_hostname: 27c20d53754c
client_name: ClickHouse
client_revision: 54449
client_version_major: 21
client_version_minor: 11
client_version_patch: 1
http_method: 0
http_user_agent:
http_referer:
forwarded_for:
quota_key:
elapsed: 3884.86343185
is_cancelled: 0
read_rows: 0
read_bytes: 0
total_rows_approx: 0
written_rows: 0
written_bytes: 0
memory_usage: 158327769
peak_memory_usage: 157278873
query: SELECT sum(UserID GLOBAL IN (SELECT UserID FROM remote('127.0.0.{1,2}', test.hits))) FROM remote('127.0.0.{1,2}', test.hits);
thread_ids: [1045,3287,3074,3061,3126,13797,13985,15448,3279,15447,14006,8032,2684,2695,3259,15450,3121,13791]
ProfileEvents: {'Query':1,'SelectQuery':1,'FileOpen':1,'Seek':15,'ReadBufferFromFileDescriptorRead':17,'ReadBufferFromFileDescriptorReadBytes':16375089,'ReadCompressedBytes':21697578,'CompressedReadBufferBlocks':1226,'CompressedReadBufferBytes':135426010,'IOBufferAllocs':36,'IOBufferAllocBytes':20571045,'TableFunctionExecute':2,'MarkCacheHits':16,'CreatedReadBufferOrdinary':16,'DiskReadElapsedMicroseconds':128000,'NetworkReceiveElapsedMicroseconds':9956,'NetworkSendElapsedMicroseconds':148,'NetworkReceiveBytes':10573432,'NetworkSendBytes':1064,'HedgedRequestsChangeReplica':1,'SelectedParts':1,'SelectedRanges':1,'SelectedMarks':1084,'SelectedRows':25047187,'SelectedBytes':200377496,'ContextLock':334,'RWLockAcquiredReadLocks':4,'RWLockReadersWaitMilliseconds':4,'CannotWriteToWriteBufferDiscard':36408,'QueryProfilerSignalOverruns':2920}
Settings: {'connect_timeout_with_failover_ms':'2000','connect_timeout_with_failover_secure_ms':'3000','idle_connection_timeout':'36000','replication_wait_for_inactive_replica_timeout':'30','load_balancing':'random','log_queries':'1','insert_quorum_timeout':'60000','http_send_timeout':'60','http_receive_timeout':'60','opentelemetry_start_trace_probability':'0.1','max_memory_usage':'10000000000','max_untracked_memory':'1048576','memory_profiler_step':'1048576','log_comment':'00147_global_in_aggregate_function.sql','send_logs_level':'warning','database_atomic_wait_for_drop_and_detach_synchronously':'1','allow_experimental_database_replicated':'1','async_insert_busy_timeout_ms':'5000'}
current_database: test_0s7gn7
Thread 247 (Thread 0x7fc30c2af700 (LWP 1045)):
#0 __syscall () at ../base/glibc-compatibility/musl/x86_64/syscall.s:14
#1 0x0000000026a70a44 in epoll_pwait (fd=1833, ev=0x7fc30c29c920, cnt=1, to=-1, sigs=0x0) at ../base/glibc-compatibility/musl/epoll.c:27
#2 0x0000000026a70ace in epoll_wait (fd=1833, ev=0x7fc30c29c920, cnt=1, to=-1) at ../base/glibc-compatibility/musl/epoll.c:36
#3 0x000000001dd1cca4 in DB::Epoll::getManyReady (this=0x7fc30c29cf18, max_events=1, events_out=0x7fc30c29c920, blocking=true) at ../src/Common/Epoll.cpp:69
#4 0x000000001f91784c in DB::PollingQueue::wait (this=0x7fc30c29cf18, lock=...) at ../src/Processors/Executors/PollingQueue.cpp:73
#5 0x000000001f8fdbdd in DB::PipelineExecutor::executeImpl (this=0x7fc30c29cec0, num_threads=17) at ../src/Processors/Executors/PipelineExecutor.cpp:805
#6 0x000000001f8fd113 in DB::PipelineExecutor::execute (this=0x7fc30c29cec0, num_threads=17) at ../src/Processors/Executors/PipelineExecutor.cpp:407
#7 0x000000001f8f862e in DB::CompletedPipelineExecutor::execute (this=0x7fc30c29d238) at ../src/Processors/Executors/CompletedPipelineExecutor.cpp:99
#8 0x000000001e2674f2 in DB::GlobalSubqueriesMatcher::Data::addExternalStorage (this=0x7fc30c29ddb0, ast=..., set_alias=false) at ../src/Interpreters/GlobalSubqueriesVisitor.h:164
#9 0x000000001e266592 in DB::GlobalSubqueriesMatcher::visit (func=..., data=...) at ../src/Interpreters/GlobalSubqueriesVisitor.h:220
#10 0x000000001e266201 in DB::GlobalSubqueriesMatcher::visit (ast=..., data=...) at ../src/Interpreters/GlobalSubqueriesVisitor.h:184
#11 0x000000001e256ec1 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:34
#12 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#13 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#14 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#15 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#16 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#17 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#18 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#19 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#20 0x000000001e245cb4 in DB::ExpressionAnalyzer::initGlobalSubqueriesAndExternalTables (this=0x7fc387c61c00, do_global=true) at ../src/Interpreters/ExpressionAnalyzer.cpp:360
#21 0x000000001e2459a5 in DB::ExpressionAnalyzer::ExpressionAnalyzer (this=0x7fc387c61c00, query_=..., syntax_analyzer_result_=..., context_=..., subquery_depth_=0, do_global=true, subqueries_for_sets_=..., prepared_sets_=...) at ../src/Interpreters/ExpressionAnalyzer.cpp:153
#22 0x000000001e6581f7 in DB::SelectQueryExpressionAnalyzer::SelectQueryExpressionAnalyzer (this=0x7fc387c61c00, query_=..., syntax_analyzer_result_=..., context_=..., metadata_snapshot_=..., required_result_columns_=..., do_global_=true, options_=..., subqueries_for_sets_=..., prepared_sets_=...) at ../src/Interpreters/ExpressionAnalyzer.h:303
#23 0x000000001e6601c1 in std::__1::make_unique<DB::SelectQueryExpressionAnalyzer, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::TreeRewriterResult const>&, std::__1::shared_ptr<DB::Context>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, bool, DB::SelectQueryOptions&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, DB::SubqueryForSet, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, DB::SubqueryForSet> > >, std::__1::unordered_map<DB::PreparedSetKey, std::__1::shared_ptr<DB::Set>, DB::PreparedSetKey::Hash, std::__1::equal_to<DB::PreparedSetKey>, std::__1::allocator<std::__1::pair<DB::PreparedSetKey const, std::__1::shared_ptr<DB::Set> > > > > (__args=..., __args=..., __args=..., __args=..., __args=..., __args=..., __args=..., __args=..., __args=...) at ../contrib/libcxx/include/memory:2068
#24 0x000000001e6418fb in DB::InterpreterSelectQuery::InterpreterSelectQuery(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>, std::__1::shared_ptr<DB::IStorage> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&)::$_1::operator()(bool) const (this=0x7fc30c2a0ad8, try_move_to_prewhere=true) at ../src/Interpreters/InterpreterSelectQuery.cpp:439
#25 0x000000001e63ee0d in DB::InterpreterSelectQuery::InterpreterSelectQuery (this=0x7fc380aa2800, query_ptr_=..., context_=..., input_=..., input_pipe_=..., storage_=..., options_=..., required_result_column_names=..., metadata_snapshot_=...) at ../src/Interpreters/InterpreterSelectQuery.cpp:513
#26 0x000000001e63da7f in DB::InterpreterSelectQuery::InterpreterSelectQuery (this=0x7fc380aa2800, query_ptr_=..., context_=..., options_=..., required_result_column_names_=...) at ../src/Interpreters/InterpreterSelectQuery.cpp:160
#27 0x000000001e9a6432 in std::__1::make_unique<DB::InterpreterSelectQuery, std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context>&, DB::SelectQueryOptions&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&> (__args=..., __args=..., __args=..., __args=...) at ../contrib/libcxx/include/memory:2068
#28 0x000000001e9a47e9 in DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter (this=0x7fc330947820, ast_ptr_=..., current_required_result_column_names=...) at ../src/Interpreters/InterpreterSelectWithUnionQuery.cpp:216
#29 0x000000001e9a403d in DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery (this=0x7fc330947820, query_ptr_=..., context_=..., options_=..., required_result_column_names=...) at ../src/Interpreters/InterpreterSelectWithUnionQuery.cpp:138
#30 0x000000001e5a7c7d in std::__1::make_unique<DB::InterpreterSelectWithUnionQuery, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>&, DB::SelectQueryOptions const&> (__args=..., __args=..., __args=...) at ../contrib/libcxx/include/memory:2068
#31 0x000000001e5a6235 in DB::InterpreterFactory::get (query=..., context=..., options=...) at ../src/Interpreters/InterpreterFactory.cpp:118
#32 0x000000001ec60ae0 in DB::executeQueryImpl (begin=0x7fc3a50c1300 "SELECT sum(UserID GLOBAL IN (SELECT UserID FROM remote('127.0.0.{1,2}', test.hits))) FROM remote('127.0.0.{1,2}', test.hits);", end=0x7fc3a50c137d "", context=..., internal=false, stage=DB::QueryProcessingStage::Complete, istr=0x0) at ../src/Interpreters/executeQuery.cpp:605
#33 0x000000001ec5ea64 in DB::executeQuery (query=..., context=..., internal=false, stage=DB::QueryProcessingStage::Complete) at ../src/Interpreters/executeQuery.cpp:950
#34 0x000000001f835ca6 in DB::TCPHandler::runImpl (this=0x7fc3a3797000) at ../src/Server/TCPHandler.cpp:292
#35 0x000000001f842f65 in DB::TCPHandler::run (this=0x7fc3a3797000) at ../src/Server/TCPHandler.cpp:1628
#36 0x0000000023c86b59 in Poco::Net::TCPServerConnection::start (this=0x7fc3a3797000) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#37 0x0000000023c87368 in Poco::Net::TCPServerDispatcher::run (this=0x7fc3a4ebb700) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:115
#38 0x0000000023dd5294 in Poco::PooledThread::run (this=0x7fc3a2706180) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#39 0x0000000023dd1d7a in Poco::(anonymous namespace)::RunnableHolder::run (this=0x7fc3a26233f0) at ../contrib/poco/Foundation/src/Thread.cpp:55
#40 0x0000000023dd0b5c in Poco::ThreadImpl::runnableEntry (pThread=0x7fc3a27061b8) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#41 0x00007fc46877e609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#42 0x00007fc468694293 in clone () from /lib/x86_64-linux-gnu/libc.so.6
Label is ‘bug’, because similar issues were seen in Yandex.Metrica
cc: @Avogar
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 24 (21 by maintainers)
Commits related to this issue
- Bump libunwind to fix possible issues Refs: https://github.com/ClickHouse-Extras/libunwind/pull/15 Refs: #29618 Refs: #31737 — committed to azat/ClickHouse by azat 3 years ago
It’s because Query Profiler sends
SIGUSR1
every second, see settingquery_profiler_real_time_period_ns
andQueryProfilerReal
.