ClickHouse: Hedged requests hang in rare cases

https://clickhouse-test-reports.s3.yandex.net/0/2517f70d7532fc6aca4428f8de54aeae8ab1bf7d/stress_test_(debug).html#fail1

Found hung queries in processlist:
Row 1:
──────
is_initial_query:     1
user:                 default
query_id:             76d2f16e-5dc9-42cb-895e-2dc8edbe1c08
address:              ::1
port:                 36112
initial_user:         default
initial_query_id:     76d2f16e-5dc9-42cb-895e-2dc8edbe1c08
initial_address:      ::1
initial_port:         36112
interface:            1
os_user:              
client_hostname:      27c20d53754c
client_name:          ClickHouse 
client_revision:      54449
client_version_major: 21
client_version_minor: 11
client_version_patch: 1
http_method:          0
http_user_agent:      
http_referer:         
forwarded_for:        
quota_key:            
elapsed:              3884.86343185
is_cancelled:         0
read_rows:            0
read_bytes:           0
total_rows_approx:    0
written_rows:         0
written_bytes:        0
memory_usage:         158327769
peak_memory_usage:    157278873
query:                SELECT sum(UserID GLOBAL IN (SELECT UserID FROM remote('127.0.0.{1,2}', test.hits))) FROM remote('127.0.0.{1,2}', test.hits);
thread_ids:           [1045,3287,3074,3061,3126,13797,13985,15448,3279,15447,14006,8032,2684,2695,3259,15450,3121,13791]
ProfileEvents:        {'Query':1,'SelectQuery':1,'FileOpen':1,'Seek':15,'ReadBufferFromFileDescriptorRead':17,'ReadBufferFromFileDescriptorReadBytes':16375089,'ReadCompressedBytes':21697578,'CompressedReadBufferBlocks':1226,'CompressedReadBufferBytes':135426010,'IOBufferAllocs':36,'IOBufferAllocBytes':20571045,'TableFunctionExecute':2,'MarkCacheHits':16,'CreatedReadBufferOrdinary':16,'DiskReadElapsedMicroseconds':128000,'NetworkReceiveElapsedMicroseconds':9956,'NetworkSendElapsedMicroseconds':148,'NetworkReceiveBytes':10573432,'NetworkSendBytes':1064,'HedgedRequestsChangeReplica':1,'SelectedParts':1,'SelectedRanges':1,'SelectedMarks':1084,'SelectedRows':25047187,'SelectedBytes':200377496,'ContextLock':334,'RWLockAcquiredReadLocks':4,'RWLockReadersWaitMilliseconds':4,'CannotWriteToWriteBufferDiscard':36408,'QueryProfilerSignalOverruns':2920}
Settings:             {'connect_timeout_with_failover_ms':'2000','connect_timeout_with_failover_secure_ms':'3000','idle_connection_timeout':'36000','replication_wait_for_inactive_replica_timeout':'30','load_balancing':'random','log_queries':'1','insert_quorum_timeout':'60000','http_send_timeout':'60','http_receive_timeout':'60','opentelemetry_start_trace_probability':'0.1','max_memory_usage':'10000000000','max_untracked_memory':'1048576','memory_profiler_step':'1048576','log_comment':'00147_global_in_aggregate_function.sql','send_logs_level':'warning','database_atomic_wait_for_drop_and_detach_synchronously':'1','allow_experimental_database_replicated':'1','async_insert_busy_timeout_ms':'5000'}
current_database:     test_0s7gn7


Thread 247 (Thread 0x7fc30c2af700 (LWP 1045)):
#0  __syscall () at ../base/glibc-compatibility/musl/x86_64/syscall.s:14
#1  0x0000000026a70a44 in epoll_pwait (fd=1833, ev=0x7fc30c29c920, cnt=1, to=-1, sigs=0x0) at ../base/glibc-compatibility/musl/epoll.c:27
#2  0x0000000026a70ace in epoll_wait (fd=1833, ev=0x7fc30c29c920, cnt=1, to=-1) at ../base/glibc-compatibility/musl/epoll.c:36
#3  0x000000001dd1cca4 in DB::Epoll::getManyReady (this=0x7fc30c29cf18, max_events=1, events_out=0x7fc30c29c920, blocking=true) at ../src/Common/Epoll.cpp:69
#4  0x000000001f91784c in DB::PollingQueue::wait (this=0x7fc30c29cf18, lock=...) at ../src/Processors/Executors/PollingQueue.cpp:73
#5  0x000000001f8fdbdd in DB::PipelineExecutor::executeImpl (this=0x7fc30c29cec0, num_threads=17) at ../src/Processors/Executors/PipelineExecutor.cpp:805
#6  0x000000001f8fd113 in DB::PipelineExecutor::execute (this=0x7fc30c29cec0, num_threads=17) at ../src/Processors/Executors/PipelineExecutor.cpp:407
#7  0x000000001f8f862e in DB::CompletedPipelineExecutor::execute (this=0x7fc30c29d238) at ../src/Processors/Executors/CompletedPipelineExecutor.cpp:99
#8  0x000000001e2674f2 in DB::GlobalSubqueriesMatcher::Data::addExternalStorage (this=0x7fc30c29ddb0, ast=..., set_alias=false) at ../src/Interpreters/GlobalSubqueriesVisitor.h:164
#9  0x000000001e266592 in DB::GlobalSubqueriesMatcher::visit (func=..., data=...) at ../src/Interpreters/GlobalSubqueriesVisitor.h:220
#10 0x000000001e266201 in DB::GlobalSubqueriesMatcher::visit (ast=..., data=...) at ../src/Interpreters/GlobalSubqueriesVisitor.h:184
#11 0x000000001e256ec1 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:34
#12 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#13 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#14 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#15 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#16 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#17 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#18 0x000000001e266197 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visitChildren (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:62
#19 0x000000001e256ea9 in DB::InDepthNodeVisitor<DB::GlobalSubqueriesMatcher, false, false, std::__1::shared_ptr<DB::IAST> >::visit (this=0x7fc30c29dd78, ast=...) at ../src/Interpreters/InDepthNodeVisitor.h:30
#20 0x000000001e245cb4 in DB::ExpressionAnalyzer::initGlobalSubqueriesAndExternalTables (this=0x7fc387c61c00, do_global=true) at ../src/Interpreters/ExpressionAnalyzer.cpp:360
#21 0x000000001e2459a5 in DB::ExpressionAnalyzer::ExpressionAnalyzer (this=0x7fc387c61c00, query_=..., syntax_analyzer_result_=..., context_=..., subquery_depth_=0, do_global=true, subqueries_for_sets_=..., prepared_sets_=...) at ../src/Interpreters/ExpressionAnalyzer.cpp:153
#22 0x000000001e6581f7 in DB::SelectQueryExpressionAnalyzer::SelectQueryExpressionAnalyzer (this=0x7fc387c61c00, query_=..., syntax_analyzer_result_=..., context_=..., metadata_snapshot_=..., required_result_columns_=..., do_global_=true, options_=..., subqueries_for_sets_=..., prepared_sets_=...) at ../src/Interpreters/ExpressionAnalyzer.h:303
#23 0x000000001e6601c1 in std::__1::make_unique<DB::SelectQueryExpressionAnalyzer, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::TreeRewriterResult const>&, std::__1::shared_ptr<DB::Context>&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const>&, std::__1::unordered_set<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > >, bool, DB::SelectQueryOptions&, std::__1::unordered_map<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, DB::SubqueryForSet, std::__1::hash<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::equal_to<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > >, std::__1::allocator<std::__1::pair<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const, DB::SubqueryForSet> > >, std::__1::unordered_map<DB::PreparedSetKey, std::__1::shared_ptr<DB::Set>, DB::PreparedSetKey::Hash, std::__1::equal_to<DB::PreparedSetKey>, std::__1::allocator<std::__1::pair<DB::PreparedSetKey const, std::__1::shared_ptr<DB::Set> > > > > (__args=..., __args=..., __args=..., __args=..., __args=..., __args=..., __args=..., __args=..., __args=...) at ../contrib/libcxx/include/memory:2068
#24 0x000000001e6418fb in DB::InterpreterSelectQuery::InterpreterSelectQuery(std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context const>, std::__1::shared_ptr<DB::IBlockInputStream> const&, std::__1::optional<DB::Pipe>, std::__1::shared_ptr<DB::IStorage> const&, DB::SelectQueryOptions const&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&, std::__1::shared_ptr<DB::StorageInMemoryMetadata const> const&)::$_1::operator()(bool) const (this=0x7fc30c2a0ad8, try_move_to_prewhere=true) at ../src/Interpreters/InterpreterSelectQuery.cpp:439
#25 0x000000001e63ee0d in DB::InterpreterSelectQuery::InterpreterSelectQuery (this=0x7fc380aa2800, query_ptr_=..., context_=..., input_=..., input_pipe_=..., storage_=..., options_=..., required_result_column_names=..., metadata_snapshot_=...) at ../src/Interpreters/InterpreterSelectQuery.cpp:513
#26 0x000000001e63da7f in DB::InterpreterSelectQuery::InterpreterSelectQuery (this=0x7fc380aa2800, query_ptr_=..., context_=..., options_=..., required_result_column_names_=...) at ../src/Interpreters/InterpreterSelectQuery.cpp:160
#27 0x000000001e9a6432 in std::__1::make_unique<DB::InterpreterSelectQuery, std::__1::shared_ptr<DB::IAST> const&, std::__1::shared_ptr<DB::Context>&, DB::SelectQueryOptions&, std::__1::vector<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::allocator<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > > > const&> (__args=..., __args=..., __args=..., __args=...) at ../contrib/libcxx/include/memory:2068
#28 0x000000001e9a47e9 in DB::InterpreterSelectWithUnionQuery::buildCurrentChildInterpreter (this=0x7fc330947820, ast_ptr_=..., current_required_result_column_names=...) at ../src/Interpreters/InterpreterSelectWithUnionQuery.cpp:216
#29 0x000000001e9a403d in DB::InterpreterSelectWithUnionQuery::InterpreterSelectWithUnionQuery (this=0x7fc330947820, query_ptr_=..., context_=..., options_=..., required_result_column_names=...) at ../src/Interpreters/InterpreterSelectWithUnionQuery.cpp:138
#30 0x000000001e5a7c7d in std::__1::make_unique<DB::InterpreterSelectWithUnionQuery, std::__1::shared_ptr<DB::IAST>&, std::__1::shared_ptr<DB::Context>&, DB::SelectQueryOptions const&> (__args=..., __args=..., __args=...) at ../contrib/libcxx/include/memory:2068
#31 0x000000001e5a6235 in DB::InterpreterFactory::get (query=..., context=..., options=...) at ../src/Interpreters/InterpreterFactory.cpp:118
#32 0x000000001ec60ae0 in DB::executeQueryImpl (begin=0x7fc3a50c1300 "SELECT sum(UserID GLOBAL IN (SELECT UserID FROM remote('127.0.0.{1,2}', test.hits))) FROM remote('127.0.0.{1,2}', test.hits);", end=0x7fc3a50c137d "", context=..., internal=false, stage=DB::QueryProcessingStage::Complete, istr=0x0) at ../src/Interpreters/executeQuery.cpp:605
#33 0x000000001ec5ea64 in DB::executeQuery (query=..., context=..., internal=false, stage=DB::QueryProcessingStage::Complete) at ../src/Interpreters/executeQuery.cpp:950
#34 0x000000001f835ca6 in DB::TCPHandler::runImpl (this=0x7fc3a3797000) at ../src/Server/TCPHandler.cpp:292
#35 0x000000001f842f65 in DB::TCPHandler::run (this=0x7fc3a3797000) at ../src/Server/TCPHandler.cpp:1628
#36 0x0000000023c86b59 in Poco::Net::TCPServerConnection::start (this=0x7fc3a3797000) at ../contrib/poco/Net/src/TCPServerConnection.cpp:43
#37 0x0000000023c87368 in Poco::Net::TCPServerDispatcher::run (this=0x7fc3a4ebb700) at ../contrib/poco/Net/src/TCPServerDispatcher.cpp:115
#38 0x0000000023dd5294 in Poco::PooledThread::run (this=0x7fc3a2706180) at ../contrib/poco/Foundation/src/ThreadPool.cpp:199
#39 0x0000000023dd1d7a in Poco::(anonymous namespace)::RunnableHolder::run (this=0x7fc3a26233f0) at ../contrib/poco/Foundation/src/Thread.cpp:55
#40 0x0000000023dd0b5c in Poco::ThreadImpl::runnableEntry (pThread=0x7fc3a27061b8) at ../contrib/poco/Foundation/src/Thread_POSIX.cpp:345
#41 0x00007fc46877e609 in start_thread () from /lib/x86_64-linux-gnu/libpthread.so.0
#42 0x00007fc468694293 in clone () from /lib/x86_64-linux-gnu/libc.so.6

Label is ‘bug’, because similar issues were seen in Yandex.Metrica

cc: @Avogar

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 24 (21 by maintainers)

Commits related to this issue

Most upvoted comments

we get EINTR constantly almost every second for some reason

It’s because Query Profiler sends SIGUSR1 every second, see setting query_profiler_real_time_period_ns and QueryProfilerReal.