envoy: debug assertion failure: `assert failure: isIdleImpl()`
Title: debug assertion failure: assert failure: isIdleImpl()
Description: In our CI with debug builds, we are regularly seeing this assertion failure.
2022-08-04T22:11:20.554576Z critical envoy assert assert failure: isIdleImpl().
2022-08-04T22:11:20.554714Z critical envoy backtrace Caught Aborted, suspect faulting address 0x53900000018
2022-08-04T22:11:20.554729Z critical envoy backtrace Backtrace (use tools/stack_decode.py to get line numbers):
2022-08-04T22:11:20.554739Z critical envoy backtrace Envoy version: 01baf88a8942c273181c830b4307bfe70bd839c0/1.24.0-dev/Clean/DEBUG/BoringSSL
2022-08-04T22:11:20.646444Z critical envoy backtrace #0: Envoy::SignalAction::sigHandler() [0x55787082aedc]
2022-08-04T22:11:20.646627Z critical envoy backtrace #1: __restore_rt [0x7fe4307ab140]
2022-08-04T22:11:20.729793Z critical envoy backtrace #2: Envoy::Tcp::ConnPoolImpl::deleteIsPending() [0x55786ee5e6e5]
2022-08-04T22:11:20.816228Z critical envoy backtrace #3: Envoy::Event::DispatcherImpl::deferredDelete() [0x557870440645]
2022-08-04T22:11:20.901629Z critical envoy backtrace #4: Envoy::Upstream::ClusterManagerImpl::ThreadLocalClusterManagerImpl::tcpConnPoolIsIdle() [0x55786edd54b1]
2022-08-04T22:11:20.994375Z critical envoy backtrace #5: Envoy::Upstream::ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::tcpConnPoolImpl()::$_24::operator()() [0x55786ee05891]
2022-08-04T22:11:21.111388Z critical envoy backtrace #6: std::__1::__invoke<>() [0x55786ee0582d]
2022-08-04T22:11:21.222222Z critical envoy backtrace #7: std::__1::__invoke_void_return_wrapper<>::__call<>() [0x55786ee057dd]
[2022-08-04T22:11:20.230Z] "- - -" 0 - - - "-" 1419 3114 320 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:38235 10.244.0.26:18443 10.244.0.33:37728 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
[2022-08-04T22:11:20.230Z] "- - -" 0 - - - "-" 1419 3114 320 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:46235 10.244.0.26:18443 10.244.0.33:37732 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
[2022-08-04T22:11:20.230Z] "- - -" 0 - - - "-" 1419 3114 322 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:60737 10.244.0.26:18443 10.244.0.33:37730 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
[2022-08-04T22:11:20.552Z] "- - -" 0 - - - "-" 1419 3114 12 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:56849 10.244.0.26:18443 10.244.0.33:37774 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
2022-08-04T22:11:21.334186Z critical envoy backtrace #8: std::__1::__function::__alloc_func<>::operator()() [0x55786ee057ad]
2022-08-04T22:11:21.446329Z critical envoy backtrace #9: std::__1::__function::__func<>::operator()() [0x55786ee04969]
2022-08-04T22:11:21.588074Z critical envoy backtrace #10: std::__1::__function::__value_func<>::operator()() [0x557869d78252]
2022-08-04T22:11:21.707729Z critical envoy backtrace #11: std::__1::function<>::operator()() [0x557869d78215]
2022-08-04T22:11:21.837349Z critical envoy backtrace #12: Envoy::ConnectionPool::ConnPoolImplBase::checkForIdleAndNotify() [0x55786f362fcd]
2022-08-04T22:11:21.954837Z critical envoy backtrace #13: Envoy::ConnectionPool::ConnPoolImplBase::checkForIdleAndCloseIdleConnsIfDraining() [0x55786f362c73]
2022-08-04T22:11:22.071136Z critical envoy backtrace #14: Envoy::Tcp::ActiveTcpClient::~ActiveTcpClient() [0x55786f321016]
2022-08-04T22:11:22.211542Z critical envoy backtrace #15: Envoy::Tcp::ActiveTcpClient::~ActiveTcpClient() [0x55786f3210f9]
2022-08-04T22:11:22.365065Z critical envoy backtrace #16: std::__1::default_delete<>::operator()() [0x55786a10e6bc]
2022-08-04T22:11:22.489220Z critical envoy backtrace #17: std::__1::unique_ptr<>::reset() [0x55786a10e63c]
2022-08-04T22:11:22.610566Z critical envoy backtrace #18: Envoy::Event::DispatcherImpl::clearDeferredDeleteList() [0x55787043ca34]
2022-08-04T22:11:22.727033Z critical envoy backtrace #19: Envoy::Event::DispatcherImpl::DispatcherImpl()::$_2::operator()() [0x5578704487e9]
2022-08-04T22:11:22.841456Z critical envoy backtrace #20: std::__1::__invoke<>() [0x5578704487ad]
2022-08-04T22:11:22.949119Z critical envoy backtrace #21: std::__1::__invoke_void_return_wrapper<>::__call<>() [0x55787044875d]
2022-08-04T22:11:23.038542Z critical envoy backtrace #22: std::__1::__function::__alloc_func<>::operator()() [0x55787044872d]
2022-08-04T22:11:23.132226Z critical envoy backtrace #23: std::__1::__function::__func<>::operator()() [0x557870447879]
2022-08-04T22:11:23.132367Z critical envoy backtrace #24: std::__1::__function::__value_func<>::operator()() [0x557869d78252]
2022-08-04T22:11:23.132395Z critical envoy backtrace #25: std::__1::function<>::operator()() [0x557869d78215]
2022-08-04T22:11:23.224456Z critical envoy backtrace #26: Envoy::Event::SchedulableCallbackImpl::SchedulableCallbackImpl()::$_0::operator()() [0x557870798ab2]
2022-08-04T22:11:23.314165Z critical envoy backtrace #27: Envoy::Event::SchedulableCallbackImpl::SchedulableCallbackImpl()::$_0::__invoke() [0x557870798a79]
2022-08-04T22:11:23.407180Z critical envoy backtrace #28: event_process_active_single_queue [0x5578707ce10c]
2022-08-04T22:11:23.499323Z critical envoy backtrace #29: event_process_active [0x5578707c8988]
2022-08-04T22:11:23.607093Z critical envoy backtrace #30: event_base_loop [0x5578707c788c]
2022-08-04T22:11:23.710416Z critical envoy backtrace #31: Envoy::Event::LibeventScheduler::run() [0x55787079696f]
2022-08-04T22:11:23.806877Z critical envoy backtrace #32: Envoy::Event::DispatcherImpl::run() [0x557870441129]
2022-08-04T22:11:23.909969Z critical envoy backtrace #33: Envoy::Server::WorkerImpl::threadRoutine() [0x55786ecdca1e]
2022-08-04T22:11:23.998286Z critical envoy backtrace #34: Envoy::Server::WorkerImpl::start()::$_6::operator()() [0x55786ece6ce0]
2022-08-04T22:11:24.084979Z critical envoy backtrace #35: std::__1::__invoke<>() [0x55786ece6c9d]
2022-08-04T22:11:24.181252Z critical envoy backtrace #36: std::__1::__invoke_void_return_wrapper<>::__call<>() [0x55786ece6c4d]
2022-08-04T22:11:24.280837Z critical envoy backtrace #37: std::__1::__function::__alloc_func<>::operator()() [0x55786ece6c1d]
2022-08-04T22:11:24.370161Z critical envoy backtrace #38: std::__1::__function::__func<>::operator()() [0x55786ece5e19]
2022-08-04T22:11:24.370216Z critical envoy backtrace #39: std::__1::__function::__value_func<>::operator()() [0x557869d78252]
2022-08-04T22:11:24.370226Z critical envoy backtrace #40: std::__1::function<>::operator()() [0x557869d78215]
2022-08-04T22:11:24.458427Z critical envoy backtrace #41: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::operator()() [0x557870c965cd]
2022-08-04T22:11:24.553854Z critical envoy backtrace #42: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x557870c965a5]
2022-08-04T22:11:24.553982Z critical envoy backtrace #43: start_thread [0x7fe43079fea7]
I don’t know when this started as we don’t have regular runs with debug images (yet)
Repro steps: I don’t have a minimal repro but we hit it in our CI easily
Example failiures: https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/_test_context/echo1-2-83331-state2134015854/config-0/tproxy-v1-67ff8b6b87-8lkcl_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/_test_context/echo1-2-83331-state2134015854/config-0/b-v1-b894cccd9-mkjzf_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-distroless_istio/1555312434737057792/artifacts/security-5a6e9dece7ce4accb167b8/TestReachability/global_mtls_strict/_test_context/echo1-3-39127-state3347915103/primary-0/headless-v1-6bccb45b85-mbb9w_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security_istio/1555312437832454144/artifacts/security-74627bac803844689bbd7f/TestPassThroughFilterChain/DISABLE/_test_context/echo1-3-55147-state4248486318/primary-0/b-v1-7ddb945cdb-9m4x5_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security_istio/1555312437832454144/artifacts/security-74627bac803844689bbd7f/TestPassThroughFilterChain/DISABLE/_test_context/echo1-3-55147-state4248486318/primary-0/tproxy-v1-59f9b9d764-rw5wl_istio-proxy.previous.log
Some core dumps: https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security_istio/1555312437832454144/artifacts/security-74627bac803844689bbd7f/TestPassThroughFilterChain/DISABLE/_test_context/echo1-3-55147-state4248486318/primary-0/b-v1-7ddb945cdb-9m4x5_core.proxy.25 https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/headless/to_tproxy/mtls/tcp-wl-only(allow)/_test_context/echo1-2-83331-state2821089782/config-0/tproxy-v1-67ff8b6b87-8lkcl_core.proxy.26 https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/headless/to_tproxy/mtls/tcp-wl-only(allow)/_test_context/echo1-2-83331-state2821089782/config-0/tproxy-v1-67ff8b6b87-8lkcl_core.proxy.28 https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/headless/to_tproxy/mtls/tcp-wl-only(allow)/_test_context/echo1-2-83331-state2821089782/config-0/tproxy-v1-67ff8b6b87-8lkcl_core.proxy.31
Envoy build: istio/proxy 01baf88a8942c273181c830b4307bfe70bd839c0, which is Envoyproxy a95d1b7f751e03c5033b84223c8b70acb786fd2b from 8/3
cc @kyessenov
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 37 (36 by maintainers)
Commits related to this issue
- test: adding original dst tests. (#23645) Part of #22583 Signed-off-by: Alyssa Wilk <alyssar@chromium.org> — committed to envoyproxy/envoy by alyssawilk 2 years ago
- upstream: fix for race condition in tcp conn management (#30807) Risk Level: Low Testing: Manual. Before the fix assertion triggered within 5 minutes. After the fix crash does not happen. Docs Chan... — committed to envoyproxy/envoy by cpakulski 7 months ago
I’m snowed under this week but I’ll try to craft a debug logging PR next week. Thanks for your patience!
Ok no patch yet (sorry, mostly focused on EnvoyCon) but I’ve at least got an integration test of basic original dst cluster additions and removals both for TCP and HTTP neither of which I don’t think we had previously. @kyessenov if you have ideas of how I could better tweak the tests I sent your way to repro the thrashing issues you were seeing lmk, otherwise I’ll try to add some sensible logging based on those tests and ship them over to @belyalov tomorrow