envoy: debug assertion failure: `assert failure: isIdleImpl()`

Title: debug assertion failure: assert failure: isIdleImpl()

Description: In our CI with debug builds, we are regularly seeing this assertion failure.

2022-08-04T22:11:20.554576Z	critical	envoy assert	assert failure: isIdleImpl().
2022-08-04T22:11:20.554714Z	critical	envoy backtrace	Caught Aborted, suspect faulting address 0x53900000018
2022-08-04T22:11:20.554729Z	critical	envoy backtrace	Backtrace (use tools/stack_decode.py to get line numbers):
2022-08-04T22:11:20.554739Z	critical	envoy backtrace	Envoy version: 01baf88a8942c273181c830b4307bfe70bd839c0/1.24.0-dev/Clean/DEBUG/BoringSSL
2022-08-04T22:11:20.646444Z	critical	envoy backtrace	#0: Envoy::SignalAction::sigHandler() [0x55787082aedc]
2022-08-04T22:11:20.646627Z	critical	envoy backtrace	#1: __restore_rt [0x7fe4307ab140]
2022-08-04T22:11:20.729793Z	critical	envoy backtrace	#2: Envoy::Tcp::ConnPoolImpl::deleteIsPending() [0x55786ee5e6e5]
2022-08-04T22:11:20.816228Z	critical	envoy backtrace	#3: Envoy::Event::DispatcherImpl::deferredDelete() [0x557870440645]
2022-08-04T22:11:20.901629Z	critical	envoy backtrace	#4: Envoy::Upstream::ClusterManagerImpl::ThreadLocalClusterManagerImpl::tcpConnPoolIsIdle() [0x55786edd54b1]
2022-08-04T22:11:20.994375Z	critical	envoy backtrace	#5: Envoy::Upstream::ClusterManagerImpl::ThreadLocalClusterManagerImpl::ClusterEntry::tcpConnPoolImpl()::$_24::operator()() [0x55786ee05891]
2022-08-04T22:11:21.111388Z	critical	envoy backtrace	#6: std::__1::__invoke<>() [0x55786ee0582d]
2022-08-04T22:11:21.222222Z	critical	envoy backtrace	#7: std::__1::__invoke_void_return_wrapper<>::__call<>() [0x55786ee057dd]
[2022-08-04T22:11:20.230Z] "- - -" 0 - - - "-" 1419 3114 320 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:38235 10.244.0.26:18443 10.244.0.33:37728 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
[2022-08-04T22:11:20.230Z] "- - -" 0 - - - "-" 1419 3114 320 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:46235 10.244.0.26:18443 10.244.0.33:37732 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
[2022-08-04T22:11:20.230Z] "- - -" 0 - - - "-" 1419 3114 322 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:60737 10.244.0.26:18443 10.244.0.33:37730 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
[2022-08-04T22:11:20.552Z] "- - -" 0 - - - "-" 1419 3114 12 - "-" "-" "-" "-" "10.244.0.26:18443" inbound|18443|| 127.0.0.6:56849 10.244.0.26:18443 10.244.0.33:37774 outbound_.18443_._.headless.echo1-3-39127.svc.cluster.local -
2022-08-04T22:11:21.334186Z	critical	envoy backtrace	#8: std::__1::__function::__alloc_func<>::operator()() [0x55786ee057ad]
2022-08-04T22:11:21.446329Z	critical	envoy backtrace	#9: std::__1::__function::__func<>::operator()() [0x55786ee04969]
2022-08-04T22:11:21.588074Z	critical	envoy backtrace	#10: std::__1::__function::__value_func<>::operator()() [0x557869d78252]
2022-08-04T22:11:21.707729Z	critical	envoy backtrace	#11: std::__1::function<>::operator()() [0x557869d78215]
2022-08-04T22:11:21.837349Z	critical	envoy backtrace	#12: Envoy::ConnectionPool::ConnPoolImplBase::checkForIdleAndNotify() [0x55786f362fcd]
2022-08-04T22:11:21.954837Z	critical	envoy backtrace	#13: Envoy::ConnectionPool::ConnPoolImplBase::checkForIdleAndCloseIdleConnsIfDraining() [0x55786f362c73]
2022-08-04T22:11:22.071136Z	critical	envoy backtrace	#14: Envoy::Tcp::ActiveTcpClient::~ActiveTcpClient() [0x55786f321016]
2022-08-04T22:11:22.211542Z	critical	envoy backtrace	#15: Envoy::Tcp::ActiveTcpClient::~ActiveTcpClient() [0x55786f3210f9]
2022-08-04T22:11:22.365065Z	critical	envoy backtrace	#16: std::__1::default_delete<>::operator()() [0x55786a10e6bc]
2022-08-04T22:11:22.489220Z	critical	envoy backtrace	#17: std::__1::unique_ptr<>::reset() [0x55786a10e63c]
2022-08-04T22:11:22.610566Z	critical	envoy backtrace	#18: Envoy::Event::DispatcherImpl::clearDeferredDeleteList() [0x55787043ca34]
2022-08-04T22:11:22.727033Z	critical	envoy backtrace	#19: Envoy::Event::DispatcherImpl::DispatcherImpl()::$_2::operator()() [0x5578704487e9]
2022-08-04T22:11:22.841456Z	critical	envoy backtrace	#20: std::__1::__invoke<>() [0x5578704487ad]
2022-08-04T22:11:22.949119Z	critical	envoy backtrace	#21: std::__1::__invoke_void_return_wrapper<>::__call<>() [0x55787044875d]
2022-08-04T22:11:23.038542Z	critical	envoy backtrace	#22: std::__1::__function::__alloc_func<>::operator()() [0x55787044872d]
2022-08-04T22:11:23.132226Z	critical	envoy backtrace	#23: std::__1::__function::__func<>::operator()() [0x557870447879]
2022-08-04T22:11:23.132367Z	critical	envoy backtrace	#24: std::__1::__function::__value_func<>::operator()() [0x557869d78252]
2022-08-04T22:11:23.132395Z	critical	envoy backtrace	#25: std::__1::function<>::operator()() [0x557869d78215]
2022-08-04T22:11:23.224456Z	critical	envoy backtrace	#26: Envoy::Event::SchedulableCallbackImpl::SchedulableCallbackImpl()::$_0::operator()() [0x557870798ab2]
2022-08-04T22:11:23.314165Z	critical	envoy backtrace	#27: Envoy::Event::SchedulableCallbackImpl::SchedulableCallbackImpl()::$_0::__invoke() [0x557870798a79]
2022-08-04T22:11:23.407180Z	critical	envoy backtrace	#28: event_process_active_single_queue [0x5578707ce10c]
2022-08-04T22:11:23.499323Z	critical	envoy backtrace	#29: event_process_active [0x5578707c8988]
2022-08-04T22:11:23.607093Z	critical	envoy backtrace	#30: event_base_loop [0x5578707c788c]
2022-08-04T22:11:23.710416Z	critical	envoy backtrace	#31: Envoy::Event::LibeventScheduler::run() [0x55787079696f]
2022-08-04T22:11:23.806877Z	critical	envoy backtrace	#32: Envoy::Event::DispatcherImpl::run() [0x557870441129]
2022-08-04T22:11:23.909969Z	critical	envoy backtrace	#33: Envoy::Server::WorkerImpl::threadRoutine() [0x55786ecdca1e]
2022-08-04T22:11:23.998286Z	critical	envoy backtrace	#34: Envoy::Server::WorkerImpl::start()::$_6::operator()() [0x55786ece6ce0]
2022-08-04T22:11:24.084979Z	critical	envoy backtrace	#35: std::__1::__invoke<>() [0x55786ece6c9d]
2022-08-04T22:11:24.181252Z	critical	envoy backtrace	#36: std::__1::__invoke_void_return_wrapper<>::__call<>() [0x55786ece6c4d]
2022-08-04T22:11:24.280837Z	critical	envoy backtrace	#37: std::__1::__function::__alloc_func<>::operator()() [0x55786ece6c1d]
2022-08-04T22:11:24.370161Z	critical	envoy backtrace	#38: std::__1::__function::__func<>::operator()() [0x55786ece5e19]
2022-08-04T22:11:24.370216Z	critical	envoy backtrace	#39: std::__1::__function::__value_func<>::operator()() [0x557869d78252]
2022-08-04T22:11:24.370226Z	critical	envoy backtrace	#40: std::__1::function<>::operator()() [0x557869d78215]
2022-08-04T22:11:24.458427Z	critical	envoy backtrace	#41: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::operator()() [0x557870c965cd]
2022-08-04T22:11:24.553854Z	critical	envoy backtrace	#42: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x557870c965a5]
2022-08-04T22:11:24.553982Z	critical	envoy backtrace	#43: start_thread [0x7fe43079fea7]

I don’t know when this started as we don’t have regular runs with debug images (yet)

Repro steps: I don’t have a minimal repro but we hit it in our CI easily

Example failiures: https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/_test_context/echo1-2-83331-state2134015854/config-0/tproxy-v1-67ff8b6b87-8lkcl_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/_test_context/echo1-2-83331-state2134015854/config-0/b-v1-b894cccd9-mkjzf_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-distroless_istio/1555312434737057792/artifacts/security-5a6e9dece7ce4accb167b8/TestReachability/global_mtls_strict/_test_context/echo1-3-39127-state3347915103/primary-0/headless-v1-6bccb45b85-mbb9w_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security_istio/1555312437832454144/artifacts/security-74627bac803844689bbd7f/TestPassThroughFilterChain/DISABLE/_test_context/echo1-3-55147-state4248486318/primary-0/b-v1-7ddb945cdb-9m4x5_istio-proxy.previous.log https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security_istio/1555312437832454144/artifacts/security-74627bac803844689bbd7f/TestPassThroughFilterChain/DISABLE/_test_context/echo1-3-55147-state4248486318/primary-0/tproxy-v1-59f9b9d764-rw5wl_istio-proxy.previous.log

Some core dumps: https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security_istio/1555312437832454144/artifacts/security-74627bac803844689bbd7f/TestPassThroughFilterChain/DISABLE/_test_context/echo1-3-55147-state4248486318/primary-0/b-v1-7ddb945cdb-9m4x5_core.proxy.25 https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/headless/to_tproxy/mtls/tcp-wl-only(allow)/_test_context/echo1-2-83331-state2821089782/config-0/tproxy-v1-67ff8b6b87-8lkcl_core.proxy.26 https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/headless/to_tproxy/mtls/tcp-wl-only(allow)/_test_context/echo1-2-83331-state2821089782/config-0/tproxy-v1-67ff8b6b87-8lkcl_core.proxy.28 https://storage.googleapis.com/istio-prow/pr-logs/pull/istio_istio/40290/integ-security-istiodremote_istio/1555312439514370048/artifacts/security-428b85071da243ee82d3fb/TestPassThroughFilterChain/DISABLE_with_PERMISSIVE/headless/to_tproxy/mtls/tcp-wl-only(allow)/_test_context/echo1-2-83331-state2821089782/config-0/tproxy-v1-67ff8b6b87-8lkcl_core.proxy.31

Envoy build: istio/proxy 01baf88a8942c273181c830b4307bfe70bd839c0, which is Envoyproxy a95d1b7f751e03c5033b84223c8b70acb786fd2b from 8/3

cc @kyessenov

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 37 (36 by maintainers)

Commits related to this issue

Most upvoted comments

I’m snowed under this week but I’ll try to craft a debug logging PR next week. Thanks for your patience!

Ok no patch yet (sorry, mostly focused on EnvoyCon) but I’ve at least got an integration test of basic original dst cluster additions and removals both for TCP and HTTP neither of which I don’t think we had previously. @kyessenov if you have ideas of how I could better tweak the tests I sent your way to repro the thrashing issues you were seeing lmk, otherwise I’ll try to add some sensible logging based on those tests and ship them over to @belyalov tomorrow