envoy: Envoy dns_filter causes OOM

2020-08-05T23:12:06.994966Z     critical        envoy assert    panic: out of memory
2020-08-05T23:12:06.995014Z     critical        envoy backtrace Caught Aborted, suspect faulting address 0x53900000010
2020-08-05T23:12:06.995024Z     critical        envoy backtrace Backtrace (use tools/stack_decode.py to get line numbers):
2020-08-05T23:12:06.995026Z     critical        envoy backtrace Envoy version: 601cb2e4053746c53bad5811bd9b5f380228278a/1.16.0-dev/Clean/RELEASE/BoringSSL
2020-08-05T23:12:07.002342Z     critical        envoy backtrace #0: __restore_rt [0x7f351b8a68a0]
2020-08-05T23:12:07.022274Z     critical        envoy backtrace #1: [0x5575cc212089]
2020-08-05T23:12:07.029666Z     critical        envoy backtrace #2: (anonymous namespace)::handle_oom() [0x5575ce2aaaa5]
2020-08-05T23:12:07.038984Z     critical        envoy backtrace #3: tcmalloc::allocate_full_cpp_throw_oom() [0x5575ce315c90]
2020-08-05T23:12:07.046672Z     critical        envoy backtrace #4: fmt::v6::basic_memory_buffer<>::grow() [0x5575cc19895d]
2020-08-05T23:12:07.053475Z     critical        envoy backtrace #5: fmt::v6::visit_format_arg<>() [0x5575cc189c92]
2020-08-05T23:12:07.060924Z     critical        envoy backtrace #6: fmt::v6::internal::parse_format_string<>() [0x5575cc1892b5]
2020-08-05T23:12:07.067446Z     critical        envoy backtrace #7: fmt::v6::internal::vformat<>() [0x5575cc1a4032]
2020-08-05T23:12:07.073824Z     critical        envoy backtrace #8: Envoy::Server::ListenSocketFactoryImpl::createListenSocketAndApplyOptions() [0x5575cd7b9d6e]
2020-08-05T23:12:07.081319Z     critical        envoy backtrace #9: Envoy::Server::ListenSocketFactoryImpl::getListenSocket() [0x5575cd7ba208]
2020-08-05T23:12:07.088832Z     critical        envoy backtrace #10: Envoy::Server::ActiveRawUdpListener::ActiveRawUdpListener() [0x5575cd821236]
2020-08-05T23:12:07.095647Z     critical        envoy backtrace #11: Envoy::Server::ActiveRawUdpListenerFactory::createActiveUdpListener() [0x5575cd7b3ec8]
2020-08-05T23:12:07.102233Z     critical        envoy backtrace #12: Envoy::Server::ConnectionHandlerImpl::addListener() [0x5575cd81d3c7]
2020-08-05T23:12:07.109722Z     critical        envoy backtrace #13: std::__1::__function::__func<>::operator()() [0x5575cd81c72e]
2020-08-05T23:12:07.117505Z     critical        envoy backtrace #14: Envoy::Event::DispatcherImpl::runPostCallbacks() [0x5575cd8271f6]
2020-08-05T23:12:07.125794Z     critical        envoy backtrace #15: event_process_active_single_queue [0x5575cdc93cf7]
2020-08-05T23:12:07.136046Z     critical        envoy backtrace #16: event_base_loop [0x5575cdc9286e]
2020-08-05T23:12:07.155180Z     critical        envoy backtrace #17: Envoy::Server::WorkerImpl::threadRoutine() [0x5575cd81be74]
2020-08-05T23:12:07.165376Z     critical        envoy backtrace #18: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x5575cdd44463]
2020-08-05T23:12:07.165516Z     critical        envoy backtrace #19: start_thread [0x7f351b89b6db]

We have had this happen repeatedly. Thanks to @howardjohn, there is some correlation with the config activity right before the panic:

2020-08-06T00:00:03.197497Z     debug   envoy filter    Loading DNS table from external file: Success
2020-08-06T00:00:03.197608Z     debug   envoy config    new fc_contexts has 0 filter chains, including 0 newly built
2020-08-06T00:00:03.197616Z     debug   envoy config    add warming listener: name=dns, hash=2887861284968934043, address=0.0.0.0:15013
2020-08-06T00:00:03.197622Z     debug   envoy misc      Initialize listener dns local-init-manager.
2020-08-06T00:00:03.197625Z     debug   envoy init      init manager Listener-local-init-manager dns 2887861284968934043 contains no targets
2020-08-06T00:00:03.197628Z     debug   envoy init      init manager Listener-local-init-manager dns 2887861284968934043 initialized, notifying Listener-local-init-watcher dns
2020-08-06T00:00:03.197661Z     debug   envoy config    warm complete. updating active listener: name=dns, hash=2887861284968934043, address=0.0.0.0:15013
2020-08-06T00:00:03.197666Z     debug   envoy config    draining listener: name=dns, hash=16912794089181669957, address=0.0.0.0:15013
2020-08-06T00:00:03.197677Z     info    envoy upstream  lds: add/update listener 'dns'
2020-08-06T00:00:03.197688Z     debug   envoy config    Resuming discovery requests for type.googleapis.com/envoy.config.route.v3.RouteConfiguration
2020-08-06T00:00:03.197692Z     debug   envoy config    Resuming discovery requests for type.googleapis.com/envoy.api.v2.RouteConfiguration
2020-08-06T00:00:03.197697Z     debug   envoy config    gRPC config for type.googleapis.com/envoy.config.listener.v3.Listener accepted with 62 resources with version 2020-08-06T00:00:03Z/1671
2020-08-06T00:00:03.197730Z     critical        envoy assert    panic: out of memory

The crash coincides with an xDS update of the DNS (UDP) listener. https://github.com/istio/istio/issues/26171
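To see what the proxy has for that listener around the time of the update, one option (a rough sketch; it assumes curl is available in the istio-proxy container and that Envoy's admin endpoint is on its default port 15000) is to dump the config from the sidecar and look for the UDP listener on 0.0.0.0:15013:

kubectl exec <pod> -c istio-proxy -- curl -s localhost:15000/config_dump > config_dump.json   # <pod> is a placeholder
grep -n '"name": "dns"' config_dump.json   # the listener that gets drained and re-added on each LDS push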

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 18 (16 by maintainers)

Most upvoted comments

@dio if you check out istio/istio (I ran at bc1ea95017bd7ef99994998de2c2aeeccfda1736), then run:

kind create cluster # Happens on any cluster, kind is just simple
go test -v -p 1 ./tests/integration/pilot/ -run TestLocality -count 100 --istio.test.kube.loadbalancer=false --istio.test.nocleanup

This will deploy a bunch of pods to the echo-******** namespace.

It may take a few tries, but eventually you should see all the pods crash. kubectl logs c-v1-7bb495cfbf-p22dz -c istio-proxy -p will show the crash.
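
If it helps while waiting for the repro to trigger, a quick way to watch for the restarts (a sketch; the namespace name is generated per run, so substitute your own, and the pod name below is a placeholder) is:

NS=echo-xxxxxxxx                         # substitute the generated echo-* namespace
kubectl get pods -n $NS -w               # the RESTARTS column jumps when the proxies abort
kubectl logs <crashed-pod> -n $NS -c istio-proxy -p | grep -B2 -A20 'out of memory'   # pull the panic and backtrace from the previous container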

One interesting thing I found: the last time I ran it, it took a few minutes to trigger, but when it did, 4 out of 5 pods hit this at the same time.

Thanks for the reproducer, @howardjohn; I can reproduce the issue. It seems like the PR https://github.com/envoyproxy/envoy/pull/11914 linked by @rshriram fixes it. I baked that into this image: dio123/envoy:11914-dns-fix, and did not see the crash.
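
For anyone who wants a quick sanity check of that build before wiring it into a mesh, something like this works (assuming the image entrypoint is the envoy binary, which I have not verified):

docker pull dio123/envoy:11914-dns-fix
docker run --rm dio123/envoy:11914-dns-fix --version   # should print the Envoy version/SHA the image was built from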