envoy: Envoy crash while dynamic update of UDP lb_endpoints

After clarifying with envoy-security@googlegroups.com (this is a config-driven issue), I am opening an issue here on GitHub.

Description:

We found an issue where Envoy crashes while dynamically updating the UDP lb_endpoints. If you remove and/or add lb_endpoints for a cluster used by a UDP listener, Envoy crashes on the next UDP connection.

Repro steps:

  1. Start Envoy with the following config. (The lb_endpoints do not need to be reachable.)

envoy -c envoy.yaml

envoy.yaml

node:
  id: foobar
  cluster: foobar

dynamic_resources:
  cds_config:
    path: /home/debian/cds.yaml
  lds_config:
    path: /home/debian/lds.yaml

lds.yaml

version_info: "1"
resources:  
  - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
    name: listener_0
    address:
      socket_address:
        protocol: UDP
        address: 127.0.0.1
        port_value: 12345
    udp_listener_config:
      downstream_socket_config:
        max_rx_datagram_size: 9000
    listener_filters:
    - name: envoy.filters.udp_listener.udp_proxy
      typed_config:
        '@type': type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.UdpProxyConfig
        stat_prefix: service
        matcher:
            on_no_match:
              action:
                name: route
                typed_config:
                  '@type': type.googleapis.com/envoy.extensions.filters.udp.udp_proxy.v3.Route
                  cluster: service_udp
        upstream_socket_config:
          max_rx_datagram_size: 9000

cds.yaml - with 2 endpoints

version_info: "1"
resources:  
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: service_udp
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_udp
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.10.10.10
                port_value: 54321
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.10.10.20
                port_value: 54321

  2. Connect to the listener port with netcat (there is no reply if nothing is running on the lb_endpoints)

echo "test" | nc -u 127.0.0.1 12345

  3. Update the config to remove an lb_endpoint

cds-2.yaml - with 1 endpoint

version_info: "2"
resources:  
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: service_udp
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_udp
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.10.10.10
                port_value: 54321

mv cds-2.yaml cds.yaml
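
The `mv` matters here: Envoy's filesystem-based config source reacts when the watched file is replaced by a move, not when it is edited in place. A minimal sketch of a helper that stages the new contents next to the target and then renames them into place, so the source file is kept for later runs. `swap_config` is a hypothetical name used only for illustration.

```shell
#!/bin/sh
# swap_config SRC DST: replace DST with a copy of SRC via a rename,
# so a file watcher sees a single move instead of a partial write.
# The temp file lives next to DST so the rename stays on one filesystem.
swap_config() {
  src=$1
  dst=$2
  tmp="$dst.tmp.$$"
  cp "$src" "$tmp" && mv "$tmp" "$dst"
}
```

With the paths from this report, that would be `swap_config cds-2.yaml /home/debian/cds.yaml`.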

  4. Connect to the listener port with netcat (there is no reply if nothing is running on the lb_endpoints)

echo "test" | nc -u 127.0.0.1 12345

  5. Update the config to add an lb_endpoint

cds-3.yaml - with 2 endpoints

version_info: "3"
resources:  
  - "@type": type.googleapis.com/envoy.config.cluster.v3.Cluster
    name: service_udp
    type: STATIC
    lb_policy: ROUND_ROBIN
    load_assignment:
      cluster_name: service_udp
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.10.10.10
                port_value: 54321
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 10.10.10.20
                port_value: 54321

mv cds-3.yaml cds.yaml

  6. Connect to the listener port with netcat (this is where Envoy crashes)

echo "test" | nc -u 127.0.0.1 12345
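
To make it easier to see which step kills the process, each probe can be paired with a liveness check. This is only a sketch: `send_probe`, `is_alive`, and `probe_and_check` are hypothetical helper names, the one-second waits are assumptions, and `-w 1` is used so that `nc` gives up when no reply arrives.

```shell
#!/bin/sh
# send_probe: send one UDP datagram to the listener defined in lds.yaml.
send_probe() {
  echo "test" | nc -u -w 1 127.0.0.1 12345
}

# is_alive PID: succeed if a process with this PID still exists.
is_alive() {
  kill -0 "$1" 2>/dev/null
}

# probe_and_check PID LABEL: send a datagram, wait briefly (assumed to be
# long enough for a crash to complete), then report whether Envoy is up.
probe_and_check() {
  send_probe
  sleep 1
  if is_alive "$1"; then
    echo "$2: envoy still running"
  else
    echo "$2: envoy is gone"
    return 1
  fi
}
```

Run between the `mv` steps, for example `probe_and_check "$(pidof envoy)" "after cds-3"`, assuming a single Envoy process on the machine.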

Call Stack:

[2023-03-20 16:40:03.955][769][critical][backtrace] [./source/server/backtrace.h:104] Caught Segmentation fault, suspect faulting address 0xd8
[2023-03-20 16:40:03.955][769][critical][backtrace] [./source/server/backtrace.h:91] Backtrace (use tools/stack_decode.py to get line numbers):
[2023-03-20 16:40:03.955][769][critical][backtrace] [./source/server/backtrace.h:92] Envoy version: afa98867c807dee0d833da701ba3ab0c9ace9ada/1.25.2/Clean/RELEASE/BoringSSL
[2023-03-20 16:40:03.956][769][critical][backtrace] [./source/server/backtrace.h:96] #0: __restore_rt [0x7f8595ed3140]
[2023-03-20 16:40:03.988][769][critical][backtrace] [./source/server/backtrace.h:96] #1: Envoy::Extensions::UdpFilters::UdpProxy::UdpProxyFilter::StickySessionClusterInfo::onData() [0x5594210460d0]
[2023-03-20 16:40:04.003][769][critical][backtrace] [./source/server/backtrace.h:96] #2: Envoy::Extensions::UdpFilters::UdpProxy::UdpProxyFilter::onData() [0x559421044b13]
[2023-03-20 16:40:04.017][769][critical][backtrace] [./source/server/backtrace.h:96] #3: Envoy::Server::ActiveRawUdpListener::onDataWorker() [0x559422713a9d]
[2023-03-20 16:40:04.031][769][critical][backtrace] [./source/server/backtrace.h:96] #4: Envoy::Network::UdpListenerImpl::processPacket() [0x559422799900]
[2023-03-20 16:40:04.046][769][critical][backtrace] [./source/server/backtrace.h:96] #5: Envoy::Network::passPayloadToProcessor() [0x55942294ba3a]
[2023-03-20 16:40:04.060][769][critical][backtrace] [./source/server/backtrace.h:96] #6: Envoy::Network::Utility::readFromSocket() [0x55942294d0b4]
[2023-03-20 16:40:04.074][769][critical][backtrace] [./source/server/backtrace.h:96] #7: Envoy::Network::Utility::readPacketsFromSocket() [0x55942294dec3]
[2023-03-20 16:40:04.088][769][critical][backtrace] [./source/server/backtrace.h:96] #8: Envoy::Network::UdpListenerImpl::handleReadCallback() [0x5594227992d6]
[2023-03-20 16:40:04.102][769][critical][backtrace] [./source/server/backtrace.h:96] #9: Envoy::Network::UdpListenerImpl::onSocketEvent() [0x559422799096]
[2023-03-20 16:40:04.117][769][critical][backtrace] [./source/server/backtrace.h:96] #10: std::__1::__function::__func<>::operator()() [0x559422782c31]
[2023-03-20 16:40:04.131][769][critical][backtrace] [./source/server/backtrace.h:96] #11: Envoy::Event::FileEventImpl::assignEvents()::$_1::__invoke() [0x559422783ffd]
[2023-03-20 16:40:04.146][769][critical][backtrace] [./source/server/backtrace.h:96] #12: event_process_active_single_queue [0x559422b35d00]
[2023-03-20 16:40:04.161][769][critical][backtrace] [./source/server/backtrace.h:96] #13: event_base_loop [0x559422b34641]
[2023-03-20 16:40:04.176][769][critical][backtrace] [./source/server/backtrace.h:96] #14: Envoy::Server::WorkerImpl::threadRoutine() [0x5594220971c4]
[2023-03-20 16:40:04.190][769][critical][backtrace] [./source/server/backtrace.h:96] #15: Envoy::Thread::ThreadImplPosix::ThreadImplPosix()::{lambda()#1}::__invoke() [0x559422b3c193]
[2023-03-20 16:40:04.190][769][critical][backtrace] [./source/server/backtrace.h:96] #16: start_thread [0x7f8595ec7ea7]
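
For a quick summary of the stack without decoding the addresses, the frame lines can be reduced to their symbol names. A minimal sketch that assumes the log format shown above; `frames` is a hypothetical helper and not part of Envoy's tooling.

```shell
#!/bin/sh
# frames: read Envoy log lines on stdin and print only the backtrace
# frames as "#N: symbol", dropping the log prefix and the raw address.
# Lines that are not backtrace frames produce no output.
frames() {
  sed -n 's/^.*\] \(#[0-9][0-9]*: .*\) \[0x[0-9a-f]*\]$/\1/p'
}
```

For example, `frames < envoy.log`, where `envoy.log` stands for wherever the output was captured.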

Tested with Envoy versions 1.25.2 and 1.24.3 (from the GitHub releases) on Debian 11.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 7
  • Comments: 23 (2 by maintainers)

Most upvoted comments

bump