envoy: `ext_proc` HTTP filter does not properly end the gRPC request stream

Title: ext_proc HTTP filter does not properly end the gRPC request stream

Description: We have an ASP.NET Core external processor implementation. While it works as expected, after every request we are observing IOException: “The client reset the request stream”, which is worrisome (and, needless to say, a performance hit). I implemented a test gRPC client in .NET, and our service does not experience the issue when that custom client is used, so there is definitely something going on in Envoy's client request streaming logic.
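
For reference, a minimal sketch of the kind of .NET test client I mean (assuming the ext_proc v3 protos are compiled to C#; the ExternalProcessorClient, ProcessingRequest, and HttpHeaders names below are the generated ones and the namespace is illustrative). The relevant difference from Envoy is that it half-closes the request stream with CompleteAsync() instead of resetting it:

using Grpc.Core;
using Grpc.Net.Client;
using Envoy.Service.ExtProc.V3; // assumed generated namespace

// Dial the processor directly, the same way Envoy's cluster does.
using var channel = GrpcChannel.ForAddress("http://127.0.0.1:5000");
var client = new ExternalProcessor.ExternalProcessorClient(channel);

using var call = client.Process();
await call.RequestStream.WriteAsync(new ProcessingRequest
{
    RequestHeaders = new HttpHeaders()
});

// Half-close the request stream (send EOS) instead of resetting it,
// then drain the responses before disposing the call.
await call.RequestStream.CompleteAsync();
await foreach (var response in call.ResponseStream.ReadAllAsync())
{
    // Inspect ProcessingResponse messages here.
}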

I reviewed the code for the ext_proc filter and it seems to send EOS when done, so it is a bit of a mystery. I'm wondering if it is a race condition where the actual stream is closed before the empty message with the EOS flag reaches our processor service. Might be similar to this. If that is the case, then I'd assume everything that relies on Envoy's gRPC implementation and uses client request streaming would be affected (is there anything else using it, though?). 🤔

I tested with Envoy 1.23 (ce49c7f65668a22b80d1e83c35d170741bb8d46a/1.23.0/Clean/RELEASE/BoringSSL) as well as 1.21.4 - both experience the same issue.

I didn't have much time to get the dev setup running to actually debug through it, so I'm just working off of static code analysis and traces.

Config: Relevant config for the filter is pretty simple:

http_filters:
  - name: envoy.filters.http.ext_proc
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.http.ext_proc.v3.ExternalProcessor
      failure_mode_allow: false
      async_mode: false
      message_timeout:
        seconds: 2
      processing_mode:
        request_header_mode: "SEND"
        response_header_mode: "SKIP"
        request_body_mode: "NONE"
        response_body_mode: "NONE"
        request_trailer_mode: "SKIP"
        response_trailer_mode: "SKIP"
      grpc_service:
        envoy_grpc:
          cluster_name: "external_processor_cluster"
...
clusters:
  - name: external_processor_cluster
    connect_timeout:
      seconds: 60
    type: STATIC
    load_assignment:
      cluster_name: external_processor_cluster
      endpoints:
        - lb_endpoints:
            - endpoint:
                address:
                  socket_address:
                    address: 127.0.0.1
                    port_value: 5000
    http2_protocol_options:
      max_concurrent_streams: 1024
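
For context, the ASP.NET Core host behind external_processor_cluster is just the standard gRPC hosting setup listening on plaintext HTTP/2 on port 5000 - roughly the sketch below (MyExternalProcessor is an illustrative name for our service class):

using Microsoft.AspNetCore.Server.Kestrel.Core;

var builder = WebApplication.CreateBuilder(args);
builder.Services.AddGrpc();

// Envoy dials 127.0.0.1:5000 without TLS, so listen on plaintext HTTP/2.
builder.WebHost.ConfigureKestrel(options =>
    options.ListenLocalhost(5000, listen => listen.Protocols = HttpProtocols.Http2));

var app = builder.Build();
// MyExternalProcessor derives from the generated ExternalProcessor.ExternalProcessorBase.
app.MapGrpcService<MyExternalProcessor>();
app.Run();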

Logs: Below are the Envoy traces starting from when onDestroy is called for the ext_proc filter. Communication with the external processor service happens on [C0][S3359808269571713239], and there are two adjacent messages when closing the stream - “proxying 0 bytes” (I assume this is the empty message with the EOS flag being sent) and “resetting pool request” (which I'm not sure what it does). Does the reset happen a little bit too soon?

[2022-08-16 09:16:53.731][2462][debug][router] [source/common/router/router.cc:471] [C0][S3359808269571713239] cluster 'external_processor_cluster' match for URL '/envoy.service.ext_proc.v3.ExternalProcessor/Process'
...
[2022-08-16 09:16:53.763][2462][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:114] onDestroy
[2022-08-16 09:16:53.763][2462][debug][ext_proc] [source/extensions/filters/http/ext_proc/ext_proc.cc:103] Calling close on stream
[2022-08-16 09:16:53.763][2462][debug][ext_proc] [source/extensions/filters/http/ext_proc/client_impl.cc:44] Closing gRPC stream
[2022-08-16 09:16:53.763][2462][trace][router] [source/common/router/upstream_request.cc:271] [C0][S3359808269571713239] proxying 0 bytes
[2022-08-16 09:16:53.763][2462][debug][router] [source/common/router/upstream_request.cc:345] [C0][S3359808269571713239] resetting pool request
[2022-08-16 09:16:53.763][2462][debug][client] [source/common/http/codec_client.cc:140] [C3] request reset
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=2)
[2022-08-16 09:16:53.763][2462][debug][pool] [source/common/conn_pool/conn_pool_base.cc:214] [C3] destroying stream: 0 remaining
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=3)
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=4)
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=5)
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=6)
[2022-08-16 09:16:53.763][2462][trace][misc] [source/common/event/scaled_range_timer_manager_impl.cc:60] enableTimer called on 0x579bff1b7d80 for 3600000ms, min is 3600000ms
[2022-08-16 09:16:53.763][2462][trace][http] [source/common/http/filter_manager.cc:515] [C2][S3826041998523566712] decodeHeaders filter iteration aborted due to local reply: filter=envoy.filters.http.router
[2022-08-16 09:16:53.763][2462][trace][http] [source/common/http/filter_manager.cc:525] [C2][S3826041998523566712] decode headers called: filter=envoy.filters.http.router status=1
[2022-08-16 09:16:53.763][2462][trace][http2] [source/common/http/http2/codec_impl.cc:1086] [C3] dispatched 88 bytes
[2022-08-16 09:16:53.763][2462][trace][http2] [source/common/http/http2/codec_impl.cc:1427] [C3] about to send frame type=3, flags=0
[2022-08-16 09:16:53.763][2462][trace][http2] [source/common/http/http2/codec_impl.cc:1450] [C3] send data: bytes=13
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/connection_impl.cc:478] [C3] writing 13 bytes, end_stream false
[2022-08-16 09:16:53.763][2462][trace][http2] [source/common/http/http2/codec_impl.cc:1329] [C3] sent frame type=3, stream_id=1, length=4
[2022-08-16 09:16:53.763][2462][debug][http2] [source/common/http/http2/codec_impl.cc:1358] [C3] sent reset code=0
[2022-08-16 09:16:53.763][2462][debug][http2] [source/common/http/http2/codec_impl.cc:1475] [C3] stream 1 closed: 0
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=7)
[2022-08-16 09:16:53.763][2462][trace][main] [source/common/event/dispatcher_impl.cc:125] clearing deferred deletion list (size=7)
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/connection_impl.cc:563] [C2] socket event: 2
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/connection_impl.cc:674] [C2] write ready
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/raw_buffer_socket.cc:67] [C2] write returns: 121
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/connection_impl.cc:563] [C3] socket event: 2
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/connection_impl.cc:674] [C3] write ready
[2022-08-16 09:16:53.763][2462][trace][connection] [source/common/network/raw_buffer_socket.cc:67] [C3] write returns: 13
[2022-08-16 09:16:53.767][2462][trace][connection] [source/common/network/connection_impl.cc:563] [C2] socket event: 3
[2022-08-16 09:16:53.767][2462][trace][connection] [source/common/network/connection_impl.cc:674] [C2] write ready
[2022-08-16 09:16:53.767][2462][trace][connection] [source/common/network/connection_impl.cc:603] [C2] read ready. dispatch_buffered_data=0
[2022-08-16 09:16:53.767][2462][trace][connection] [source/common/network/raw_buffer_socket.cc:24] [C2] read returns: 0
[2022-08-16 09:16:53.767][2462][debug][connection] [source/common/network/connection_impl.cc:651] [C2] remote close
[2022-08-16 09:16:53.767][2462][debug][connection] [source/common/network/connection_impl.cc:250] [C2] closing socket: 0
[2022-08-16 09:16:53.768][2462][trace][connection] [source/common/network/connection_impl.cc:418] [C2] raising connection event 0
[2022-08-16 09:16:53.768][2462][trace][conn_handler] [source/server/active_stream_listener_base.cc:111] [C2] connection on event 0
[2022-08-16 09:16:53.768][2462][debug][conn_handler] [source/server/active_stream_listener_base.cc:120] [C2] adding to cleanup list
[2022-08-16 09:16:53.768][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=1)
[2022-08-16 09:16:53.768][2462][trace][main] [source/common/event/dispatcher_impl.cc:250] item added to deferred deletion list (size=2)
[2022-08-16 09:16:53.768][2462][trace][main] [source/common/event/dispatcher_impl.cc:125] clearing deferred deletion list (size=2)

Workaround: Thus far, in order to eliminate the managed exception in our code on every request, I decided to simply close the connection from our end: since we only process request headers, once we get that processing request we stop waiting for any more messages and end the stream on our side. 😝 Kind of fighting fire with fire.

while (!context.CancellationToken.IsCancellationRequested
    && await requestStream.MoveNext(context.CancellationToken))
{
  if (requestStream.Current.RequestCase == ProcessingRequest.RequestOneofCase.RequestHeaders) {
    // Process headers
    break; // Do not expect any more messages
  }
}

This seems to be working fine, but now Envoy complains that the connection was terminated. 🤷 I hope that is handled somewhat more efficiently in Envoy than exceptions are in .NET, so it will probably suffice for the time being.
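
For completeness, the full override with the workaround in place looks roughly like this (a sketch; it assumes the usual generated ext_proc v3 bindings and writes back an empty HeadersResponse before bailing out):

// Inside our service class, which derives from the generated
// ExternalProcessor.ExternalProcessorBase.
public override async Task Process(
    IAsyncStreamReader<ProcessingRequest> requestStream,
    IServerStreamWriter<ProcessingResponse> responseStream,
    ServerCallContext context)
{
    while (!context.CancellationToken.IsCancellationRequested
        && await requestStream.MoveNext(context.CancellationToken))
    {
        if (requestStream.Current.RequestCase == ProcessingRequest.RequestOneofCase.RequestHeaders)
        {
            // Answer the headers message, then stop reading. Returning from
            // Process ends the call on our side, so we are never stuck in a
            // read that Envoy will later reset.
            await responseStream.WriteAsync(new ProcessingResponse
            {
                RequestHeaders = new HeadersResponse()
            });
            break;
        }
    }
}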

About this issue

  • Original URL
  • State: open
  • Created 2 years ago
  • Comments: 19 (11 by maintainers)

Most upvoted comments

Facing the same issue with an implementation that uses Vert.x gRPC (Java).

Google's gRPC client closes the stream properly. Envoy's gRPC client cancels the stream on each HTTP request cycle, which is seen as an error on the gRPC server side.