envoy: http: not proxying 413 correctly

Description: When Envoy proxies a large HTTP request and the upstream returns a 413, the client of the proxy gets a 503 instead.

Repro steps: A sample envoy config is here.

$ curl -v http://127.0.0.1:8080 --data @envoy.log
> POST / HTTP/1.1
> Host: 127.0.0.1:8080
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 28576635
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 413 Payload Too Large
< content-length: 17
< content-type: text/plain
< date: Thu, 29 Mar 2018 02:10:27 GMT
< server: envoy
< connection: close
<
* we are done reading and this is set to close, stop send
* Closing connection 0

This hits the listener with the buffer filter directly and results in a 413.

However,

curl -v http://127.0.0.1:8000 --data @envoy.log
> POST / HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 28576635
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
< HTTP/1.1 503 Service Unavailable
< content-length: 57
< content-type: text/plain
< date: Thu, 29 Mar 2018 02:10:21 GMT
< server: envoy
< connection: close
<

This results in a 503. Port 8000 proxies to 8080, so it should return a 413 as well.

envoy.log above can be substituted with any large file (mine is 28MB). If you use a smaller file (around 10K) that fits in the TCP buffer, port 8000 returns the 413 correctly:

curl -v http://127.0.0.1:8000 --data @STYLE.md
> POST / HTTP/1.1
> Host: 127.0.0.1:8000
> User-Agent: curl/7.47.0
> Accept: */*
> Content-Length: 11450
> Content-Type: application/x-www-form-urlencoded
> Expect: 100-continue
>
< HTTP/1.1 100 Continue
* We are completely uploaded and fine
< HTTP/1.1 413 Payload Too Large
< content-length: 17
< content-type: text/plain
< date: Thu, 29 Mar 2018 02:15:29 GMT
< server: envoy
< x-envoy-upstream-service-time: 0
<

From the trace log below, it looks like when Envoy's write to the upstream fails, it immediately returns a 503 without trying to read whether the server has already sent back a response.

[2018-03-28 19:02:10.131][23666][trace][http] source/common/http/http1/codec_impl.cc:322] [C1] parsed 1048576 bytes
[2018-03-28 19:02:10.131][23666][trace][connection] source/common/network/connection_impl.cc:386] [C2] socket event: 2
[2018-03-28 19:02:10.131][23666][trace][connection] source/common/network/connection_impl.cc:454] [C2] write ready
[2018-03-28 19:02:10.131][23666][trace][connection] source/common/network/raw_buffer_socket.cc:67] [C2] write returns: 524288
[2018-03-28 19:02:10.131][23666][trace][connection] source/common/network/raw_buffer_socket.cc:67] [C2] write returns: -1
[2018-03-28 19:02:10.131][23666][trace][connection] source/common/network/raw_buffer_socket.cc:70] [C2] write error: 32 (Broken pipe)
[2018-03-28 19:02:10.131][23666][debug][connection] source/common/network/connection_impl.cc:134] [C2] closing socket: 0
[2018-03-28 19:02:10.131][23666][trace][http] source/common/http/http1/codec_impl.cc:305] [C2] parsing 0 bytes
[2018-03-28 19:02:10.131][23666][trace][http] source/common/http/http1/codec_impl.cc:322] [C2] parsed 0 bytes
[2018-03-28 19:02:10.131][23666][debug][client] source/common/http/codec_client.cc:81] [C2] disconnect. resetting 1 pending requests
[2018-03-28 19:02:10.131][23666][debug][client] source/common/http/codec_client.cc:104] [C2] request reset
[2018-03-28 19:02:10.131][23666][trace][main] source/common/event/dispatcher_impl.cc:126] item added to deferred deletion list (size=1)
[2018-03-28 19:02:10.131][23666][debug][router] source/common/router/router.cc:464] [C1][S12280405291311848063] upstream reset
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:968] [C1][S12280405291311848063] encoding headers via codec (end_stream=false):
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S12280405291311848063]   ':status':'503'
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S12280405291311848063]   'content-length':'57'
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S12280405291311848063]   'content-type':'text/plain'
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S12280405291311848063]   'date':'Thu, 29 Mar 2018 02:02:10 GMT'
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S12280405291311848063]   'server':'envoy'
[2018-03-28 19:02:10.132][23666][debug][http] source/common/http/conn_manager_impl.cc:972] [C1][S12280405291311848063]   'connection':'close'
[2018-03-28 19:02:10.132][23666][trace][connection] source/common/network/connection_impl.cc:323] [C1] writing 153 bytes, end_stream false
[2018-03-28 19:02:10.132][23666][trace][http] source/common/http/conn_manager_impl.cc:1031] [C1][S12280405291311848063] encoding data via codec (size=57 end_stream=true)
[2018-03-28 19:02:10.132][23666][trace][connection] source/common/network/connection_impl.cc:323] [C1] writing 57 bytes, end_stream false

Thanks @JimmyCYJ, who originally found this while adding a test in Istio for #2919.

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 20 (20 by maintainers)

Most upvoted comments

@mattklein123 we changed it because of the combined read|write events raised by the kernel (https://github.com/envoyproxy/envoy/pull/1896), but that’s unrelated and changing the order wouldn’t fix this issue.

This issue is happening because Envoy, acting as a server (i.e. listener B in @lizan’s example), closes the downstream connection with pending (unread) data, which results in a TCP RST packet being sent downstream.

Depending on the timing, the downstream (i.e. listener A in @lizan’s example) might be able to receive and proxy the complete HTTP response before receiving the TCP RST packet (which erases the low-level TCP buffers), in which case the client will receive the response sent by the upstream (413 Payload Too Large in this case, but this issue is not limited to that response code); otherwise the client will receive the 503 Service Unavailable response generated by listener A (which actually isn’t the most appropriate response code in this case, but that’s a separate issue).

The common solution to this problem is to half-close the downstream connection using ::shutdown(fd_, SHUT_WR) and then read from the downstream until EOF (to confirm that the other side has received the complete HTTP response and closed the connection) or until a short timeout.
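
To make that concrete, below is a minimal POSIX-level sketch of that pattern (not Envoy code; graceful_close, the buffer size, and the 1-second drain timeout are illustrative assumptions):

// Minimal sketch of the half-close-and-drain pattern described above:
// stop sending (FIN), keep reading until the peer closes or a short timeout,
// so that unread request data does not cause the kernel to send an RST that
// destroys the already-written response.
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

// Hypothetical helper: fd is a connected TCP socket on which the complete
// HTTP response (e.g. the 413) has already been written.
void graceful_close(int fd, int drain_timeout_ms = 1000) {
  ::shutdown(fd, SHUT_WR);  // half-close: we are done writing, but keep reading

  char buf[4096];
  pollfd pfd{fd, POLLIN, 0};
  for (;;) {
    const int rc = ::poll(&pfd, 1, drain_timeout_ms);
    if (rc <= 0) {
      break;  // timeout or poll error: give up draining
    }
    const ssize_t n = ::recv(fd, buf, sizeof(buf), 0);
    if (n <= 0) {
      break;  // EOF (peer read our response and closed) or read error
    }
    // Discard the drained request bytes; reading them is only to avoid the RST.
  }
  ::close(fd);
}

Draining (and discarding) the rest of the request keeps the socket's receive queue empty, so closing it no longer triggers the RST that races with the 413 on its way back to the client.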

cc @alyssawilk @ggreenway