runtime: Regression from 5.0: HttpClient does not retry HTTP/1.1 request if TCP connection is closed

There is an inherent race between server closing an “idle” connection and client trying to issue a request to this connection at the same time. Per RFC, we should retry such request (if it is idempotent).

In 6.0, there was a change in retry logic https://github.com/dotnet/runtime/commit/288bfa00fe28b4b25637c362683dd8c6b6573351 which led to requests being no longer retried in case TCP connection was abruptly closed in the middle or right after sending the headers (see HttpConnection.cs)

Because of that, users upgrading from 5.0 to 6.0 would see request failures out of nowhere even with well-behaved servers – on the requests that could be safely retried.

In the repro below, the server would abort the first connection by closing the socket after replying to the first request. HttpClient would successfully execute the second request on a new connection for 5.0, but it will fail with System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host.. on 6.0.

Repro
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;

namespace TcpRstServerRepro
{
    class Program
    {
        const int serverPort = 5000;
        const int responseLength = 10;

        public static async Task Main()
        {
            var httpClient = new HttpClient(new SocketsHttpHandler() 
            { 
                MaxConnectionsPerServer = 1
            });

            using (Socket server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
            {
                server.Bind(new IPEndPoint(IPAddress.Any, serverPort));

                server.Listen();
                Task<Socket> handleRequestTask = NewConnectionHandleRequestAsync(server);

                HttpResponseMessage response = await httpClient.GetAsync($"http://localhost:{serverPort}/");

                Console.WriteLine("CLIENT: Received response:");
                Console.WriteLine(response);
                Console.WriteLine(await response.Content.ReadAsStringAsync());
                response.Dispose();

                Socket accept = await handleRequestTask;

                Console.WriteLine("CLIENT: Will sleep for 1s");
                Thread.Sleep(1_000);

                // and once again...

                Task<Socket> handleSecondRequestTask = NewConnectionHandleRequestAsync(server);

                var secondResponseTask = httpClient.GetAsync($"http://localhost:{serverPort}/");

                Console.WriteLine("SERVER: closing connection");
                accept.Close(); // closing first connection while HttpClient started the second request...

                HttpResponseMessage secondResponse = await secondResponseTask;

                Console.WriteLine("CLIENT: Received response:");
                Console.WriteLine(secondResponse);
                Console.WriteLine(await secondResponse.Content.ReadAsStringAsync());
                secondResponse.Dispose();

                Socket secondAccept = await handleSecondRequestTask;
                Console.WriteLine("SERVER: closing connection");
                secondAccept.Close();
            }
        }

        private static async Task ExistingConnectionHandleRequestAsync(Socket accept)
        {
            //string query = "GET / HTTP/1.1\r\nHost:localhost:5000\r\n\r\n";
            var requestBytes = new List<byte>(45);
            var byteBuffer = new byte[1];
            while (true)
            {
                int num = await accept.ReceiveAsync(byteBuffer, SocketFlags.None);
                if (num == 0)
                {
                    throw new Exception("Unexpected EOS");
                }

                requestBytes.Add(byteBuffer[0]);
                if (requestBytes.Count >= 4 &&
                    requestBytes[requestBytes.Count - 4] == (byte)'\r' &&
                    requestBytes[requestBytes.Count - 3] == (byte)'\n' &&
                    requestBytes[requestBytes.Count - 2] == (byte)'\r' &&
                    requestBytes[requestBytes.Count - 1] == (byte)'\n')
                {
                    break;
                }
            }

            Console.WriteLine("SERVER: Received query:");
            Console.Write(Encoding.ASCII.GetString(requestBytes.ToArray()));

            string responseText =
                        "HTTP/1.1 200 OK\r\n" +
                        $"Content-Length: {responseLength}\r\n\r\n{new string('a', responseLength)}";

            Console.WriteLine("SERVER: Sending response:");
            Console.WriteLine(responseText);

            await accept.SendAsync(Encoding.ASCII.GetBytes(responseText), SocketFlags.None);
        }

        private static async Task<Socket> NewConnectionHandleRequestAsync(Socket listener)
        {
            Socket accept = await listener.AcceptAsync();
            accept.LingerState = new LingerOption(true, 0); // this will send TCP RST on socket Close

            await ExistingConnectionHandleRequestAsync(accept);

            return accept;
        }
    }
}

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 20 (17 by maintainers)

Most upvoted comments

Is this really the old behavior we had?

In 5.0, _canRetry starts as true and is set to false when we believe the request to be not retriable (when we’ve either started sending the body or received anything from the server). See HttpConnection.cs (release/5.0).

In 6.0, _canRetry logic is inverted and starts as false and is set to true holding the same meaning (when we didn’t send the body the body and didn’t receive anything from the server). See HttpConnection.cs (release/6.0). But because of that, when we get exception with ConnectionReset either while sending the headers or while attempting to read the response, _canRetry will be left as false and the request will not be retried even when it should.

This issue has been mitigated by #72958 for cases where the server advertises its keep-alive timeout via a response header.

Keep-Alive: timeout=xyz

@scalablecory 5.0 behavior is current behavior and multiple real-world services intrinsically depend on this. 6.0 behavior might be spec-compliant, but breaks services. How we do reconcile what to optimize for? spec or real services? By keeping 5.0 behavior, HttpClient is not worse off than it is in today’s 5.0 release. Moving to 6.0 behavior, however, is guaranteed to break services.

I’m not saying we make no change here, just that we don’t want exactly 5.0’s behavior.

The specific non-compliance I’m worried about is hugely dangerous, so it’s not something we can write off as an optimization. However, there are some other changes to investigate that will move the needle in the direction you want.

In particular, we should still retry if sending the headers fails because the connection was reset. At that point we still have not sent any part of the request – we tried to, but it failed.

I believe it might be better to have a separate discussion about that @scalablecory

The initial change https://github.com/dotnet/runtime/commit/288bfa00fe28b4b25637c362683dd8c6b6573351 did not change “the point of no return” (we do not retry if we’ve either started sending the body or received anything from the server) – the regression discussed here was not intentional.

What I intend to do here is to fix the regression for 6.0 which users might badly hit – without changing “the point of no return”.

The further change of retry logic – either not retrying at all if we’ve started sending, or scoping the retries only to specific methods, or something else – needs to have a wider discussion and possibly advertising it as a breaking change, since users that relied on our automatic retries would need to implement the retries themselves.