runtime: Regression from 5.0: HttpClient does not retry HTTP/1.1 request if TCP connection is closed
There is an inherent race between server closing an “idle” connection and client trying to issue a request to this connection at the same time. Per RFC, we should retry such request (if it is idempotent).
In 6.0, there was a change in retry logic https://github.com/dotnet/runtime/commit/288bfa00fe28b4b25637c362683dd8c6b6573351 which led to requests being no longer retried in case TCP connection was abruptly closed in the middle or right after sending the headers (see HttpConnection.cs)
Because of that, users upgrading from 5.0 to 6.0 would see request failures out of nowhere even with well-behaved servers – on the requests that could be safely retried.
In the repro below, the server would abort the first connection by closing the socket after replying to the first request. HttpClient would successfully execute the second request on a new connection for 5.0, but it will fail with System.IO.IOException: Unable to read data from the transport connection: An existing connection was forcibly closed by the remote host..
on 6.0.
Repro
using System;
using System.Collections.Generic;
using System.Net;
using System.Net.Http;
using System.Net.Sockets;
using System.Text;
using System.Threading;
using System.Threading.Tasks;
namespace TcpRstServerRepro
{
class Program
{
const int serverPort = 5000;
const int responseLength = 10;
public static async Task Main()
{
var httpClient = new HttpClient(new SocketsHttpHandler()
{
MaxConnectionsPerServer = 1
});
using (Socket server = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp))
{
server.Bind(new IPEndPoint(IPAddress.Any, serverPort));
server.Listen();
Task<Socket> handleRequestTask = NewConnectionHandleRequestAsync(server);
HttpResponseMessage response = await httpClient.GetAsync($"http://localhost:{serverPort}/");
Console.WriteLine("CLIENT: Received response:");
Console.WriteLine(response);
Console.WriteLine(await response.Content.ReadAsStringAsync());
response.Dispose();
Socket accept = await handleRequestTask;
Console.WriteLine("CLIENT: Will sleep for 1s");
Thread.Sleep(1_000);
// and once again...
Task<Socket> handleSecondRequestTask = NewConnectionHandleRequestAsync(server);
var secondResponseTask = httpClient.GetAsync($"http://localhost:{serverPort}/");
Console.WriteLine("SERVER: closing connection");
accept.Close(); // closing first connection while HttpClient started the second request...
HttpResponseMessage secondResponse = await secondResponseTask;
Console.WriteLine("CLIENT: Received response:");
Console.WriteLine(secondResponse);
Console.WriteLine(await secondResponse.Content.ReadAsStringAsync());
secondResponse.Dispose();
Socket secondAccept = await handleSecondRequestTask;
Console.WriteLine("SERVER: closing connection");
secondAccept.Close();
}
}
private static async Task ExistingConnectionHandleRequestAsync(Socket accept)
{
//string query = "GET / HTTP/1.1\r\nHost:localhost:5000\r\n\r\n";
var requestBytes = new List<byte>(45);
var byteBuffer = new byte[1];
while (true)
{
int num = await accept.ReceiveAsync(byteBuffer, SocketFlags.None);
if (num == 0)
{
throw new Exception("Unexpected EOS");
}
requestBytes.Add(byteBuffer[0]);
if (requestBytes.Count >= 4 &&
requestBytes[requestBytes.Count - 4] == (byte)'\r' &&
requestBytes[requestBytes.Count - 3] == (byte)'\n' &&
requestBytes[requestBytes.Count - 2] == (byte)'\r' &&
requestBytes[requestBytes.Count - 1] == (byte)'\n')
{
break;
}
}
Console.WriteLine("SERVER: Received query:");
Console.Write(Encoding.ASCII.GetString(requestBytes.ToArray()));
string responseText =
"HTTP/1.1 200 OK\r\n" +
$"Content-Length: {responseLength}\r\n\r\n{new string('a', responseLength)}";
Console.WriteLine("SERVER: Sending response:");
Console.WriteLine(responseText);
await accept.SendAsync(Encoding.ASCII.GetBytes(responseText), SocketFlags.None);
}
private static async Task<Socket> NewConnectionHandleRequestAsync(Socket listener)
{
Socket accept = await listener.AcceptAsync();
accept.LingerState = new LingerOption(true, 0); // this will send TCP RST on socket Close
await ExistingConnectionHandleRequestAsync(accept);
return accept;
}
}
}
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 20 (17 by maintainers)
In 5.0,
_canRetry
starts astrue
and is set tofalse
when we believe the request to be not retriable (when we’ve either started sending the body or received anything from the server). See HttpConnection.cs (release/5.0).In 6.0,
_canRetry
logic is inverted and starts asfalse
and is set totrue
holding the same meaning (when we didn’t send the body the body and didn’t receive anything from the server). See HttpConnection.cs (release/6.0). But because of that, when we get exception with ConnectionReset either while sending the headers or while attempting to read the response,_canRetry
will be left asfalse
and the request will not be retried even when it should.This issue has been mitigated by #72958 for cases where the server advertises its keep-alive timeout via a response header.
I’m not saying we make no change here, just that we don’t want exactly 5.0’s behavior.
The specific non-compliance I’m worried about is hugely dangerous, so it’s not something we can write off as an optimization. However, there are some other changes to investigate that will move the needle in the direction you want.
In particular, we should still retry if sending the headers fails because the connection was reset. At that point we still have not sent any part of the request – we tried to, but it failed.
I believe it might be better to have a separate discussion about that @scalablecory
The initial change https://github.com/dotnet/runtime/commit/288bfa00fe28b4b25637c362683dd8c6b6573351 did not change “the point of no return” (we do not retry if we’ve either started sending the body or received anything from the server) – the regression discussed here was not intentional.
What I intend to do here is to fix the regression for 6.0 which users might badly hit – without changing “the point of no return”.
The further change of retry logic – either not retrying at all if we’ve started sending, or scoping the retries only to specific methods, or something else – needs to have a wider discussion and possibly advertising it as a breaking change, since users that relied on our automatic retries would need to implement the retries themselves.