runtime: `HttpClient.SendAsync(...)` may fail to return a response that the server sends before the full request has been transmitted.

Description

HttpClient.SendAsync(...), and by extension the whole family of derivative calls, incorrectly assumes that the server can only issue a response after it has received and fully consumed the request's content. While in most cases servers do indeed issue a response only after having fully read and processed the request, this is not a requirement and does not hold in all cases; some servers can and do issue responses without fully consuming the request. In those cases, HttpClient fails with obscure internal exceptions (e.g. a socket-closed exception with EPIPE) and prevents the calling code from examining the response that was received.

Details

The case at hand is uploading large (>100 MB) content to an AWS S3 bucket, using a pre-signed URL via the HTTP PUT verb.

AWS S3 pre-signed URLs are generated ahead of time, with various conditions attached on how they may be used, from permissions to which headers must be present when accessing them. If a subsequent request to the generated URL doesn't match the specifications stated during its creation, AWS S3 rejects the call with a proper RESTful response.
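For context, generating such a pre-signed PUT URL with the AWS SDK for .NET looks roughly like this (a sketch only; bucket name, key, expiry, and content type are placeholder values, and whatever is specified here becomes part of the conditions the later request has to match):

using System;
using Amazon.S3;
using Amazon.S3.Model;

// Sketch: pre-sign a PUT URL; the signed conditions must be matched by the
// actual upload request later (e.g. the Content-Type header).
var s3 = new AmazonS3Client();
var presignRequest = new GetPreSignedUrlRequest
{
    BucketName  = "my-bucket",          // placeholder
    Key         = "uploads/video.mp4",  // placeholder
    Verb        = HttpVerb.PUT,
    Expires     = DateTime.UtcNow.AddHours(1),
    ContentType = "video/mp4"
};
string uploadUrl = s3.GetPreSignedURL(presignRequest);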

In such cases, AWS S3 does not read the full request before starting its validation logic. It doesn't need to, because the headers are sufficient to determine whether the request is valid. If the request is invalid, there is no point wasting time and bandwidth reading the content of the PUT operation, as it will be discarded anyway. When a request is rejected after the headers have been examined, S3 issues an HTTP response stating the error and hard-closes the connection.

When the server hard-closes the connection while HttpClient is still writing the request content, HttpClient encounters an error on the underlying socket, fails to understand the nature of that error, and, more importantly, does not attempt to read the response pending in the network buffer to see whether there is a valid response it can process. Instead it throws:

System.Net.Http.HttpRequestException: Error while copying content to a stream.
 ---> System.IO.IOException: Unable to read data from the transport connection: Broken pipe.
 ---> System.Net.Sockets.SocketException (32): Broken pipe
   --- End of inner exception stack trace ---
   at System.Net.Security.SslStream.<WriteSingleChunk>g__CompleteAsync|210_1[TWriteAdapter](ValueTask writeTask, Byte[] bufferToReturn)
   at System.Net.Security.SslStream.WriteAsyncChunked[TWriteAdapter](TWriteAdapter writeAdapter, ReadOnlyMemory`1 buffer)
   at System.Net.Security.SslStream.WriteAsyncInternal[TWriteAdapter](TWriteAdapter writeAdapter, ReadOnlyMemory`1 buffer)
   at System.Net.Http.HttpConnection.WriteAsync(ReadOnlyMemory`1 source)
   at System.IO.Stream.CopyToAsyncInternal(Stream destination, Int32 bufferSize, CancellationToken cancellationToken)
   at System.Net.Http.HttpContent.CopyToAsyncCore(ValueTask copyTask)
   --- End of inner exception stack trace ---
   at System.Net.Http.HttpContent.CopyToAsyncCore(ValueTask copyTask)
   at System.Net.Http.HttpConnection.SendRequestContentAsync(HttpRequestMessage request, HttpContentWriteStream stream, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnection.SendAsyncCore(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithNtConnectionAuthAsync(HttpConnection connection, HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at MobileLabs.DeviceConnect.Framework.Http.HttpClient.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)

This is incorrect behavior. When HttpClient experiences a request write error, it should not assume that all is lost; instead it should still attempt to read and parse the receive buffer to see if it contains a full and valid response. If it does, it should return that response as if the operation completed without hiccups; if it doesn't, then it should throw an exception indicating abnormal operation during the request write.
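For illustration, here is roughly what the calling side can do today (a sketch; httpClient, request, and token stand in for the objects from the repro further down). The server's rejection response, if one was sent, is already gone by the time the exception surfaces, so there is nothing useful to recover in the catch block:

try
{
    using var response = await httpClient.SendAsync(request, token);
    Console.WriteLine((int)response.StatusCode); // never reached when the server resets mid-upload
}
catch (HttpRequestException ex) when (ex.InnerException is IOException io &&
                                      io.InnerException is SocketException)
{
    // All the caller sees is the broken-pipe chain from the stack trace above;
    // the response the server sent before closing the connection is not accessible here.
    Console.WriteLine($"Upload failed mid-request: {ex.Message}");
}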

Suggested repro steps

Since a proper demonstration requires a web server endpoint with specific behavior, the original sample code is limited to the client only. You can use the aforementioned AWS S3 setup and craft a pre-signed URL that will reject the upload request, or any other endpoint that reads the headers but not the content, issues a proper and valid response rejecting the request, and closes the socket before the upload completes; a minimal local stand-in for such an endpoint is sketched below.
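A rough local stand-in (a sketch, not S3: plain HTTP on localhost:8080, an arbitrary 403 body, no TLS) reads only the request headers, replies with a rejection, and then hard-closes the socket by setting a zero linger timeout so that Close() produces a TCP RST:

using System;
using System.Net;
using System.Net.Sockets;
using System.Text;

class RejectingServer
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Loopback, 8080);
        listener.Start();
        Console.WriteLine("Listening on http://localhost:8080/ ...");

        using var client = listener.AcceptTcpClient();
        var stream = client.GetStream();

        // Read just the first chunk of the request (enough for the headers in a
        // sketch; a real server would read up to the blank line) and ignore the body.
        var headerBytes = new byte[8192];
        int read = stream.Read(headerBytes, 0, headerBytes.Length);
        Console.WriteLine(Encoding.ASCII.GetString(headerBytes, 0, read));

        // Reject based on headers alone, without consuming the (large) body.
        var response =
            "HTTP/1.1 403 Forbidden\r\n" +
            "Content-Type: text/plain\r\n" +
            "Content-Length: 9\r\n" +
            "Connection: close\r\n" +
            "\r\n" +
            "Forbidden";
        var responseBytes = Encoding.ASCII.GetBytes(response);
        stream.Write(responseBytes, 0, responseBytes.Length);

        // Hard-close: a zero linger timeout makes Close() send a TCP RST instead
        // of a graceful FIN.
        client.Client.LingerState = new LingerOption(true, 0);
        client.Close();
    }
}

With that running, the client code below can be pointed at http://localhost:8080/ instead of a pre-signed S3 URL.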

The following client code was used that resulted in the above exception:

var request = new HttpRequestMessage(HttpMethod.Put, uploadUri)
{
    Content = new StreamContent(largeVideoStreamOfHundredsMegabytes)
};

request.Content.Headers.ContentType = new System.Net.Http.Headers.MediaTypeHeaderValue("video/mp4");

var uploadTimeout = TimeSpan.FromMinutes(Timeout);
using var uploadTimeoutToken = new CancellationTokenSource(uploadTimeout);
using var response = await Framework.Http.HttpClient.Instance.SendAsync(request, uploadTimeoutToken.Token);

Configuration

.NET Core 3.x (I don’t think it really matters)
macOS 10.15.6 x86
Probably not specific to any configuration or platform; it’s how HttpClient is coded.

Regression?

Nah. Rather, it’s an API design issue that doesn’t properly foresee such a scenario.

Other information

The problem, as hinted above, is the API assuming that the request will be read fully before a response is issued. A server can issue a response without fully reading the request, or can even issue and stream a response while it is still reading the request (e.g. a media-conversion endpoint that converts media as it reads and produces response output without storing an interim copy, i.e. converts on the fly and streams the result back to the client on the fly).
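A hypothetical ASP.NET Core minimal-API endpoint of that second kind might look like this (a sketch only; the route, buffer size, and echo "conversion" are placeholders, and it assumes the Microsoft.NET.Sdk.Web project SDK). It starts writing response bytes while the request body is still arriving:

var builder = WebApplication.CreateBuilder();
var app = builder.Build();

app.MapPut("/convert", async (HttpContext ctx) =>
{
    ctx.Response.ContentType = "application/octet-stream";

    var buffer = new byte[81920];
    int read;
    while ((read = await ctx.Request.Body.ReadAsync(buffer)) > 0)
    {
        // Stand-in for real media conversion: stream each chunk back as soon
        // as it arrives, without buffering the whole request first.
        await ctx.Response.Body.WriteAsync(buffer.AsMemory(0, read));
    }
});

app.Run();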

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

That’s interesting. This seems to be a platform difference between Linux and Windows. I ran my sample code from above on Linux, and it does indeed receive all sent data before throwing the connection reset exception.

That said, my general point here stands: Once you send a RST you cannot rely on the peer receiving any data you sent before the RST. That’s certainly true on Windows, as shown above, and may be true on other platforms as well. It may even be true on Linux in certain situations.

When you make a low level send() call and this call returns successfully, this does not mean that any data has been sent yet. It only means that the data has been accepted, placed into the send buffer of the socket and the kernel will send out the data on the next occasion possible and as fast as the network allows it. If you make a socket send RST to the other side, then the local socket must drop all data still in the send buffer, as that is mandated by the RFC.

Yes, but that’s another good reason to avoid sending a RST in this case. The sending application does not know whether the data that it has sent (i.e. placed into the send buffer) has actually been sent and acked by the peer or not. So you cannot safely send a RST, because there could always be unacked data in your own send buffer that would be blown away.

In fact your SLEEP_A_BIT code above demonstrates this: it’s only safe to send the RST after a nontrivial delay, because otherwise you might blow away your own send buffer. In your example, the 1 second delay is enough; but in general, over non-loopback, you can’t really know when it’s okay to actually send the RST.

However the RFC nowhere says that data at the receive buffer on the other side has to be dropped. This also would make little sense, as this data has already been confirmed as being correctly received; but being correctly received doesn’t mean the application had any chance to see that data yet, it only means that data was placed in the receive buffer. A sender does not expect that data which has already been acknowledged by the other side as being correctly received to suddenly become “not correctly received” when it issues a TCP RST.

There are good reasons to drop the receive buffer on receiving RST: it allows you to notify the application promptly that the connection has been reset. For example, imagine the server starts sending a large response, and then in the middle of sending the response, a fatal error occurs and the connection is reset. The client is processing data as it comes in, but it may have a large backlog of data in its receive buffer that hasn’t been delivered to the application yet. If it continues to deliver this data to the application even after receiving the RST, then the application needs to process all this data before it will be notified of the RST, which is probably just wasted time and CPU. Delivering the RST immediately avoids this problem.

(And because of that, I’m a bit surprised that Linux behaves the way it does…)

@geoffkizer I think you are missing a very important point here: When you make a low level send() call and this call returns successfully, this does not mean that any data has been sent yet. It only means that the data has been accepted, placed into the send buffer of the socket and the kernel will send out the data on the next occasion possible and as fast as the network allows it. If you make a socket send RST to the other side, then the local socket must drop all data still in the send buffer, as that is mandated by the RFC. However the RFC nowhere says that data at the receive buffer on the other side has to be dropped. This also would make little sense, as this data has already been confirmed as being correctly received; but being correctly received doesn’t mean the application had any chance to see that data yet, it only means that data was placed in the receive buffer. A sender does not expect that data which has already been acknowledged by the other side as being correctly received to suddenly become “not correctly received” when it issues a TCP RST.

Let me demonstrate that with a very simple sample program. Unfortunately I don’t know C# or .NET, so please forgive me for just using plain old C code instead. I will include the full source code below and link to an online resource, where you can directly test that code in your browser.

Let’s just quickly explain what the program does:
The program creates a local TCP server socket, connects a local client socket to it, writes 200 integer numbers from client to server, closes the client socket, and finally tries to read as much data as possible from the server socket, so we can see what happens.

There are three interesting pre-processor defines on top:

HAVE_SIN_LEN - This is just to make the code compatible with (Free-)BSD and macOS/iOS. On those systems the define must be set to 1, whereas on Linux it must be set to 0. On Windows… I have no idea but you probably know; also I’m not sure how much of that code is going to compile on Windows in the first place.

CLOSE_WITH_RST - If set to 1, the client will ensure the socket is closed by a RST. If set to 0, the client will close the socket gracefully instead, so it will be closed by a FIN and then linger. It would only close by RST after the linger time has passed but the program won’t run long enough for that to happen.

SLEEP_A_BIT - If set to 1, the client will wait an entire second before closing the socket after sending the data. If set to 0, it will close the socket as fast as possible. This holds true no matter if the socket is closed by RST or by FIN. This just adds some delay to give the kernel enough time to perform the actual send operations required to transfer all the data in the send buffer and yes, this does make a difference!

Now here’s the output if I disable CLOSE_WITH_RST, no matter if sleeping or not:

Server socket waiting on port 54161.
Client socket connected.
Server accepted client connection.
Wrote 200 integers from 0 to 199 to server.
Client socket closed with TCP RST
Server is reading data now:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 

Server has reached the end of the stream.

As you can see, I get all 200 numbers and the stream closes normally. But what happens if I enable CLOSE_WITH_RST and keep SLEEP_A_BIT disabled? Well, the result will vary.

It can be like this (up to 159):

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, Server: recv failed: Connection reset by peer

Or it can be like this (up to 47):

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, Server: recv failed: Connection reset by peer

But what happens if I enable SLEEP_A_BIT? Well, then the result will always go up to 199 before I run into an error:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, Server: recv failed: Connection reset by peer

And that’s because of what I explained at the very top: The RFC demands that after sending a RST, no further data must be sent to the peer, hence all unsent data in the local send buffer will be lost. But no acknowledged data in the receive buffer of the peer will be lost.

Try running that code online: https://onlinegdb.com/8ToVOsWSP It always fails at 16 for me because there wasn’t enough time to transfer more than the first 16 integers before the socket is closed forcefully.

Now let’s enable sleeping: https://onlinegdb.com/aGMZcpt0A See, it runs all the way through to 199. There’s still an error at the end, as receiving a RST is an erroneous situation the other side should be aware of, but data that has already been sent won’t be lost because of a RST.

You may dispute that setting the linger time to zero will force a RST, but when running the code locally it’s easy to prove that this is the case; just keep sudo tcpdump -i lo0 tcp running in a terminal window.

With CLOSE_WITH_RST enabled:

20:27:08.568103 IP localhost.55963 > localhost.55962: Flags [R.], seq 871, ack 1, win 6379, length 0
20:27:08.568121 IP localhost.55962 > localhost.55963: Flags [.], ack 871, win 6366, options [nop,nop,TS val 2538893231 ecr 444660579], length 0
20:27:08.568131 IP localhost.55963 > localhost.55962: Flags [R], seq 3120040667, win 0, length 0

With CLOSE_WITH_RST disabled:

20:28:08.125581 IP localhost.55989 > localhost.55988: Flags [FP.], seq 235:891, ack 1, win 6379, options [nop,nop,TS val 3863493664 ecr 374418739], length 656
20:28:08.125614 IP localhost.55988 > localhost.55989: Flags [.], ack 892, win 6365, options [nop,nop,TS val 374418740 ecr 3863493663], length 0
20:28:08.125965 IP localhost.55988 > localhost.55989: Flags [F.], seq 1, ack 892, win 6365, options [nop,nop,TS val 374418740 ecr 3863493663], length 0
20:28:08.126032 IP localhost.55989 > localhost.55988: Flags [.], ack 2, win 6379, options [nop,nop,TS val 3863493664 ecr 374418740], length 0

R is the RST flag and F is the FIN flag.

Finally, let me append the entire source code below, just in case, as I don’t know how long the online service referenced above will keep that code available or how long that service is going to exist. So in case it goes down, here’s the code once again:

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include <unistd.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define HAVE_SIN_LEN   0
#define CLOSE_WITH_RST 0
#define SLEEP_A_BIT    0

// ============================================================================
// SERVER

static short ServerPort = 0;
static int ServerSocket = -1;


static
void initServer ( void )
{
	ServerSocket = socket(PF_INET, SOCK_STREAM, 0);
	if (ServerSocket < 0) {
		fprintf(stderr, "SERVER: socket failed: %s\n", strerror(errno));
		exit(1);
	}

	struct sockaddr_in addr = {
#if HAVE_SIN_LEN
		.sin_len = sizeof(addr),
#endif
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK)
	};
	socklen_t addrLen = sizeof(addr);

	int bindErr = bind(
		ServerSocket, (struct sockaddr *)&addr, addrLen
	);
	if (bindErr) {
		fprintf(stderr, "SERVER: bind failed: %s\n", strerror(errno));
		exit(1);
	}

	int getNameErr = getsockname(
		ServerSocket, (struct sockaddr *)&addr, &addrLen
	);
	if (getNameErr) {
		fprintf(stderr, "SERVER: getsockname failed: %s\n", strerror(errno));
		exit(1);
	}

	int listenErr = listen(ServerSocket, 1);
	if (listenErr) {
		fprintf(stderr, "SERVER: listen failed: %s\n", strerror(errno));
		exit(1);
	}

	ServerPort = ntohs(addr.sin_port);
	printf("Server socket waiting on port %hu.\n", ServerPort);
}



static int ConnectedServerSocket = -1;


static
void acceptClient ( void )
{
	ConnectedServerSocket = accept(ServerSocket, NULL, NULL);
	if (ConnectedServerSocket < 0) {
		fprintf(stderr, "SERVER: accept failed: %s\n", strerror(errno));
		exit(1);
	}
	printf("Server accepted client connection.\n");
}


static
void tryServerRead ( void )
{
	printf("Server is reading data now:\n\n");

	for (;;) {
		char buffer[64];

		ssize_t bytesIn = recv(
			ConnectedServerSocket, &buffer, sizeof(buffer), 0
		);
		if (bytesIn < 0) {
			fprintf(stderr, "Server: recv failed: %s\n", strerror(errno));
			exit(1);
		}
		if (bytesIn == 0) {
			// We reached the end of stream
			printf("\n\n");
			break;
		}
		printf("%.*s", (int)bytesIn, buffer);
		fflush(stdout);
	}

	printf("Server has reached the end of the stream.\n");
}


// ============================================================================
// CLIENT

static int ClientSocket = -1;

static
void initClient ( void )
{
	ClientSocket = socket(PF_INET, SOCK_STREAM, 0);
	if (ClientSocket < 0) {
		fprintf(stderr, "CLIENT: socket failed: %s\n", strerror(errno));
		exit(1);
	}

	struct sockaddr_in addr = {
#if HAVE_SIN_LEN
		.sin_len = sizeof(addr),
#endif
		.sin_family = AF_INET,
		.sin_addr.s_addr = htonl(INADDR_LOOPBACK),
		.sin_port = htons(ServerPort)
	};
	socklen_t addrLen = sizeof(addr);

	int connectErr = connect(
		ClientSocket, (struct sockaddr *)&addr, addrLen
	);
	if (connectErr) {
		fprintf(stderr, "CLIENT: connect failed: %s\n", strerror(errno));
		exit(1);
	}

	printf("Client socket connected.\n");
}


static
void spamServer ( )
{
	char buffer[6];
	const int max = 199;
	size_t totalCount = 0;
	for (int i = 0; i <= max; i++) {
		int printCount = snprintf(buffer, sizeof(buffer), "%d, ", i);
		assert(printCount < sizeof(buffer));
		ssize_t sent = send(ClientSocket, buffer, printCount, 0);
		if (sent < 0) {
			fprintf(stderr, "CLIENT: send failed: %s\n", strerror(errno));
			exit(1);
		}
		totalCount += printCount;
	}
	printf("Wrote %d integers from 0 to %d to server.\n", max + 1, max);
}


static
void killClient ( void )
{
#if CLOSE_WITH_RST
	// SO_LINGER
	// Disabling linger will only prevent close() from blocking but the system
	// will still linger in the background. Setting linger time to zero however
	// means the linger timeout is hit immediately when close() is called and
	// as a result, the socket is not closed with FIN but with RST.
	struct linger l = {
		.l_onoff = 1,
		.l_linger = 0
	};
	int optErr = setsockopt(
		ClientSocket, SOL_SOCKET, SO_LINGER, &l, sizeof(l)
	);
	if (optErr) {
		fprintf(stderr, "CLIENT: setsockopt failed: %s\n", strerror(errno));
		exit(1);
	}
#endif

#if SLEEP_A_BIT
	sleep(1);
#endif

	int closeErr = close(ClientSocket);
	if (closeErr) {
		fprintf(stderr, "CLIENT: close failed: %s\n", strerror(errno));
		exit(1);
	}

	ClientSocket = -1;
	printf("Client socket closed with TCP RST\n");
}


// ============================================================================
// MAIN

int main ( int argc, const char * const * args )
{
	initServer();
	initClient();
	acceptClient();
	spamServer();
	killClient();
	tryServerRead();
}

@ManickaP, where and how does that property get set, and what source does it use for the status code? The doc link is rather skimpy on details, and I’m not clear on its usage and what it represents.

Unfortunately, it won’t help you here. This is the place where it gets set: https://source.dot.net/#System.Net.Http/System/Net/Http/HttpResponseMessage.cs,172. Meaning it’s set only if everything proceeded without an issue and you got a non-success status code. Since the server is closing the TCP connection on you, the status code won’t be set.

I dug a bit into the AWS docs (please correct me if I’m looking at the wrong docs), but it seems that they do support 100-continue: https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectPOST.html:

To configure your application to send the Request Headers before sending the request body, use the 100-continue HTTP status code. For POST operations, this helps you avoid sending the message body if the message is rejected based on the headers (for example, authentication failure or redirect). For more information on the 100-continue HTTP status code, go to Section 8.2.3 of http://www.ietf.org/rfc/rfc2616.txt.

It seems to me like it’s exactly made for your case.
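For what it’s worth, opting into that from HttpClient should just be a matter of setting the Expect header on the request. A sketch, reusing the names from the repro at the top (uploadUri, largeVideoStreamOfHundredsMegabytes) plus a plain HttpClient instance:

var httpClient = new HttpClient();

var request = new HttpRequestMessage(HttpMethod.Put, uploadUri)
{
    Content = new StreamContent(largeVideoStreamOfHundredsMegabytes)
};
request.Content.Headers.ContentType =
    new System.Net.Http.Headers.MediaTypeHeaderValue("video/mp4");

// Adds "Expect: 100-continue": HttpClient sends the headers first and waits
// briefly for the interim 100 response before streaming the body, so a
// header-based rejection arrives before the large content is transmitted.
request.Headers.ExpectContinue = true;

using var response = await httpClient.SendAsync(request);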