runtime: Troubleshooting HttpRequestException "A connection attempt failed because the connected party did not properly respond..."

Our production code runs ASP.NET Core 3.1, deployed in Azure App Service (Windows), and is getting a huge number of these errors:

System.Net.Http.HttpRequestException: A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.

 ---> System.Net.Sockets.SocketException (10060): A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond.
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   --- End of inner exception stack trace ---
   at System.Net.Http.ConnectHelper.ConnectAsync(String host, Int32 port, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.ConnectAsync(HttpRequestMessage request, Boolean allowHttp2, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.CreateHttp11ConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.GetHttpConnectionAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpConnectionPool.SendWithRetryAsync(HttpRequestMessage request, Boolean doRequestAuth, CancellationToken cancellationToken)
   at System.Net.Http.RedirectHandler.SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
   at System.Net.Http.HttpClient.FinishSendAsyncBuffered(Task`1 sendTask, HttpRequestMessage request, CancellationTokenSource cts, Boolean disposeCts)
   at ShopifySharp.ShopifyService.<>c__DisplayClass26_0`1.<<ExecuteRequestAsync>b__0>d.MoveNext()

We’re not sure exactly why and we would like to get to the bottom of it.

Even though the process is hosted in ASP.NET Core, it is mostly used to run background jobs. The background jobs issue a very large number of outbound HTTP requests to more than 1k different domains (most of them subdomains of myshopify.com, like <subdomain>.myshopify.com, but also other APIs such as the SendGrid and Google Docs APIs). At any given moment, though, I estimate it is dealing with about 50 domains/hosts at most. The issue seems to get worse as the number of requests grows (i.e. as more requests are issued in parallel). CPU and memory look reasonable, so I’m guessing there is some sort of bottleneck at the network or runtime level. My understanding is that .NET Core doesn’t limit (by default) the number of concurrent connections, and therefore concurrent requests, to a single host: SocketsHttpHandler.MaxConnectionsPerServer defaults to int.MaxValue (see https://github.com/dotnet/runtime/issues/29038).
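With no per-host cap, the number of TCP connections a burst opens is roughly the number of requests in flight to each host at the peak, summed over hosts. The toy model below is plain Python, not SocketsHttpHandler's actual code; the `cap` parameter stands in for a per-host limit such as MaxConnectionsPerServer, and the host names and numbers are hypothetical.

```python
from typing import Dict, Optional


def peak_connections(in_flight: Dict[str, int], cap: Optional[int] = None) -> int:
    """Estimate how many TCP connections a burst of parallel requests needs.

    in_flight maps host -> number of requests in flight to it at the same time.
    cap models a per-host connection limit; None means unlimited, mirroring the
    default MaxConnectionsPerServer. With a cap, surplus requests wait for a free
    connection instead of opening a new one. HTTP/2 multiplexing and reuse of
    idle connections are deliberately ignored.
    """
    return sum(n if cap is None else min(n, cap) for n in in_flight.values())


# Hypothetical burst: 50 active hosts with 20 parallel requests each.
burst = {f"shop{i}.myshopify.com": 20 for i in range(50)}
print(peak_connections(burst))         # 1000
print(peak_connections(burst, cap=4))  # 200
```

A cap turns excess parallelism into queueing inside the handler: it bounds the number of sockets at the cost of latency. Whether that helps depends on whether outbound sockets are really the resource being exhausted.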

Even though the exception message blames the remote hosts, I think the issue might be caused by some limit that we hit in our process or in Windows. We often see many of these errors thrown in quick succession (within a single second or a few seconds).

Can anyone recommend a way to troubleshoot this issue? I’ve heard of port exhaustion, but I’m not sure whether it’s the cause or how to go about confirming it.
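One way to look for port exhaustion is to snapshot the TCP table and check how many connections sit in each state and how heavily they concentrate on a few remote endpoints; large counts of SYN_SENT (connects that never completed) or TIME_WAIT toward the same IP:port are the usual symptom. The sketch below assumes you can capture Windows `netstat -ano` output from the instance or from a repro machine (whether that is possible inside the App Service sandbox is an assumption to verify). The sample rows are invented and use documentation-only IP addresses.

```python
from collections import Counter
from typing import List, Tuple


def summarize_netstat(text: str, top: int = 5) -> Tuple[Counter, List[Tuple[str, int]]]:
    """Summarize Windows `netstat -ano` output.

    Returns the number of TCP connections per state and the remote endpoints
    that appear most often among non-listening TCP connections.
    """
    states: Counter = Counter()
    remotes: Counter = Counter()
    for line in text.splitlines():
        parts = line.split()
        # TCP rows have five columns: proto, local, foreign, state, PID.
        # Headers, blank lines and UDP rows (no state column) are skipped.
        if len(parts) != 5 or parts[0] != "TCP":
            continue
        _proto, _local, foreign, state, _pid = parts
        states[state] += 1
        if state != "LISTENING":
            remotes[foreign] += 1
    return states, remotes.most_common(top)


# Invented sample using documentation-only IP addresses.
sample = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       1052
  TCP    10.0.0.4:49712         203.0.113.10:443       ESTABLISHED     4120
  TCP    10.0.0.4:49713         203.0.113.10:443       SYN_SENT        4120
  TCP    10.0.0.4:49714         203.0.113.10:443       TIME_WAIT       0
  TCP    10.0.0.4:49715         198.51.100.7:443       ESTABLISHED     4120
"""
states, top = summarize_netstat(sample)
print(states["SYN_SENT"], states["TIME_WAIT"])  # 1 1
print(top)  # [('203.0.113.10:443', 3), ('198.51.100.7:443', 1)]
```

Taking several snapshots during one of the error bursts, rather than a single one, makes it easier to tell a transient spike from steady growth.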

Here are a few TCP metrics from the Azure portal, in case they help: (screenshot not included)

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 50 (20 by maintainers)

Most upvoted comments

@clement911 We are encountering the same issue. Did you come up with a fix for your scenario?

The fact that SocketsHttpHandler doesn’t reuse connections across different domains, even when they resolve to the same IP address, is a huge problem for APIs that expect a separate subdomain for each user: every subdomain gets its own connection pool and therefore its own outbound connections. I agree that connecting by IP address may not be great, since the IP is unlikely to match the certificates.
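The pooling behavior described in this comment can be pictured with a toy model: idle connections are kept per host name, so a connection opened for one subdomain is never handed to another, even when both resolve to the same IP. This is a simplification (SocketsHttpHandler's pool key also includes things like the port and proxy), and the subdomains and IP below are hypothetical.

```python
from typing import Dict, List


def connections_opened(requests: List[str], resolve: Dict[str, str], key_by_ip: bool) -> int:
    """Count new TCP connections for a sequence of non-overlapping requests.

    requests: host names requested one after another, so an idle pooled
    connection is always available for reuse if the pool key matches.
    resolve: host -> IP address (hypothetical DNS results).
    key_by_ip: False models host-keyed pooling; True models a hypothetical
    pool that shares connections across hosts with the same IP.
    """
    pools = set()  # pool keys that already hold an idle connection
    opened = 0
    for host in requests:
        key = resolve[host] if key_by_ip else host
        if key not in pools:
            pools.add(key)
            opened += 1
    return opened


hosts = [f"shop{i}.myshopify.com" for i in range(3)] * 2
dns = {h: "203.0.113.10" for h in hosts}
print(connections_opened(hosts, dns, key_by_ip=False))  # 3
print(connections_opened(hosts, dns, key_by_ip=True))   # 1
```

The IP-keyed variant ignores TLS entirely: it would hand a connection negotiated for one host name to another, which is exactly the certificate concern raised here, so it only illustrates the cost of host-keyed pooling and is not a safe alternative.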

There is really no good way to know whether two hosts are the same and whether it is safe to reuse a connection between them. They may present different certificates and negotiate different crypto parameters. I’m wondering whether routing the traffic through an HTTP proxy outside of the SNAT pool would help.
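Azure's SNAT limit is commonly documented as applying per destination address and port, and the figure usually quoted for App Service is 128 pre-allocated SNAT ports per instance; treat that number as an assumption and check the current Azure documentation. If many subdomains resolve to the same few IPs, their connections draw on the same budget. A rough sketch for checking a snapshot of open connections against such a limit, with made-up hosts and IPs:

```python
from collections import Counter
from typing import Dict, List, Tuple

# Assumption: the commonly cited App Service pre-allocation; verify for your plan.
SNAT_PORTS_PER_DESTINATION = 128


def snat_pressure(conns: List[Tuple[str, int]], resolve: Dict[str, str],
                  budget: int = SNAT_PORTS_PER_DESTINATION) -> Dict[Tuple[str, int], int]:
    """Group open outbound connections by destination (IP, port).

    conns: one (host, port) entry per open outbound TCP connection.
    resolve: host -> IP address.
    Returns only the destinations whose connection count exceeds budget.
    """
    per_dest = Counter((resolve[host], port) for host, port in conns)
    return {dest: n for dest, n in per_dest.items() if n > budget}


# 50 hypothetical shops x 4 connections each, all resolving to one IP.
conns = [(f"shop{i}.myshopify.com", 443) for i in range(50) for _ in range(4)]
dns = {f"shop{i}.myshopify.com": "203.0.113.10" for i in range(50)}
print(snat_pressure(conns, dns))  # {('203.0.113.10', 443): 200}
```

Each host on its own uses only 4 connections, so nothing looks unusual per host; the pressure only shows up once connections are grouped by destination IP and port.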

It seems that in this case there is a disconnect between the service architecture and the infrastructure.

This is becoming an increasingly frequent issue. Does anyone have any suggestions?