runtime: "Handle is already used by another Socket." causing 502 errors (NET Core 2.2., NET 5.0, Kestrel, CentOS, Nginx)

We encountered random 502 errors with Net Core 2.2. web application running on CentOS behind Nginx. Thinking the issue is related to version 2.2., we upgraded the application to NET 5.0. The errors still happen random, just in more irregular intervals.

Suggested solutions on the web for 502 errors also suggested, the error could be related to longer loading times of data from database. We optimized all queries in the API using the analysis of execution plans. The errors keep appearing. While examining the Access and Kestrel logs, we found one irregularity. Below are the Kestrel logs of two identical web page requests less than 2 minutes apart. First request returned status 200, second 502. The difference is after the line xxx.DfaMatcher[1005]. The 1st log shows the line Endpoint … is valid for the request path, but the second log is missing this entry.

As the second log shows, there were 14 seconds where Kestrel didn’t process anything (despite the user did reload the web page 4 times). The first entry is Starting HttpMessageHandler cleanup cycle. After that, the 4 web page requests produced 502, which Kestrel shows as: Connection id “xxx” communication error.

We were also surprised to see the API log, where the 4 subsequent requests from the web page (producing 502) were processed by the API with status 200. As a side note: we never encountered any similar issue on the development computer (Win 10 LTSB, Visual Studio 2019 Pro, using IIS) for the past two years.

Access.log: Response 200 213.157.249.198 - - [16/Jun/2021:08:17:18 +0200] “GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1” 200 8025 “https://my-domain.com/Admin/Measurement” “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36”

Response 502 213.157.249.198 - - [16/Jun/2021:08:18:55 +0200] “GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1” 502 552 “https://my-domain.com/Admin/Measurement” “Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.106 Safari/537.36”

Kestrel log: Response 200 Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id “0HM9GAMVBTK15” accepted. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection id “0HM9GAMVBTK15” started. Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3] Jun 16 08:17:18 .NET Core (dotnet)[551]: Connection “0HM9GAMVBTK15” established using the following protocol: Tls12 Jun 16 08:17:18 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1] Jun 16 08:17:18 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - - Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4] Jun 16 08:17:18 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001] Jun 16 08:17:18 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path ‘/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62’ Jun 16 08:17:18 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005] Jun 16 08:17:18 .NET Core (dotnet)[551]: Endpoint ‘Page: /Measurement/Detail’ with route pattern ‘Admin/Measurement/Detail/{id:Guid?}/{handler?}’ is valid for the request path ‘/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62’

Response 502

Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" accepted.
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1R" started.
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware[3]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Connection "0HM9GAMVBTK1R" established using the following protocol: Tls12
Jun 16 08:18:55 .NET Core (dotnet)[551]: info: Microsoft.AspNetCore.Hosting.Diagnostics[1]
Jun 16 08:18:55 .NET Core (dotnet)[551]: Request starting HTTP/1.1 GET https://my-domain.com/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 - -
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.StaticFiles.StaticFileMiddleware[4]
Jun 16 08:18:55 .NET Core (dotnet)[551]: The request path /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 does not match a supported file type
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1001]
Jun 16 08:18:55 .NET Core (dotnet)[551]: 1 candidate(s) found for the request path '/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62'
Jun 16 08:18:55 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Routing.Matching.DfaMatcher[1005]

- Missing line: Endpoint ‘xxx’ is valid for the request path

Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[100]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Starting HttpMessageHandler cleanup cycle with 7 items
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.Extensions.Http.DefaultHttpClientFactory[101]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Ending HttpMessageHandler cleanup cycle after 0.0043ms - processed: 0 items - remaining: 7 items
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[14]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" communication error.
Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket.
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive()
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[39]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" accepted.
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[1]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" started.
Jun 16 08:19:09 .NET Core (dotnet)[551]: fail: Microsoft.AspNetCore.Server.Kestrel[0]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Unhandled exception while processing 0HM9GAMVBTK1T.
Jun 16 08:19:09 .NET Core (dotnet)[551]: System.InvalidOperationException: Handle is already used by another Socket.
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterCore(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEngine.RegisterSocket(IntPtr socketHandle, SocketAsyncContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.Register()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.OperationQueue`1.StartAsyncOperation(SocketAsyncContext context, TOperation operation, Int32 observedSequenceNumber, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncContext.ReceiveAsync(Memory`1 buffer, SocketFlags flags, Int32& bytesReceived, Action`5 callback, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.SocketAsyncEventArgs.DoOperationReceive(SafeSocketHandle handle, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Sockets.Socket.ReceiveAsync(SocketAsyncEventArgs e, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketReceiver.WaitForDataAsync()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.ProcessReceives()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets.Internal.SocketConnection.DoReceive()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.PipeCompletion.ThrowLatchedException()
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.GetReadResult(ReadResult& result)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.ReadAsync(CancellationToken token)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.IO.Pipelines.Pipe.DefaultPipeReader.ReadAsync(CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.DuplexPipeStream.ReadAsyncInternal(Memory`1 destination, CancellationToken cancellationToken)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.<FillHandshakeBufferAsync>g__InternalFillHandshakeBufferAsync|182_0[TIOAdapter](TIOAdapter adap, ValueTask`1 task, Int32 minSize)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ReceiveBlobAsync[TIOAdapter](TIOAdapter adapter)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at System.Net.Security.SslStream.ForceAuthenticationAsync[TIOAdapter](TIOAdapter adapter, Boolean receiveFirst, Byte[] reAuthenticationData, Boolean isApm)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Https.Internal.HttpsConnectionMiddleware.OnConnectionAsync(ConnectionContext context)
Jun 16 08:19:09 .NET Core (dotnet)[551]: at Microsoft.AspNetCore.Server.Kestrel.Core.Internal.Infrastructure.KestrelConnection`1.ExecuteAsync()
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel[2]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" stopped.
Jun 16 08:19:09 .NET Core (dotnet)[551]: dbug: Microsoft.AspNetCore.Server.Kestrel.Transport.Sockets[7]
Jun 16 08:19:09 .NET Core (dotnet)[551]: Connection id "0HM9GAMVBTK1T" sending FIN because: "The Socket transport's send loop completed gracefully."

Nginx error log: 2021/06/16 08:18:55 [error] 1883#0: *3243 upstream prematurely closed connection while reading response header from upstream, client: 213.157.249.198, server: my-domain.com, request: “GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1”, upstream: “https://[::1]:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62”, host: “my-domain.com”, referrer: “https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62

2021/06/16 08:18:55 [error] 1883#0: *3243 SSL_do_handshake() failed (SSL: error:140940F5:SSL routines:ssl3_read_bytes:unexpected record) while SSL handshaking to upstream, client: 213.157.249.198, server: my-domain.com, request: “GET /Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62 HTTP/1.1”, upstream: “https://127.0.0.1:6000/Admin/Measurement/Detail/5aa3ce5c-612b-4202-896a-f03d20af7d62”, host: “my-domain.com”, referrer: “https://my-domain.com/Admin/Measurement/Email/5aa3ce5c-612b-4202-896a-f03d20af7d62

Any ideas where to look for the cause of the issue would be appreciated.

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 22 (17 by maintainers)

Most upvoted comments

You removed library AspNetCore.Identity.LiteDB and it didn’t happen since then

Based on what we found in https://github.com/dotnet/runtime/issues/64305, https://github.com/fabian-blum/AspNetCore.Identity.LiteDB/pull/14 will fix the issue in AspNetCore.Identity.LiteDB.

I just want to inform everyone, that we’re extensively testing the application for the past few days on the test server. We found some clues but they are not conclusive enough for now. I’ll get back when we have some more info.

Thanks for responding, everybody.

I’m afraid we still can’t consistently reproduce the issue. We’ve been stressing the test server, mainly using Selenium IDE, but have so far been unable to trigger the error. The errors on the production server are seemingly random.

Since last week, the production server is down to a single error occurrence daily, if at all. The frequency of errors was significantly higher not so long ago.That’s a welcome improvement but we don’t fully understand what caused it nor can we be sure it is permanent.

Besides the usual moving parts such as the user interaction and the OS and the .NET updates, the application code also got a potentially significant change. I’m mainly dealing with infrastructure so I’ll invite the application author to provide further details about that.

I would gladly run custom .NET builds and privately provide any other debugging information. The problem I see right now is that we can’t willingly trigger the error, despite actively trying.

To reason about the root cause, we need to see the call stack and the program’s state around the first handle registration that doesn’t throw.

Ideally, we should be able to get this info by asking users to enabling tracing, but unfortunately we are not logging the handle value anywhere. We need to provide private builds with extended logging, or exchange debug info with customers in other ways. @server-si if you can reproduce this consistently, any chance you are willing to work with us privately?

Thanks @zblesk for update – just FYI, the Twitter thread you linked is @antonfirsov above from our (Networking) team. Also, summarizing your story here: You removed library AspNetCore.Identity.LiteDB and it didn’t happen since then - you describe it in your blog post. Note: It does not mean necessarily that the library is to be blamed, it may have stumbled on a bug in .NET – something we still would need to chase down and root cause.

@server-si eventually I found out it was caused by an external library. These people were investigating something similar and also mentioned this issue because they found the related blog post, linked above. I don’t know why the library was breaking the pipeline, and only on linux; I won’t be investigating it, either. Just stopped using it.

shouldn’t be the read of _asyncEngine volatile here to ensure it’s always happening after the lock is taken?

Entering the lock has read-acquire semantics. No reads from inside the lock can move to be before the lock.