runtime: SafeSocketHandle.CloseAsIs hanging in finalizer thread

This issue comes from investigating test failures on this Roslyn PR https://github.com/dotnet/roslyn/pull/46510

At the conclusion of running the unit tests for the VBCSCompiler server, the xUnit process refuses to exit. The xUnit output indicates that the tests have completed, but the process itself stays alive. Attaching a debugger to the xUnit process shows two threads of note that are still running:

GC Finalizer

System.Private.CoreLib.dll!System.Threading.SpinWait.SpinOnceCore(int sleep1Threshold) (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.SafeSocketHandle.CloseAsIs(bool abortive) (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.Socket.Dispose(bool disposing) (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.Socket.~Socket() (Unknown Source:0)
[Native to Managed Transition] (Unknown Source:0)

.NET Sockets

System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEngine.EventLoop() (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEngine..ctor.AnonymousMethod__14_0(object s) (Unknown Source:0)
System.Private.CoreLib.dll!System.Threading.ThreadHelper.ThreadStart(object obj) (Unknown Source:0)
[Native to Managed Transition] (Unknown Source:0)

The VBCSCompiler server makes heavy use of named pipes. Looking through the Socket on the finalizer thread, I can confirm it’s a Unix domain socket backing one of the named pipes the compiler is creating (the path in the endpoint matches the paths we create in the tests).

Unfortunately after a day of debugging I have not been able to narrow this problem down any further:

  1. The problem only repros when running the entire test assembly. I ran each test class in the assembly on its own and none of them reproduces the issue individually. They have to be run as a group.
  2. Went through a cycle of causing the hang, identifying the test which created the socket that was hung in the finalizer, disabling that test, and re-running the assembly. Did this for about five different tests and it had no impact on the hang.
  3. Thought this may be related to #40289, so I tried aggressively calling NamedPipeServerStream.Dispose on any instance that was hung in a WaitForConnectionAsync call.

None of these has had any impact though. I’ve also been unsuccessful in constructing a more concise repro. 😦

More than happy to provide any info to make tracking this down easier.

Repro Information:

  • Runtime: .NET 5 Preview 7
  • OS: Ubuntu 18.04, Ubuntu 18.04 via WSL2, OSX

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22 (22 by maintainers)

Most upvoted comments

It would be interesting to identify the operation that is on-going on some other thread.

I’m currently trying to figure that out, but my gut feel is that spinning in the finalizer thread is a bad idea in any case, and we should probably rethink the way CloseAsIs works, regardless of the outcome of the investigation.

Looks like the issue is not caused by outstanding blocking calls. I have a smaller repro now: https://gist.github.com/antonfirsov/ce0cb4992e115bb4d6e8dd6862fd6780

Here is what is happening in my understanding:

  1. SafePipeHandle.Unix increases the reference count on SafeSocketHandle: https://github.com/dotnet/runtime/blob/c6da68cfd7ded9333291fe37881d7d266cfa6acb/src/libraries/System.IO.Pipes/src/Microsoft/Win32/SafeHandles/SafePipeHandle.Unix.cs#L36
  2. NamedPipeClientConnectionHost seems to leak (= not dispose) some NamedPipeServerStream instances. (@jaredpar I’m wondering if this only happens in the repro branch or is this a bug in the compiler server?)
  3. A Socket finalizer is called before the finalizers of the “owner” SafePipeHandle and NamedPipeServerStream.
  4. Since the reference count is not 0, SafeSocketHandle is not released. CloseAsIs keeps spinning, which blocks the finalizer thread and so prevents the finalization of SafePipeHandle, the one thing that would release SafeSocketHandle: https://github.com/dotnet/runtime/blob/c6da68cfd7ded9333291fe37881d7d266cfa6acb/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.cs#L106-L114
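
To make the reference-counting part of this concrete, here is a minimal sketch of the mechanism as I understand it. It is not the runtime code: `DemoHandle` and `RefCountDemo` are hypothetical names, the handle wraps a fake value rather than a socket, and the wait loop is bounded so the program terminates. It relies only on the documented `SafeHandle` behavior that `ReleaseHandle` does not run while a reference taken with `DangerousAddRef` is outstanding.

```csharp
using System;
using System.Runtime.InteropServices;
using System.Threading;

// Hypothetical stand-in for the socket handle; it wraps a fake value, not a real socket.
sealed class DemoHandle : SafeHandle
{
    public volatile bool Released;

    public DemoHandle() : base(IntPtr.Zero, ownsHandle: true)
    {
        SetHandle(new IntPtr(1)); // any non-zero value, so IsInvalid is false
    }

    public override bool IsInvalid => handle == IntPtr.Zero;

    protected override bool ReleaseHandle()
    {
        Released = true; // runs only once the reference count reaches zero
        return true;
    }
}

static class RefCountDemo
{
    public static (bool AfterDispose, bool AfterOwnerRelease) Run()
    {
        var socketHandle = new DemoHandle();

        // Step 1: the "owner" (the pipe handle in this issue) takes an extra reference.
        bool added = false;
        socketHandle.DangerousAddRef(ref added);

        // Step 3: the socket is cleaned up first. Dispose marks the handle closed,
        // but ReleaseHandle cannot run while the owner's reference is outstanding.
        socketHandle.Dispose();

        // Step 4: a bounded version of the wait (assumption: 20 iterations is enough
        // to illustrate). An unbounded loop here would never observe Released == true.
        var spinner = new SpinWait();
        for (int i = 0; i < 20 && !socketHandle.Released; i++)
        {
            spinner.SpinOnce();
        }
        bool afterDispose = socketHandle.Released;

        // Only the owner dropping its reference lets the handle be released.
        socketHandle.DangerousRelease();
        return (afterDispose, socketHandle.Released);
    }

    static void Main()
    {
        var result = Run();
        Console.WriteLine($"released after Dispose: {result.AfterDispose}");            // False
        Console.WriteLine($"released after owner release: {result.AfterOwnerRelease}"); // True
    }
}
```

In the sketch the owner's release happens on the same thread right after the wait gives up. In the hang, that release would have to come from a finalizer queued behind the one that is spinning, so it never runs.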

I have an idea for a PR that would remove the spinning.

For some reason the issue does not happen with .NET Core 3.1.

Did my best to narrow this down to a smaller problem. Created a branch where you can see the delta between our unit tests passing and hanging. This commit shows that it’s simply enabling one new test to run on Linux that causes this hang.

To repro this do the following:

> git clone https://github.com/jaredpar/roslyn -o jaredpar
> cd roslyn
> git checkout -B repro jaredpar/repro/pipe-hang
> cd src/Compilers/Server/VBCSCompilerServerTests
> dotnet build
> dotnet msbuild /t:Test

A couple of notes:

  1. About half of the time this will cause the tests to fail; the other half it will hang in the finalizer.
  2. Running the new test alone is not sufficient to demonstrate the hang; the entire suite has to run (unsure why).

A brief description of what this particular test is doing:

  • Opens four NamedPipeServerStream instances on the same pipe name
  • Creates one NamedPipeClientStream instance that fully connects
  • Disposes three of the NamedPipeServerStream instances, but not the connected one
  • Disposes the NamedPipeClientStream
  • That means that after the test completes, one of the NamedPipeServerStream instances remains undisposed, and it appears this is the one that ends up stuck in the finalizer thread.
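
For anyone who wants the shape of that test without reading the Roslyn sources, the steps above can be sketched roughly as follows. This is not the actual test: `PipeLeakSketch`, the pipe name, and the constructor options are assumptions chosen for illustration, and as noted earlier, running something like this in isolation was not enough to reproduce the hang.

```csharp
using System;
using System.IO.Pipes;
using System.Threading.Tasks;

static class PipeLeakSketch
{
    // Returns the connected server stream that is intentionally left undisposed.
    public static async Task<NamedPipeServerStream> RunAsync()
    {
        // Hypothetical short name; on Unix it becomes part of a socket path with a length limit.
        string pipeName = "ph-" + Guid.NewGuid().ToString("N").Substring(0, 8);

        // Open four server instances on the same pipe name and start waiting on each.
        var servers = new NamedPipeServerStream[4];
        var pending = new Task[4];
        for (int i = 0; i < servers.Length; i++)
        {
            servers[i] = new NamedPipeServerStream(
                pipeName,
                PipeDirection.InOut,
                NamedPipeServerStream.MaxAllowedServerInstances,
                PipeTransmissionMode.Byte,
                PipeOptions.Asynchronous);
            pending[i] = servers[i].WaitForConnectionAsync();
        }

        using (var client = new NamedPipeClientStream(".", pipeName, PipeDirection.InOut, PipeOptions.Asynchronous))
        {
            // One client fully connects; exactly one pending wait should complete.
            await client.ConnectAsync();
            Task connectedWait = await Task.WhenAny(pending);
            int connectedIndex = Array.IndexOf(pending, connectedWait);

            // Dispose the three servers that are still waiting, but not the connected one.
            // Their pending waits are expected to end in failure or cancellation; the
            // sketch deliberately does not await them.
            for (int i = 0; i < servers.Length; i++)
            {
                if (i != connectedIndex)
                {
                    servers[i].Dispose();
                }
            }

            return servers[connectedIndex];
        } // the client is disposed here, after the three servers
    }

    static async Task Main()
    {
        await RunAsync(); // the returned stream is dropped, leaving it to the finalizer
        Console.WriteLine("One connected NamedPipeServerStream was left undisposed.");
    }
}
```

The connected server stream is the only object here that is left to the finalizer, which matches the instance observed on the stuck finalizer thread.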