runtime: SafeSocketHandle.CloseAsIs hanging in finalizer thread
This issue comes from investigating test failures on this Roslyn PR: https://github.com/dotnet/roslyn/pull/46510
At the conclusion of running the unit tests for the VBCSCompiler server, the xUnit process refuses to exit. The xUnit output indicates that the tests have finished running, but the process itself never exits. Attaching the debugger to the xUnit process shows two threads of note that are still running:
GC Finalizer
System.Private.CoreLib.dll!System.Threading.SpinWait.SpinOnceCore(int sleep1Threshold) (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.SafeSocketHandle.CloseAsIs(bool abortive) (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.Socket.Dispose(bool disposing) (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.Socket.~Socket() (Unknown Source:0)
[Native to Managed Transition] (Unknown Source:0)
.NET Sockets
System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEngine.EventLoop() (Unknown Source:0)
System.Net.Sockets.dll!System.Net.Sockets.SocketAsyncEngine..ctor.AnonymousMethod__14_0(object s) (Unknown Source:0)
System.Private.CoreLib.dll!System.Threading.ThreadHelper.ThreadStart(object obj) (Unknown Source:0)
[Native to Managed Transition] (Unknown Source:0)
The VBCSCompiler server makes heavy use of named pipes. Looking through the Socket on the finalizer thread I can confirm it’s a Unix Domain socket related to the named pipes the compiler is creating (the path in the end point matches the paths we create in the tests).
Unfortunately after a day of debugging I have not been able to narrow this problem down any further:
- The problem only repros when running the entire test assembly. I ran each test class in the assembly and none of them individually reproduce the issue. Have to run them as a group.
- Went through a cycle of causing the hang, identifying the test which created the socket that was hung in the finalizer, disabling that test, and re-running the assembly. Did this for about five different tests and it had no impact on the hang.
- Thought this may be related to #40289 so I tried aggressively calling NamedPipeServerStream.Dispose on any instance that was hung in a WaitForConnectionAsync call.
None of these has had any impact though. I’ve also been unsuccessful in constructing a more concise repro. 😦
More than happy to provide any info to make tracking this down easier.
Repro Information:
- Runtime: .NET 5 Preview 7
- OS: Ubuntu 18.04, Ubuntu 18.04 via WSL2, OSX
About this issue
- State: closed
- Created 4 years ago
- Comments: 22 (22 by maintainers)
Commits related to this issue
- Smaller repro for https://github.com/dotnet/runtime/issues/40301, committed to jaredpar/roslyn by jaredpar 4 years ago
I’m currently trying to figure that out, but my gut feel is that spinning in the finalizer thread is a bad idea in any case, and we should probably rethink the way CloseAsIs works, regardless of the outcome of the investigation.
Looks like the issue is not caused by outstanding blocking calls. I have a smaller repro now: https://gist.github.com/antonfirsov/ce0cb4992e115bb4d6e8dd6862fd6780
Here is what is happening in my understanding:
- SafePipeHandle.Unix increases the reference counter on SafeSocketHandle: https://github.com/dotnet/runtime/blob/c6da68cfd7ded9333291fe37881d7d266cfa6acb/src/libraries/System.IO.Pipes/src/Microsoft/Win32/SafeHandles/SafePipeHandle.Unix.cs#L36
- NamedPipeClientConnectionHost seems to leak (= not dispose) some NamedPipeServerStream instances. (@jaredpar I’m wondering if this only happens in the repro branch or is this a bug in the compiler server?)
- The Socket finalizer is called before the finalizers of the “owner” SafePipeHandle and NamedPipeServerStream, so the SafeSocketHandle is not released.
- CloseAsIs will keep spinning, blocking the finalizer thread, preventing the finalization of SafePipeHandle (and therefore the release of SafeSocketHandle): https://github.com/dotnet/runtime/blob/c6da68cfd7ded9333291fe37881d7d266cfa6acb/src/libraries/System.Net.Sockets/src/System/Net/Sockets/SafeSocketHandle.cs#L106-L114
I have an idea for a PR that would remove the spinning.
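To make the ordering problem described above concrete, here is a toy model of it. This is not the runtime's code: a single worker thread stands in for the finalizer thread, an int stands in for the handle's reference count, and the names FinalizerSpinModel and QueueStalls are made up for illustration. The first queued "finalizer" spins until the count drops to zero, while the only thing that would drop it is queued behind it on the same thread.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;

// Toy model only: a worker thread plays the finalizer thread and an int
// plays the handle reference count. None of this is runtime code.
public static class FinalizerSpinModel
{
    // Returns true if the simulated finalizer queue fails to drain in time.
    public static bool QueueStalls(TimeSpan timeout)
    {
        int refCount = 1; // the extra reference taken by the "pipe handle"
        using var stop = new CancellationTokenSource();
        using var drained = new ManualResetEventSlim(false);
        using var queue = new BlockingCollection<Action>();

        // "Socket finalizer": spins until all outstanding references are gone.
        queue.Add(() =>
        {
            var spinner = new SpinWait();
            while (Volatile.Read(ref refCount) > 0 && !stop.IsCancellationRequested)
                spinner.SpinOnce();
        });

        // "Pipe handle finalizer": would release the reference, but it is
        // queued behind the spinner on the same thread.
        queue.Add(() => Interlocked.Decrement(ref refCount));
        queue.Add(() => drained.Set());
        queue.CompleteAdding();

        var finalizerThread = new Thread(() =>
        {
            foreach (Action finalize in queue.GetConsumingEnumerable())
                finalize();
        });
        finalizerThread.IsBackground = true;
        finalizerThread.Start();

        bool stalled = !drained.Wait(timeout);

        // Escape hatch so the model can exit; the real finalizer thread has none.
        stop.Cancel();
        finalizerThread.Join();
        return stalled;
    }

    public static void Main()
    {
        bool stalled = QueueStalls(TimeSpan.FromMilliseconds(500));
        Console.WriteLine(stalled ? "finalizer queue stalled" : "finalizer queue drained");
    }
}
```

In this model the queue only drains once the cancellation token breaks the spin, which is the part the real finalizer thread is missing.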
For some reason the issue does not happen with 3.1.
Did my best to narrow this down to a smaller problem. Created a branch where you can see the delta between our unit tests passing and hanging. This commit shows that it’s simply enabling one new test to run on Linux that causes this hang.
To repro this do the following:
Couple notes:
A brief description of what this particular test is doing:
- Creates multiple NamedPipeServerStream instances on the same pipe name
- Creates a NamedPipeClientStream instance that fully connects
- Disposes the unconnected NamedPipeServerStream instances, does not dispose the connected one
- Disposes the NamedPipeClientStream
- One of the NamedPipeServerStream instances remains undisposed and it appears this is the one that ends up stuck in the finalizer thread.
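For readers who want to see the shape of that scenario outside the Roslyn test suite, a rough sketch follows. It is an approximation of the steps described above, not the actual test and not a confirmed repro; the class name LeakedPipeServerScenario, the pipe name, and the server count are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO.Pipes;
using System.Threading.Tasks;

// Rough approximation of the scenario; not the actual Roslyn test.
public static class LeakedPipeServerScenario
{
    // Returns how many server instances were deliberately left undisposed.
    public static async Task<int> RunAsync(int serverCount)
    {
        // Hypothetical pipe name, kept short to stay within Unix socket path limits.
        string pipeName = "sk-" + Guid.NewGuid().ToString("N").Substring(0, 8);

        // Several servers waiting for a connection on the same pipe name.
        var servers = new List<NamedPipeServerStream>();
        var waits = new List<Task>();
        for (int i = 0; i < serverCount; i++)
        {
            var server = new NamedPipeServerStream(
                pipeName,
                PipeDirection.InOut,
                NamedPipeServerStream.MaxAllowedServerInstances,
                PipeTransmissionMode.Byte,
                PipeOptions.Asynchronous);
            servers.Add(server);
            waits.Add(server.WaitForConnectionAsync());
        }

        // One client that fully connects, so exactly one server accepts it.
        var client = new NamedPipeClientStream(
            ".", pipeName, PipeDirection.InOut, PipeOptions.Asynchronous);
        await client.ConnectAsync();
        Task accepted = await Task.WhenAny(waits);
        await accepted; // surface any accept failure

        // Dispose every server except the connected one.
        int leaked = 0;
        for (int i = 0; i < servers.Count; i++)
        {
            if (waits[i] == accepted)
                leaked++; // intentionally not disposed
            else
                servers[i].Dispose();
        }

        client.Dispose();
        return leaked;
    }

    public static async Task Main()
    {
        Console.WriteLine(await RunAsync(3));
    }
}
```

The sketch stops at leaving one connected server undisposed. On the affected runtime the hang was observed later, when the leaked instance's socket reached the finalizer thread, so this code finishing normally says nothing either way about whether a given build is affected.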