runtime: Socket.BeginConnect in .NET 6 throws an unobserved SocketException
Description
After updating to .NET 6 we started seeing (separate users and software) unobserved SocketException.
After analyzing the logs, I believe it is being triggered here:
However it is hard to tell because the stacktrace doesn’t really have useful information which could pin point the exact code.
The code that uses AwaitableSocketAsyncEventArgs is in Socket.ConnectAsync
@yaakov-h pointed out this:
ValueTask<TResult>.AsTask is the only code path to there, and I don’t think SK2 knows about ValueTask, so that’s gotta be a runtime bug.
https://source.dot.net/#System.Private.CoreLib/ValueTask.cs,638
Reproduction Steps
Expected behavior
Exception should be catchable, or not happen in this manner?
Actual behavior
Unhandled exception. System.AggregateException: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (Operation canceled) (Operation canceled)
---> System.Net.Sockets.SocketException (125): Operation canceled
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
--- End of inner exception stack trace ---
at System.AggregateException.Handle(Func`2 predicate)
at SteamDatabaseBackend.Bootstrapper.OnUnobservedTaskException(Object sender, UnobservedTaskExceptionEventArgs args) in /home/steamdb/deploy/Bootstrapper.cs:line 149
at System.Threading.Tasks.TaskScheduler.PublishUnobservedTaskException(Object sender, UnobservedTaskExceptionEventArgs ueea)
at System.Threading.Tasks.TaskExceptionHolder.Finalize()
Regression?
I did not get this on .NET 5.
Known Workarounds
No response
Configuration
Debian 11 x64.
Other information
No response
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Reactions: 5
- Comments: 31 (21 by maintainers)
Commits related to this issue
- Do not throw ObjectDisposeException from Socket.EndXyz methods (#73335) Fix #61411 and harmonize argument exception handling. — committed to dotnet/runtime by antonfirsov 2 years ago
It’s already in daily builds and will be in RC1 of .NET 7. There are no plans to backport it.
It seems like the old EndAccept code had a race that could lead to a leak of a Socket object (which would get finalized eventually, but that’s not ideal). If a socket was successfully accepted and then the listen socket was disposed before EndAccept was called, then we’d leak the accepted socket.
In general, checking if the Socket itself is disposed in the End* method seems questionable to me. If the operation completed successfully, the End* call should succeed regardless of whether the socket has been disposed in the meantime (and thus, not leak the accepted socket etc). If it failed before the socket was disposed, the End call should fail with an exception indicating the failure, not ODE.
The tricky case is when the operation failed because the socket was disposed. Arguably, we should throw ODE in this case. But we explicitly don’t throw ODE for Task and SAEA operations – in fact there’s code deep in the SAEA implementation to convert ODE into OperationAborted in certain circumstances.
So I think we should: (1) Remove the blanket ThrowIfDisposed call from various End methods (2) Wrap a try/catch around the calls to TaskToApm.End that checks specifically for OperationAborted and converts it to ObjectDisposedException
This seems like the best mix of correctness and compatibility here. That said, if we don’t care about maintaining the ODE in this case, we could simply not do (2) and allow the OperationAborted SocketException to propagate.
I think that if we were looking at it now, we would certainly consider rejecting #41585. At the time we thought it was a simple fix, so we did it. In retrospect, I don’t think we fully understood the implications.
+100 from me here on this. We’re going through exactly the same misery trying to track this down here in an app which we’ve just moved from 5-6.
I’m delighted if something new has drawn attention to a lurking latent defect. But highly undelighted that there is no connection, either temporal or “geographical” between the error message and the problem.
Right, that’s a good catch (I didn’t write this code).
So is this not a regression then? My problem with this is how unhelpful the exception is.
Because it’s asynchronous and the error isn’t coming from BeginConnect (BeginConnect returned successfully, synchronously). The error is coming from the asynchronous completion, at which point BeginConnect on the stack is long gone. There are potential things that can be done to improve the exception stack to include the stack of BeginConnect, but they involve doing a stack walk even for the success cases, and thus aren’t pay-for-play.
@xPaw it means that the exceptions we used to throw synchronously on
Begin***methods will be thrown asynchronously, thus only observed onEndConnect, I believe you are facing this change now.