runtime: Socket.BeginConnect in .NET 6 throws an unobserved SocketException

Description

After updating to .NET 6 we started seeing (separate users and software) unobserved SocketException.

After analyzing the logs, I believe it is being triggered here:

https://github.com/SteamRE/SteamKit/blob/2.4.0-Alpha.3/SteamKit2/SteamKit2/Networking/Steam3/TcpConnection.cs#L158

However it is hard to tell because the stacktrace doesn’t really have useful information which could pin point the exact code.

The code that uses AwaitableSocketAsyncEventArgs is in Socket.ConnectAsync

@yaakov-h pointed out this:

ValueTask<TResult>.AsTask is the only code path to there, and I don’t think SK2 knows about ValueTask, so that’s gotta be a runtime bug.

https://source.dot.net/#System.Private.CoreLib/ValueTask.cs,638

Reproduction Steps

Expected behavior

Exception should be catchable, or not happen in this manner?

Actual behavior

Unhandled exception. System.AggregateException: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (Operation canceled) (Operation canceled)
 ---> System.Net.Sockets.SocketException (125): Operation canceled
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
   --- End of inner exception stack trace ---
   at System.AggregateException.Handle(Func`2 predicate)
   at SteamDatabaseBackend.Bootstrapper.OnUnobservedTaskException(Object sender, UnobservedTaskExceptionEventArgs args) in /home/steamdb/deploy/Bootstrapper.cs:line 149
   at System.Threading.Tasks.TaskScheduler.PublishUnobservedTaskException(Object sender, UnobservedTaskExceptionEventArgs ueea)
   at System.Threading.Tasks.TaskExceptionHolder.Finalize()

Regression?

I did not get this on .NET 5.

Known Workarounds

No response

Configuration

Debian 11 x64.

Other information

No response

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Reactions: 5
  • Comments: 31 (21 by maintainers)

Commits related to this issue

Most upvoted comments

It’s already in daily builds and will be in RC1 of .NET 7. There are no plans to backport it.

It seems like the old EndAccept code had a race that could lead to a leak of a Socket object (which would get finalized eventually, but that’s not ideal). If a socket was successfully accepted and then the listen socket was disposed before EndAccept was called, then we’d leak the accepted socket.

In general, checking if the Socket itself is disposed in the End* method seems questionable to me. If the operation completed successfully, the End* call should succeed regardless of whether the socket has been disposed in the meantime (and thus, not leak the accepted socket etc). If it failed before the socket was disposed, the End call should fail with an exception indicating the failure, not ODE.

The tricky case is when the operation failed because the socket was disposed. Arguably, we should throw ODE in this case. But we explicitly don’t throw ODE for Task and SAEA operations – in fact there’s code deep in the SAEA implementation to convert ODE into OperationAborted in certain circumstances.

So I think we should: (1) Remove the blanket ThrowIfDisposed call from various End methods (2) Wrap a try/catch around the calls to TaskToApm.End that checks specifically for OperationAborted and converts it to ObjectDisposedException

This seems like the best mix of correctness and compatibility here. That said, if we don’t care about maintaining the ODE in this case, we could simply not do (2) and allow the OperationAborted SocketException to propagate.

I’m fine prioritizing simplicity and correctness over compatibility, but in that case I’d prefer to see us explicitly documenting a breaking change and rejecting asks like #41585.

I think that if we were looking at it now, we would certainly consider rejecting #41585. At the time we thought it was a simple fix, so we did it. In retrospect, I don’t think we fully understood the implications.

My problem with this is how unhelpful the exception is.

+100 from me here on this. We’re going through exactly the same misery trying to track this down here in an app which we’ve just moved from 5-6.

I’m delighted if something new has drawn attention to a lurking latent defect. But highly undelighted that there is no connection, either temporal or “geographical” between the error message and the problem.

Right, that’s a good catch (I didn’t write this code).

So is this not a regression then? My problem with this is how unhelpful the exception is.

shouldn’t it point to the code where BeginConnect was called?

Because it’s asynchronous and the error isn’t coming from BeginConnect (BeginConnect returned successfully, synchronously). The error is coming from the asynchronous completion, at which point BeginConnect on the stack is long gone. There are potential things that can be done to improve the exception stack to include the stack of BeginConnect, but they involve doing a stack walk even for the success cases, and thus aren’t pay-for-play.

Documentation was updated to say that SocketException won’t be thrown on NET 6, how come?

@xPaw it means that the exceptions we used to throw synchronously on Begin*** methods will be thrown asynchronously, thus only observed on EndConnect, I believe you are facing this change now.