SSH.NET: SshClient deadlock/freeze on disconnect

Hi,

My code hangs when it calls the Disconnect method.

This snippet reproduces the hang:

SshClient client = new SshClient("server", "user", "pass");
client.Connect();
client.Disconnect(); // freezes here

I’m connecting to an SSH server running on a QNAP NAS; the server reports its version as “SSH-2.0-OpenSSH_7.6”.

This is on macOS (High Sierra). I just tried on Windows, and the issue doesn’t occur there.

Any help would be much appreciated.

Thanks.

About this issue

  • State: closed
  • Created 7 years ago
  • Comments: 122 (25 by maintainers)

Most upvoted comments

2020.0.0-beta1 is now available, and includes this fix.

Update: For now I’m working on #516 as I need this fixed in order to run our integration tests against recent versions of OpenSSH. Next up on my list is this issue.

Thanks @drieseng !

Downgrading to version 2016.0.0 also fixed the problem.

I’ll be fixing this issue in the next week or two.

@stephentoub @shanselman I finally had time to look into this again. On Linux, the blocking Socket.Select(...) call is not always interrupted when we (shut down and) close the socket from another thread. It works fine serially, but not when multiple threads operate in parallel.

I was able to create a small repro that shows the problem consistently on .NET Core 2.1.15 and 3.1.1. I couldn’t get 5.0 Preview 2 to install side-by-side on Ubuntu 19.10 to check if that made a difference, but users expect a solution for 2.x and 3.x anyway.

I would be grateful if you could spend a few minutes looking into this.

Update: On .NET 5.0, the behavior on Linux now closely matches that on Windows. I still noticed a few minor issues which I will submit.

We run into the same issue, but on Linux. Can confirm that it’s waiting on _messageListenerCompleted.WaitOne() inside Disconnect.

We have a wrapper class around SshClient, and we worked around the issue by, if necessary, disposing the socket through reflection after a 2-second delay and setting the wait handle. The Dispose method of our wrapper class:

public void Dispose()
{
  if (_client == null) return;

  Task.Run(() =>
  {
	_log.Debug("Disposing _client");

	var timer = new System.Timers.Timer();

	timer.Interval = 2000;
	timer.AutoReset = false;

	timer.Elapsed += (s, e) =>
	{
	  try
	  {
		var sessionField = _client.GetType().GetProperty("Session", BindingFlags.NonPublic | BindingFlags.Instance);

		if (sessionField != null)
		{
		  var session = sessionField.GetValue(_client);

		  if (session != null)
		  {
			var socketField = session.GetType().GetField("_socket", BindingFlags.NonPublic | BindingFlags.Instance);

			if (socketField != null)
			{
			  var socket = (Socket)socketField.GetValue(session);

			  if (socket != null)
			  {
				_log.Debug($"Socket state: Connected = {socket.Connected}, Blocking = {socket.Blocking}, Available = {socket.Available}, LocalEndPoint = {socket.LocalEndPoint}, RemoteEndPoint = {socket.RemoteEndPoint}");

				_log.Debug("Set _socket to null");

				try
				{
				  socket.Dispose();
				}
				catch (Exception ex)
				{
				  _log.Debug("Exception disposing _socket", ex);
				}

				socketField.SetValue(session, null);
			  }
			  else
			  {
				_log.Debug("_socket was null");
			  }
			}

			var messageListenerCompletedField = session.GetType().GetField("_messageListenerCompleted", BindingFlags.NonPublic | BindingFlags.Instance);

			// The private field may be missing in other SSH.NET versions, so guard against a null FieldInfo.
			var messageListenerCompleted = (EventWaitHandle)messageListenerCompletedField?.GetValue(session);

			if (messageListenerCompleted != null)
			{
			  var waitHandleSet = messageListenerCompleted.WaitOne(0);

			  _log.Debug($"_messageListenerCompleted was set = {waitHandleSet}");

			  if (!waitHandleSet)
			  {
				_log.Debug($"Calling Set()");
				messageListenerCompleted.Set();
			  }
			}
			else
			{
			  _log.Debug("_messageListenerCompleted was null");
			}
		  }
		  else
		  {
			_log.Debug("Session was null");
		  }
		}
	  }
	  catch (Exception ex)
	  {
		_log.Debug($"Exception in Timer event handler", ex);
	  }
	};

	timer.Start();

	_client.Dispose();

	_log.Info("Disposed _client");
  });
}

A typical log when it fails to disconnect:

[image: log screenshot]

It usually disconnects just fine; failing to disconnect is the exception. We see about 20 failures per 700 successful disconnects over roughly 12 hours.

Just encountered the same problem. Fix in 2020.0.0-beta1 works. Any idea when we can expect a stable 2020 release?

I’m just about to complete a refactoring of the integration tests. When this is done, fixing this and a few other issues should come in fast.

I’ve been able to reproduce this issue on Linux. It appears to be a bug (regression?) in .NET Core 2.1. Filed as https://github.com/dotnet/corefx/issues/31368

@darkoperator .NET 5.0 should fix the root cause. I haven’t yet added a workaround to SSH.NET. I finally finished getting the integration tests (for now still private) running against recent versions of OpenSSH, and am now working on getting #629 done and having stable tests on AppVeyor. Once that is done, getting the workaround in place is a matter of minutes.

I decided to use Socket.Poll(...) where available. This resolves the issue on .NET Core 1.x to 3.x. This fix will be in the next beta release.
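For readers unfamiliar with Socket.Poll, here is a minimal sketch of a poll-based wait. It is not the actual SSH.NET change: the class name, the 100 ms timeout, and the stop flag are assumptions made for illustration. The idea is that the waiting thread never blocks indefinitely, so it can notice a disconnect request even if closing the socket from another thread does not wake it.

```csharp
using System;
using System.Net;
using System.Net.Sockets;
using System.Threading;

// Hypothetical helper for illustration; not part of SSH.NET.
public static class PollLoopSketch
{
    // Waits until the socket is readable or a stop is requested.
    // Returns true when Poll reports the socket as readable (data is pending,
    // or the connection was closed or reset), false when stopRequested was set.
    public static bool WaitForData(Socket socket, ManualResetEventSlim stopRequested)
    {
        while (!stopRequested.IsSet)
        {
            // Poll takes its timeout in microseconds; 100000 is 100 ms (assumed value),
            // so the stop flag is rechecked at least that often.
            if (socket.Poll(100000, SelectMode.SelectRead))
            {
                return true;
            }
        }

        return false;
    }
}
```

With this shape, a disconnecting thread would set the flag first and then close the socket; the waiting thread returns within one poll interval either way.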

Downgrading to version 2016.0.0 also fixed the problem.

Confirmed! Now if only some brave soul would figure out why 2016.0.0 works and port the fix/workaround from that to the latest version (if possible).

Someone on my team just ran into this issue as well and using 2016.0.0 caused the problem to go away. Is there any idea on when this would be fixed? Thanks. This issue thread was very enlightening.

For me this issue is resolved by replacing

_socket.Shutdown(SocketShutdown.Send); with _socket.Shutdown(SocketShutdown.Both);

in the SocketDisconnectAndDispose in Session.cs
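As a rough illustration of the change described above, here is a self-contained sketch of a teardown helper that shuts down both directions before disposing. The class and method names are hypothetical and this is not the actual body of SocketDisconnectAndDispose in Session.cs; only the Shutdown(SocketShutdown.Both) call comes from this thread.

```csharp
using System;
using System.Net.Sockets;

// Hypothetical helper for illustration; not the SSH.NET implementation.
public static class SocketTeardownSketch
{
    public static void ShutdownAndDispose(Socket socket)
    {
        if (socket == null)
        {
            return;
        }

        try
        {
            if (socket.Connected)
            {
                // Both disables sending and receiving on this socket;
                // SocketShutdown.Send would only disable sending.
                socket.Shutdown(SocketShutdown.Both);
            }
        }
        catch (SocketException)
        {
            // The peer may already have dropped the connection; still dispose below.
        }
        catch (ObjectDisposedException)
        {
            // Another thread disposed the socket first; nothing left to shut down.
        }
        finally
        {
            socket.Dispose();
        }
    }
}
```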

EDIT: I really need SFTP for .NET CORE today, so I forked and created a temporary package on nuget. Will switch back to the real package when this issue has been solved. https://www.nuget.org/packages/SSH.NET.Fork/2018.8.25.2

Yeah, that one-line change fixes the hang. My app makes thousands of SFTP connections every day without any problem.

Also having this issue on MacOS Catalina (10.15.1), .NET Core 2.2, 2016.1.0.

Looking forward to the fix! 😃

I ran into this issue too.

My environment is WSL Ubuntu 18.04, PowerShell 7.0.0-preview.3, and the Posh-SSH 3.0 branch (which actually uses https://github.com/asmodat/Asmodat-Standard-SSH.NET, but that is a .NET Core NuGet package of the latest version of this code base).

For me this issue is resolved by replacing

_socket.Shutdown(SocketShutdown.Send); with _socket.Shutdown(SocketShutdown.Both);

in the SocketDisconnectAndDispose in Session.cs

EDIT: I really need SFTP for .NET CORE today, so I forked and created a temporary package on nuget. Will switch back to the real package when this issue has been solved. https://www.nuget.org/packages/SSH.NET.Fork/2018.8.25.2

This worked for me at least in the one environment I ran it. We’ll see how it holds up as a solution as I expand to more machines.

Now I am using the following code to avoid the issue:

Task.Factory.StartNew(() => {
    sshClient.Dispose();
});

and kill these threads later.

Though the threads are blocked, the socket resources are in fact freed successfully before _messageListenerCompleted.WaitOne() is called.
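A variation on this workaround is to bound the wait instead of firing and forgetting. The sketch below is an assumption-laden illustration, not an SSH.NET API: the helper name and the timeout are hypothetical, and it works on any IDisposable so it stays self-contained.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper for illustration; not part of SSH.NET.
public static class BoundedDisposeSketch
{
    // Runs Dispose on a thread-pool thread and waits up to the given timeout.
    // Returns true if Dispose completed in time, false if it is still running.
    // If Dispose throws, Task.Wait rethrows the failure as an AggregateException.
    public static bool TryDispose(IDisposable resource, TimeSpan timeout)
    {
        var disposeTask = Task.Run(() => resource.Dispose());
        return disposeTask.Wait(timeout);
    }
}
```

A caller might use `BoundedDisposeSketch.TryDispose(sshClient, TimeSpan.FromSeconds(5))` and log a warning on false. Note that a false result does not cancel anything: the worker thread stays blocked inside Dispose, just as with the snippet above.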

I was able to reproduce the problem (thanks @qub1n!). We’re finally getting somewhere.

Sorry, I’m missing the question… I know people but I’m not sure if they can help 😄 What’s needed?

That’s great. Thank you. I will be waiting for this.

@TheAngryByrd I only gave it a quick look, but as far as I can tell that library supports neither SCP nor SFTP. At least for our purposes it is nowhere near a replacement.

I don’t think it’s going to get fixed anytime soon. The bug fix/issue https://github.com/dotnet/corefx/issues/31368 didn’t make it to 3.0 (See: https://github.com/dotnet/corefx/pull/38804#issuecomment-509584230) and the issue has been tagged with https://github.com/dotnet/corefx/milestone/44 (Nov 2020)

For now, I can confirm calling _socket.Shutdown(SocketShutdown.Both) works on High Sierra on Mac.