vstest: "Test host process crashed" error hard to diagnose

Our CI has an intermittent failure:

The active test run was aborted. Reason: Test host process crashed
Results File: C:\a\_temp\AzDevOps_2019-6vse00024V_2021-06-24_02_47_47.trx
Test Run Aborted.

As far as I can tell, the message is printed by this code: https://github.com/microsoft/vstest/blob/eff66c00b217b355b6ee11034ff8396a618d04e3/src/Microsoft.TestPlatform.CommunicationUtilities/TestRequestSender.cs#L667

It would be really helpful here if it printed the full path to the executable of the process that crashed. I took a quick look but I couldn’t find where to get that path from. Also, the content of the error output stream (clientExitErrorMessage) seems to have been empty, so when the test host process crashes it should print the full exception stack to Console.Error so that the stack appears in the CI log.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 10
  • Comments: 48 (20 by maintainers)

Most upvoted comments

I have found out what my problem was. It’s that infamous 100ms bug striking again. I can’t even fathom how much productivity loss that bug has caused our ecosystem. I personally wasted four full days tracking this down.

I had to add --diag to my dotnet test arguments to produce the diagnostic logs, and then add a separate step to upload them as artifacts.

Finally I saw this:

TpTrace Error: 0 : 6860, 7, 2021/06/26, 19:34:34.416, 8580065582, vstest.console.dll, LengthPrefixCommunicationChannel.Send: Error sending data: System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Flush()
   at System.IO.BinaryWriter.Flush()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.Send(String data).
TpTrace Warning: 0 : 6860, 7, 2021/06/26, 19:34:34.417, 8580070372, vstest.console.dll, ProxyOperationManager: Failed to end session: Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.Interfaces.CommunicationException: Unable to send data over channel.
 ---> System.IO.IOException: Unable to write data to the transport connection: An existing connection was forcibly closed by the remote host..
 ---> System.Net.Sockets.SocketException (10054): An existing connection was forcibly closed by the remote host.
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   --- End of inner exception stack trace ---
   at System.Net.Sockets.NetworkStream.Write(Byte[] buffer, Int32 offset, Int32 size)
   at System.IO.BufferedStream.Flush()
   at System.IO.BinaryWriter.Flush()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.Send(String data)
   --- End of inner exception stack trace ---
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.Send(String data)
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TestRequestSender.EndSession()
   at Microsoft.VisualStudio.TestPlatform.CrossPlatEngine.Client.ProxyOperationManager.Close()
TpTrace Warning: 0 : 6860, 7, 2021/06/26, 19:34:34.417, 8580071127, vstest.console.dll, ProxyOperationManager: Timed out waiting for test host to exit. Will terminate process.

I think it’s this issue: https://github.com/microsoft/vstest/issues/2379

https://github.com/microsoft/vstest/blob/d10bcbb28cc3999bcc12758a41a04b998eb9595b/src/Microsoft.TestPlatform.CrossPlatEngine/Client/ProxyOperationManager.cs#L211-L215

The process is busy doing something, then we impatiently kill it after 100ms, AND WE DON’T TELL THE USER WE KILLED IT. All the user sees is:

The active test run was aborted. Reason: Test host process crashed

So then the user is sent on a wild goose chase for 4 days, trying to figure out various ways to deploy procdump.exe to the CI agent, only to find out that it doesn’t work anyway because we only explicitly pass StackOverflowException and AccessViolationException as arguments to procdump and ignore other types of exceptions.

We killed the process, we know it, but we don’t log it, don’t publish the dump, don’t have any MSBuild errors, leaving the user helpless, frustrated, blocked, not knowing how to proceed.
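Until the runner reports this itself, one workaround is to scan the uploaded --diag logs for the messages shown above. The following is a minimal sketch, not part of vstest: the marker strings are copied from the log excerpt in this comment and may be worded differently in other versions, and the function names and the flat log directory layout are assumptions made for illustration.

```python
from __future__ import annotations

from pathlib import Path

# Marker text copied from the diag log excerpt above.
# Assumption: other vstest versions may word these messages differently.
KILL_MARKERS = (
    "Timed out waiting for test host to exit. Will terminate process.",
    "Failed to end session",
)


def find_host_kill_lines(log_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs for lines containing a marker."""
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        if any(marker in line for marker in KILL_MARKERS):
            hits.append((number, line.strip()))
    return hits


def scan_diag_directory(directory: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every file directly inside a (hypothetical) diag log directory."""
    report = {}
    for path in sorted(Path(directory).glob("*")):
        if path.is_file():
            text = path.read_text(encoding="utf-8", errors="replace")
            hits = find_host_kill_lines(text)
            if hits:
                report[path.name] = hits
    return report
```

A hit on the "Timed out waiting for test host to exit" line suggests the host was terminated by the runner rather than crashing on its own, which changes where to look next.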

Hi all - I was hoping someone could either help or summarize the state of this issue.

I’ll start with what’s brought me here.

We’ve recently started migrating to Linux, running our unit test DLLs using dotnet test on a Linux host.

When running on Windows the tests pass with no issue. On Linux we’d hit this mysterious issue saying ‘Test host process crashed’. Looking into the detailed logs from our test DLL, there were no obvious issues… it just stops.

So we followed some of the steps here: added --blame, added diagnostic logging, etc. We isolated it to one test (which again passed fine on Windows). I combed the diagnostic log files that are produced:

  • diag.datacollector
  • diag.host
  • diag

and all I found was

TpTrace Verbose: 0 : 10226, 10, 2023/10/24, 07:55:32.658, 635444445327132, vstest.console.dll, TcpClientExtensions.MessageLoopAsync: NotifyDataAvailable remoteEndPoint: 127.0.0.1:33932 localEndPoint: 127.0.0.1:34540
TpTrace Error: 0 : 10226, 10, 2023/10/24, 07:55:32.659, 635444446846388, vstest.console.dll, Socket: Message loop: failed to receive message due to socket error System.IO.EndOfStreamException: Unable to read beyond the end of the stream.
   at System.IO.BinaryReader.ReadByte()
   at System.IO.BinaryReader.Read7BitEncodedInt()
   at System.IO.BinaryReader.ReadString()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.LengthPrefixCommunicationChannel.NotifyDataAvailable()
   at Microsoft.VisualStudio.TestPlatform.CommunicationUtilities.TcpClientExtensions.MessageLoopAsync(TcpClient client, ICommunicationChannel channel, Action`1 errorHandler, CancellationToken cancellationToken), remoteEndPoint: 127.0.0.1:33932 localEndPoint: 127.0.0.1:34540

That exception is repeated a few times. From some searching, this appears to be the error you get when the test host unexpectedly terminates… but what I can’t find is why…
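When combing logs like these, it can help to pull out only the Error-level entries together with their stack traces. Below is a minimal sketch of a parser for the TpTrace lines quoted above. The field meanings (process id, thread id, date, time, ticks, process name) are inferred from the excerpts in this thread rather than taken from vstest documentation, and all names in the code are made up for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass, field

# Header layout inferred from the log excerpts in this thread (an assumption):
# TpTrace <Level>: 0 : <pid>, <thread>, <date>, <time>, <ticks>, <process>, <message>
HEADER = re.compile(
    r"^TpTrace (?P<level>\w+): \d+ : "
    r"(?P<pid>\d+), (?P<thread>\d+), "
    r"(?P<date>\d{4}/\d{2}/\d{2}), (?P<time>[\d:.]+), "
    r"(?P<ticks>\d+), (?P<process>[^,]+), (?P<message>.*)$"
)


@dataclass
class TraceRecord:
    level: str
    pid: int
    process: str
    timestamp: str
    message: str
    # Lines that follow the header, typically stack trace frames.
    continuation: list[str] = field(default_factory=list)


def parse_tptrace(text: str) -> list[TraceRecord]:
    """Group each TpTrace header line with the lines that follow it."""
    records: list[TraceRecord] = []
    for line in text.splitlines():
        match = HEADER.match(line)
        if match:
            records.append(
                TraceRecord(
                    level=match["level"],
                    pid=int(match["pid"]),
                    process=match["process"],
                    timestamp=f"{match['date']} {match['time']}",
                    message=match["message"],
                )
            )
        elif records and line.strip():
            records[-1].continuation.append(line.strip())
    return records


def errors_only(records: list[TraceRecord]) -> list[TraceRecord]:
    """Keep only the records logged at Error level."""
    return [record for record in records if record.level == "Error"]
```

Running `errors_only(parse_tptrace(text))` over each diag file gives a short list to read instead of the full verbose log. It will not explain why the host terminated, but it narrows down where the first failure was recorded.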

I see that near the start of this issue someone mentions that the above stack trace is useless and should be improved: https://github.com/microsoft/vstest/issues/2952#issuecomment-1065479076

However, I’m not seeing any real mention of a ‘problem’ after that. It seems it gets into the weeds about adding diag logs etc… but I’ve done all that and it sheds no light.

Jump ahead a few wasted days… we found a null ref exception in the test. It’s handled and causes no issues on Windows. However, if we prevent that exception from being thrown… the tests run fine on Linux… no crashed process.

My theory atm is that for some reason that exception is causing some kind of ‘freeze’ in the process, and then it’s getting terminated because…???

Anyway, I’ve chatted quite enough… can anyone help us with how we could go about diagnosing such issues in the future?

(I can share full logs privately, but can’t share them via GitHub)

I can see it fails in xUnit.GetAvailableRunnerReporters because it cannot load a module. I can see the name of the module. Sent you an email.

Quick update from my side: Looking at the procdump files, it looked like an issue with a specific DLL file in my build folder. After letting it rest for a while and then updating all packages at some point (including xunit IIRC), the problem was completely gone. In hindsight the process just crashing, and the dump not really helping out either, was really frustrating. Nevertheless, thank you for the support @nohwnd!

It is vstest.console.exe /Blame:CollectDump. Ideally, use it together with --diag:logs\log.txt; reading the datacollector log makes debugging issues much easier.
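For a CI script that launches the runner, the two options can be assembled in one place. This is a small sketch that only builds the argument list; the option spellings are taken verbatim from the comment above, while the helper name and the assembly name MyTests.dll are hypothetical.

```python
from __future__ import annotations


def build_vstest_args(test_assembly: str, diag_log: str = "logs\\log.txt") -> list[str]:
    """Build a vstest.console.exe argument list with dump collection and diag logging."""
    return [
        "vstest.console.exe",
        test_assembly,
        "/Blame:CollectDump",   # option as quoted in the comment above
        f"--diag:{diag_log}",   # diag log path as quoted in the comment above
    ]


# Hypothetical usage: MyTests.dll stands in for the real test assembly.
args = build_vstest_args("MyTests.dll")
```

The resulting list can be passed to `subprocess.run(args)` on an agent where vstest.console.exe is on the PATH.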

Yes, it’s the build I mentioned in chat.

Things should work regardless of whether I’m using dotnet test or the VSTest yaml task. Where are the docs for the VSTest task? Where are these screenshots from? What is the yaml syntax if I wanted to switch my pipeline from dotnet test to VSTest?

The dotnet test experience should be just as well supported as the VSTest task.