vstest: 'dotnet test' sometimes causes terminal to stop echoing after completion (ubuntu)

From @beevik on Saturday, December 21, 2019 7:20:52 PM

When running dotnet test on ubuntu, the terminal often stops echoing input after the command completes. It’s not 100%. Sometimes echoing remains on. I’m using the xunit testing framework.

Here are the results of typing stty -a after the terminal stops echoing:

speed 38400 baud; rows 40; columns 132; line = 0;
intr = ^C; quit = ^\; erase = ^?; kill = ^U; eof = ^D; eol = <undef>; eol2 = <undef>; swtch = <undef>; start = ^Q; stop = ^S;
susp = ^Z; rprnt = ^R; werase = ^W; lnext = ^V; discard = ^O; min = 1; time = 0;
-parenb -parodd -cmspar cs8 -hupcl -cstopb cread -clocal -crtscts
-ignbrk -brkint -ignpar -parmrk -inpck -istrip -inlcr -igncr icrnl -ixon -ixoff -iuclc -ixany -imaxbel iutf8
opost -olcuc -ocrnl onlcr -onocr -onlret -ofill -ofdel nl0 cr0 tab0 bs0 vt0 ff0
isig -icanon -iexten -echo echoe echok -echonl -noflsh -xcase -tostop -echoprt echoctl echoke -flusho -extproc

As you can see, echo is off (-echo).

This does not appear to be a bug in xunit, as someone has reported a similar issue when using expecto instead.

Ubuntu 18.04.3 .NET Core 3.1 xunit 2.4.1

Copied from original issue: dotnet/sdk#4090

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Reactions: 37
  • Comments: 84 (30 by maintainers)

Commits related to this issue

Most upvoted comments

From @rmunn on Monday, December 23, 2019 8:18:21 PM

I’m the one who reported that issue with Expecto. I also have a vague memory of seeing this pop up with other dotnet foo commands such as dotnet build, but I don’t have a solid example since I’m away from my desk for the holidays. Once I get a solid example I’ll mention it here.

BTW, for anyone else who experiences this, you can restore your terminal to normal functionality after this issue triggers by typing stty echo followed by Enter.

I just ran into this. Definitely annoying! After reading through this and related issues, I tried to determine minimal reproducible steps.

Environment

  • Windows 10
  • WSL with Ubuntu 18.04
  • .NET Core SDK version 3.1.100 in WSL
  • Using both the Ubuntu and the Windows Terminal application (reproduced in both)

Findings

  • I cannot reproduce when running dotnet test from within a single test application’s directory.
  • I can regularly reproduce when running dotnet test from a directory with a solution file and multiple projects (mix of test and non-test projects, all added to the .sln file).
  • When trying to debug output (via something like piping to sed, e.g. dotnet test | sed -n 'l'), I’m not seeing any immediately incriminating control characters.
  • I have, however, noticed output lines being interleaved or being outputted in different order, leading me to a hunch that there’s some manner of race condition with terminal output causing the symptoms.

Reproduction

From the command line, I narrowed everything down to being able to run the following commands to reproduce the issue fairly reliably:

mkdir TestTerminalIssues
cd TestTerminalIssues
dotnet new xunit --name Project1
dotnet new xunit --name Project2
dotnet new classlib --name Project3
dotnet new sln
dotnet sln add Project1 Project2 Project3
dotnet build
dotnet test

Run dotnet test until the terminal stops behaving as expected. It usually only takes me two or three attempts before the terminal “breaks.”

Will try to dig in further, but I hope this helps, in the meantime.

This should be fixed by https://github.com/dotnet/msbuild/pull/6053, which will be in MSBuild v16.10.

Temporary work-around in bash syntax: BEFORE=$(stty -g); dotnet test --logger:"console;verbosity=detailed"; stty "${BEFORE}"

or in .bashrc:

dotnet ()
{
    local RET_VAL;
    local BEFORE=$(stty -g);
    $(which dotnet) "$@";
    RET_VAL=$?;
    sleep 0.33;
    stty "${BEFORE}";
    return $RET_VAL
}

But I will just try to notify persons who commited changes related to termios.h in .NET Runtime to maybe clarify things.

@stephentoub @tmds could you take a look on this issue? In a case of tl;dr I believe the most accurate are investigations since https://github.com/microsoft/vstest/issues/2282#issuecomment-609494059

When invoking dotnet test, this leads to MsBuild processes being started:

["/home/tmds/Downloads/dotnet-50-p2/dotnet", "/home/tmds/Downloads/dotnet-50-p2/sdk/5.0.100-preview.2.20176.6/MSBuild.dll", "/home/tmds/Downloads/dotnet-50-p2/sdk/5.0.100-preview.2.20176.6/MSBuild.dll", "/nologo", "/nodemode:1", "/nodeReuse:false"]

No one waits for this process to terminate. When such a process terminates after dotnet test it restores terminal settings to what it observed at start, which may be echo off.

I think the resolution is to either wait for these processes to terminate, or to not give them access to terminal by redirecting their streams.

Looking through the pal_console.c code, I noticed some interesting code interactions:

  • The ConfigureTerminal function unsets the ECHO flag if its forChild parameter is false. (If forChild is true, the ECHO flag is untouched).
  • There’s also a SystemNative_InitializeConsoleBeforeRead function, which calls ConfigureTerminal with forChild=false.
  • In ConsolePal.Unix.cs, the TryGetCursorPosition method tries to return a cached value, but if it can’t, it uses the appropriate terminal escape sequence to query the current cursor position. This is a complicated process because the terminal sends back its answer by injecting characters into the terminal’s input buffer, so TryGetCursorPosition calls SystemNative_InitializeConsoleBeforeRead to make sure that the escape characters it’s about to receive aren’t echoed to the screen.
  • In ConsolePal.Unix.cs, the CursorLeft and CursorTop properties call TryGetCursorPosition, which means that the first time they’re called, the ECHO flag on the terminal will end up unset.
  • The ProgressIndicator class in vstest uses CursorLeft and CursorTop to display its progress indicator (three dots that keep cycling through ., .., and ... states).
  • In Process.Unix.cs, an int counter is used to keep track of how many child processes are using the terminal. When it’s non-zero, ConfigureTerminal is called with forChild=true. When it hits zero, ConfigureTerminal is called with forChild=false.
  • Also in Process.Unix.cs, the EnsureInitialized method calls InitializeTerminalAndSignalHandling() which eventually calls into pal_console.c; what it does is save the current terminal settings (acquired at the moment InitializeTerminalAndSignalHandling is called) in a global variable, then register an atexit() handler to restore those terminal settings when the process exits.

So what I think might be happening could be something like this interaction between all that code:

  • The dotnet test process starts up, and saves the current state of the terminal (including whether the echo flag is on or off).
  • In some test runners, such as xunit or Expecto, tests run in parallel by default. If so, multiple threads are probably created for this.
  • If the various unit tests running in those threads use Process.Start or similar calls, multiple child processes might be created that each are saving and restoring the terminal state.
  • If the order of events goes like this, the bug might be triggered:
    1. dotnet test starts up
    2. Several child processes are started to run various tests in parallel
    3. The ProgressIndicator class is initialized and starts writing its progress marker dots to the screen, one per second
    4. At least one child process finishes just before ProgressIndicator writes its third dot to the screen
    5. ProgressIndicator writes its third dot, which causes it to want to remove three dots. It accesses CursorLeft in order to do so.
    6. This causes TryGetCursorPosition to set echo=false in the terminal settings
    7. Another child process starts up very soon afterwards, and it gets echo=false in the terminal settings that it stores in its global variable
    8. When that child process exits, it will restore the terminal to echo=false, so any subsequent child processes that start after that point will pick up echo=false as the settings they should restore
    9. If the last child process that exits was one that had picked up echo=false, then it restores those terminal settings right before dotnet test exits, and so the user is left with a terminal that doesn’t echo.

A complicated race condition, but all it requires is that some child processes complete before three seconds are up, which is not hard to have happen.

@BDisp

I confirm that this still happens on net6.0.

Can you create an issue for the font size change? As I mentioned in https://github.com/microsoft/vstest/issues/2282#issuecomment-933489651, it is not related to the echoing.

@andreujuanc

Am I supposed to migrate all my services to .net 5 (or 6)? Because this bug was never fixed?

You anyhow need to port your app in the coming year to .NET 6+ to remain on a supported version.

Such more for “LTS”

LTS means you get security fixes. Microsoft rarely backports other fixes to minimize the chance of regressions.

@Jodacola - Running the Microsoft ® Test Execution Command Line Tool Version 16.5.0 under Ubuntu (NOT the Ubuntu WSL environment), I was able to reproduce this issue again, by cloning https://github.com/commandlineparser/commandline/ and running dotnet test a few times. The third run triggered the issue: my typed input stopped being echoed to the terminal until I ran stty echo to turn the echo flag back on.

Again, that was with version 16.5.0 of the text execution tool, running under .NET Core SDK 3.1.201.

After execute a ‘dotnet test’ on WSL2 the font size decreases. Is also related?

@BDisp This focuses on the terminal echo issue. If you can reproduce the font size issue using .NET 6, create a new ticket for it.

@Forgind I think we can close this and consider it fixed .NET6+.

The solution on vstest side is what I put in https://github.com/microsoft/vstest/issues/2282#issuecomment-614620392:

the resolution is to either wait for these processes to terminate, or to not give them access to terminal by redirecting their streams.

This change for .NET 6: https://github.com/dotnet/runtime/pull/42303 should also make it less likely terminal ends up non-echoing after running dotnet test.

dotnet test /m:1 does not exhibit the same problem as dotnet test for me

any updates on this? It’s just a huge pain to have a broken console everytime I run my test suite. I now instinctively write stty echo after make test. This is affecting multiple developers in my org.

@nohwnd @Jodacola have you tried play-around with “script” command?

I did not, I am not that familiar with fixing issues on linux terminal 🙂 Thanks for the suggestion.

@rmunn wow! 👏👏👏

That seems very plausible and very close to what we’ve been observing. To aid you with your investitgation, what happens process-wise in the repro above is that:

  • dotnet test kicks off build for the two test projects.

  • build builds the projects in parallel and then starts two vstest.console processes in parallel

  • ( each vstest.console starts a test host, testhost only writes back through socket, but not to console, so forget about it )

  • in each vstest.console a timer starts that will write progress indicator to console after 1s

  • in each console we start getting back output from testhost and writing it to screen.

  • at this point we have 4 threads in 2 processes that are trying to write to the console

  • we lock on most of the output, and Console imho does as well, so we can only do 1 console action per process at 1 time.


I totally believe that what you described is at least part of the answer, but can you share your opinion on these?

  • v. could be caused by clearing the progress indicator from the screen after writing test results rather than 3rd dot? I am asking because the test summary almost always stopped the execution in the same way. There are different processes running, so I don’t believe the timing was the same. I did no see anything special about the summary. I am guessing that there is a call to Clear on the ProgressIndicator when we are done running, that makes this easily reproducible? I saw it happen more consistently with xunit because it writes into console way more than mstest, and in smaller chunks.

  • in the repro I did not see it trigger when colors were disabled. In the repro where both sides fail it locks up silently, but with the long running test passing (in my repro the assertion is already commented out, so it behaves this way by default), outputs the same characters everytime. Can you explain that, and what the collision is?

@andreujuanc, @tmds fixed this with https://github.com/dotnet/msbuild/pull/6053. That came out in 16.10 if I remember correctly, so you would have to update .NET.

I agree, but I don’t have permission to close it. @nohwnd?

Really, using dotnet 5 the problem does not occur to me anymore 😃

@rainersigwald when I run the steps from https://github.com/microsoft/vstest/issues/2282#issuecomment-605665646:

mkdir TestTerminalIssues
cd TestTerminalIssues
dotnet new xunit --name Project1
dotnet new xunit --name Project2
dotnet new classlib --name Project3
dotnet new sln
dotnet sln add Project1 Project2 Project3
dotnet build
dotnet test

These dotnet processes are started:

dotnet           7907   6383     0 /usr/bin/dotnet test
dotnet           7920   7907     0 /usr/lib64/dotnet/dotnet exec /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll -maxcpucount -verbosity:m -restore -target:VSTest -nodereuse:false -nologo
dotnet           7943   7920     0 /usr/lib64/dotnet/dotnet /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /nologo /nodemode:1 /nodeReuse:false /low:false
dotnet           7961   7920     0 /usr/lib64/dotnet/dotnet /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /nologo /nodemode:1 /nodeReuse:false /low:false
dotnet           8007   7943     0 /usr/lib64/dotnet/dotnet exec /usr/lib64/dotnet/sdk/5.0.100/vstest.console.dll --testAdapterPath:/home/tmds/.nuget/packages/coverlet.collector/1.3.0/build/netstandard1.0/ --framework:.NETCoreApp,Version=v5.0 /tmp/TestTerminalIssues/Project2/bin/Debug/net5.0/Project2.dll
dotnet           8020   7920     0 /usr/lib64/dotnet/dotnet exec /usr/lib64/dotnet/sdk/5.0.100/vstest.console.dll --testAdapterPath:/home/tmds/.nuget/packages/coverlet.collector/1.3.0/build/netstandard1.0/ --framework:.NETCoreApp,Version=v5.0 /tmp/TestTerminalIssues/Project1/bin/Debug/net5.0/Project1.dll
dotnet           8043   8007     0 /usr/lib64/dotnet/dotnet exec --runtimeconfig /tmp/TestTerminalIssues/Project2/bin/Debug/net5.0/Project2.runtimeconfig.json --depsfile /tmp/TestTerminalIssues/Project2/bin/Debug/net5.0/Project2.deps.json /home/tmds/.nuget/packages/microsoft.testplatform.testhost/16.7.1/lib/netcoreapp2.1/testhost.dll --port 42619 --endpoint 127.0.0.1:042619 --role client --parentprocessid 8007 --telemetryoptedin false
dotnet           8046   8020     0 /usr/lib64/dotnet/dotnet exec --runtimeconfig /tmp/TestTerminalIssues/Project1/bin/Debug/net5.0/Project1.runtimeconfig.json --depsfile /tmp/TestTerminalIssues/Project1/bin/Debug/net5.0/Project1.deps.json /home/tmds/.nuget/packages/microsoft.testplatform.testhost/16.7.1/lib/netcoreapp2.1/testhost.dll --port 40609 --endpoint 127.0.0.1:040609 --role client --parentprocessid 8020 --telemetryoptedin false

I see the processes dotnet /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll /nologo /nodemode:1 /nodeReuse:false /low:false are not waited for by their parent: dotnet exec /usr/lib64/dotnet/sdk/5.0.100/MSBuild.dll -maxcpucount -verbosity:m -restore -target:VSTest -nodereuse:false -nologo.

Do you know what creates these processes? I think making the parent Process.WaitForExit its children is the proper fix for this issue.

Perhaps this is a stupid question, but what does /m:1 mean, and why am I not able to answer that question with a quick google?

This had me stumped for a while too @WellspringCS . As @chrisheil suggests its an option that’s passed through to MSBuild to specify the maximum number of concurrent processes used. It’s documented under ‘maxCpuCount’ here https://docs.microsoft.com/en-us/visualstudio/msbuild/msbuild-command-line-reference?view=vs-2019

@WellspringCS based on ths link, i think it might be an MSBuild param to run tests sequentially instead of in parallel

https://github.com/coverlet-coverage/coverlet/blob/master/Documentation/Examples/MSBuild/MergeWith/HowTo.md

@nohwnd I think we should go with minimal changes to current behaviour and running for individual project and running same project from solution should be almost identical. The issue is that for example coverlet is executing additional steps before tests run and after tests run (for individual project). I think more about approach that all individual project runs will report tests status and attachements to some orchestrator (through port similar like in VS).

@tmds thanks for analyzing this.

@jakubch1 do you plan to solve the CC merge issue by using the msbuild to only build and figure out source paths, and then run multiple test hosts from single vsconsole? I think it would be the best way to go because that would help us here with the echoing, as well as with the parallel output, and with other issue where you can’t run multiple different TFMs from single console.

To behave similar to Windows, .NET Core disables echoing when Console is used for input in some way on Linux. When a .NET Core app exits, it will restore the echoing state it captured at start.

Does dotnet test wait for all it’s children to terminate (especially the children which are given access to terminal by not redirecting streams)?

I’ll try to reproduce this week using instructions from https://github.com/microsoft/vstest/issues/2282#issuecomment-605665646.

What is going on with that Travis CI log? I’m having a hard time telling where the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 dotnet test run starts and the other run ends, because it looks a bit interleaved. Does the same failure happen if you only run the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 dotnet test version, and not the other two?

AFAICT, colors should have no effect on this bug, so I have no idea why the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=0 run succeeds (doesn’t trigger the hang) but the VSTEST_RUNNER_PROGRESS=1 VSTEST_RUNNER_COLOR=1 run hangs. Where is the code that you wrote that handles the VSTEST_RUNNER_COLOR environment variable? I looked all over the branches of your ProgressToy repo and couldn’t find it, so I can’t tell whether you’re accidentally changing something besides color when VSTEST_RUNNER_COLOR is equal to 1.

And my availability is likely to be almost zero in the next few weeks, but I might occasionally have a spare hour to throw at this.

Pushed original repro to the travis build and it got stuck. Will try to replace the progress with another that does not move the cursor, hopefully that will fix it. Then we can talk about how it looks.

image