vstest: CollectDumpOnTestSessionHang doesn't produce a dump file

Description

I’m trying to troubleshoot hanging builds on a CI server. I found this which seems very promising:

https://github.com/microsoft/vstest-docs/blob/master/RFCs/0028-BlameCollector-Hang-Detection.md

However, when I use the hang detector, I don’t get a dump file.

Steps to reproduce

The test hangs are intermittent, so they are hard to reproduce.

dotnet vstest is invoked with:

<lots of DLLs> --Parallel --logger:"trx;LogFileName=NUnitTestsCore.trx" --logger:"console;verbosity=minimal" --ResultsDirectory:.../build/test-reports --Settings:...\tmpCF7A.tmp

The settings file is auto generated and contains something like this:

<RunSettings>
  <RunConfiguration>
    <MaxCpuCount>4</MaxCpuCount>
  </RunConfiguration>
  <DataCollectionRunSettings>
    <DataCollectors>
      <DataCollector friendlyName="blame" enabled="True">
        <Configuration>
          <ResultsDirectory>...\build</ResultsDirectory>
          <CollectDumpOnTestSessionHang TestTimeout="120000" DumpType="full"/>
        </Configuration>
      </DataCollector>
    </DataCollectors>
  </DataCollectionRunSettings>    
</RunSettings>

Expected behavior

I expect the hang detector to detect a hang and produce a crash dump file.

Actual behavior

The hang detector did detect a hang after ~2 minutes:

The active test run was aborted. Reason: Test host process crashed
...
Test Run Aborted.
Attachments:
  ...\build\test-reports\4a680b77-23cd-471a-9b82-ead6630865fa\Sequence_af08f6cfd55f4dd5989add68f10ea91f.xml

However, it only produces a sequence file, not a crash dump.

Note that the sequence file ends up in the results directory given on the command line, rather than the results directory in the settings file.

Diagnostic logs

None produced by the above command.

Environment

Windows Server 2012, .NET Core version 3.0.100

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 22 (11 by maintainers)

Most upvoted comments

Problem solved, the part about 4.6 targeting pack was a red herring. Installing VS extension development support in VS2017 did it.

Actually there is a lot. In the latest net5.0 release (I think since preview6), we are leveraging the Diagnostics NetCore client to create hang dumps. This works on Windows (with any target framework) and Linux (with netcoreapp3.1 and newer). There is no need for procdump.exe when creating hang dumps, or for the temporary folder.

To trigger a hang dump you can now simply do: dotnet test --blame-hang-timeout 2min or vstest.console /Blame:"CollectHangDump;TestTimeout=2min".

For crash dumps the situation is similar to before, but it errors out a bit better. There you still need procdump, because that flow needs to attach to a running process and detect failure, which is no easy task. But luckily crash dumps are usually way less interesting than hang dumps, because when the process crashes it often has an easy-to-see reason.
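
Since crash dumps still depend on procdump, a CI script could check for it before asking for them. The following is only a sketch, assuming a bash-compatible shell (for example Git Bash on Windows); the function names has_procdump and blame_args are hypothetical, and the 2min timeout is just the value used above.

```shell
#!/usr/bin/env bash
# Preflight sketch (hypothetical helper names, not part of vstest).

# Succeed only if the given directory contains both procdump binaries.
has_procdump() {
  local dir="$1"
  [ -f "$dir/procdump.exe" ] && [ -f "$dir/procdump64.exe" ]
}

# Print the blame options to pass to dotnet test.
# --blame-crash is added only when PROCDUMP_PATH points at a usable directory.
blame_args() {
  local dir="${PROCDUMP_PATH:-}"
  if [ -n "$dir" ] && has_procdump "$dir"; then
    echo "--blame-hang-timeout 2min --blame-crash"
  else
    echo "--blame-hang-timeout 2min"
  fi
}
```

It could then be used as: dotnet test $(blame_args). Hang dumps are requested either way, and crash dumps only when procdump is actually there.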

From dotnet test help:

  --blame                                  Runs the tests in blame mode. This option is helpful in isolating problematic tests that cause the test host to crash or hang.
                                           When a crash is detected, it creates an sequence file in TestResults/guid/guid_Sequence.xml that captures the order of tests that were run before the crash.
                                           Based on the additional settings, hang dump or crash dump can also be collected.
                                           Example:
                                           Timeout the test run when test takes more than the default timeout of 1 hour, and collect crash dump when the test host exits unexpectedly.
                                           (Crash dumps require additional setup, see below.)
                                           dotnet test --blame-hang --blame-crash
                                           Example:
                                           Timeout the test run when a test takes more than 20 minutes and collect hang dump.
                                           dotnet test --blame-hang-timeout 20min
  --blame-crash                            Runs the tests in blame mode and enables collecting crash dump when testhost exits unexpectedly.
                                           This option is currently only supported on Windows, and requires procdump.exe and procdump64.exe to be available in PATH.
                                           Or PROCDUMP_PATH environment variable to be set, and point to a directory that contains procdump.exe and procdump64.exe.
                                           The tools can be downloaded here: https://docs.microsoft.com/en-us/sysinternals/downloads/procdump
                                           Implies --blame.
  --blame-crash-dump-type <DUMP_TYPE>      The type of crash dump to be collected. Implies --blame-crash.
  --blame-crash-collect-always             Enables collecting crash dump on expected as well as unexpected testhost exit.
  --blame-hang                             Run the tests in blame mode and enables collecting hang dump when test exceeds the given timeout. Implies --blame-hang.
  --blame-hang-dump-type <DUMP_TYPE>       The type of crash dump to be collected. When None, is used then test host is terminated on timeout, but no dump is collected. Implies --blame-hang.
  --blame-hang-timeout <TIMESPAN>          Per-test timeout, after which hang dump is triggered and the testhost process is terminated.
                                           The timeout value is specified in the following format: 1.5h / 90m / 5400s / 5400000ms. When no unit is used (e.g. 5400000), the value is assumed to be in milliseconds.
                                           When used together with data driven tests, the timeout behavior depends on the test adapter used. For xUnit and NUnit the timeout is renewed after every test case,
                                           For MSTest, the timeout is used for all testcases.
                                           This option is currently supported only on Windows together with netcoreapp2.1 and newer. And on Linux with netcoreapp3.1 and newer. OSX and UWP are not supported.
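
Putting the hang-related options from the help text together, a CI wrapper might look roughly like the sketch below. The function names, the results directory argument and the assumption that dump attachments end in .dmp are illustrative, not taken from the vstest documentation.

```shell
#!/usr/bin/env bash
# CI wrapper sketch (hypothetical helper names).

# Run the tests in blame mode; a hang dump is triggered when a single test
# exceeds the timeout (default here: 2min, an arbitrary example value).
run_tests_with_hang_dump() {
  local results_dir="$1" timeout="${2:-2min}"
  dotnet test \
    --blame-hang-timeout "$timeout" \
    --blame-hang-dump-type full \
    --results-directory "$results_dir"
}

# Print every *.dmp file below the results directory, one per line, sorted.
# Assumption: dump attachments are written with a .dmp extension.
list_dumps() {
  local results_dir="$1"
  find "$results_dir" -type f -name '*.dmp' | sort
}
```

For example: run_tests_with_hang_dump ./TestResults 2min; list_dumps ./TestResults. If the second command prints nothing after an aborted run, no dump was collected, which is the symptom described in this issue.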

We are all actually using 2019, sorry. I am sure you need at least these workloads. The Visual Studio Extension development should be optional if you skip the vsix generating step in the script, see below.

[Screenshot of the required workloads]

And then from the individual components you’d need the Portable Pack and .NET 4.5.1.

I almost never run all acceptance tests locally. You should be good to go with just unit tests or at best smoke tests.

I did see the same issues (and more) when joining this project, and never got to go back and update the installation guide. Sorry about that. I will be changing our release pipeline a lot, and imho you don’t need to build the vsix locally in most cases. You can comment out these steps in the build.ps1 and it should still build. If you need more help, ping me on Twitter or here; I can spend 15 minutes showing you stuff. 😃

I think you need VS Enterprise; some of these DLLs are only shipped with the Enterprise edition, like “Microsoft.VisualStudio.CodeCoverage.Shim”.

Hehe these are pretty outdated, go ahead with only the UTs locally. The acceptance and smoke tests will get validated on the CI. Plus I don’t think blame data collector has any E2E tests.

I’m not trying to be annoying here, just wondering if you (maintainers/collaborators) are seeing these test errors as well?

@provegard please do. You can tag me to help out with the review. It was a pet project of mine but I never got round to polishing it, will help in any way I can.