runtime: Performance workitems hang when trying to kill build servers at shutdown

We’ve seen workitems hang while trying to kill compiler servers at shutdown. Because the workitem timeout is 4 hours, the affected PRs sit waiting until that timeout expires and clog the queues in the meantime.

These workitems just sit running the following command:

[2020/06/18 18:41:23][INFO] $ dotnet build-server shutdown
[2020/06/18 18:41:23][INFO] Shutting down MSBuild server...
[2020/06/18 18:41:23][INFO] Shutting down VB/C# compiler server...
[2020/06/18 18:41:23][INFO] VB/C# compiler server shut down successfully.

Maybe it is a dotnet build-server issue.

cc: @dotnet/runtime-infrastructure @DrewScoggins @billwert @adamsitnik

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 2
  • Comments: 50 (49 by maintainers)

Most upvoted comments

At this point we’re going to stop. We enabled this leg optimistically, not having had an actual problem but because it seemed like a good idea. Since it has proven so problematic (and for little real gain) it’s not worth continuing to try and fix. Should we wind up with a huge influx of issues this would have caught we will revisit it.

Thanks. Once we have the logs, if the job is unstable we should disable it until we have a fix.

Just merged the logging fix.

I’m guessing it got as far as calling into MSBuild here: https://github.com/dotnet/sdk/blob/b1223209644d900702287faea8e9b71f95ec49f8/src/Cli/dotnet/BuildServer/MSBuildServer.cs#L18. That ultimately tries to connect to all dotnet processes in turn, with a 2x 30 sec timeout on each (https://github.com/microsoft/msbuild/blob/93fec27d7168675a369729446ad96aaaaa84137f/src/Build/BackEnd/Components/Communications/NodeProviderOutOfProcBase.cs#L126), but I would expect it to fail immediately unless the node was MSBuild. If it connects, it then tries to read.
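To put rough numbers on that guess, here is a back-of-the-envelope sketch in Python. It assumes the worst case, where every connection attempt runs to its full timeout instead of failing immediately; the function name and the process counts are made up for illustration and are not taken from the SDK or MSBuild code.

```python
def worst_case_shutdown_seconds(candidate_processes: int,
                                attempts_per_process: int = 2,
                                timeout_seconds: int = 30) -> int:
    """Upper bound on time spent connecting, assuming every attempt
    (2 per process, 30 seconds each) runs to its full timeout."""
    return candidate_processes * attempts_per_process * timeout_seconds


# The 4-hour workitem timeout mentioned in the issue, in seconds.
WORKITEM_TIMEOUT_SECONDS = 4 * 60 * 60

# Hypothetical numbers of dotnet processes on the machine.
for count in (1, 10, 100):
    print(count, "processes ->", worst_case_shutdown_seconds(count), "seconds")

# How many fully timed-out processes it would take to fill the 4 hours.
print(WORKITEM_TIMEOUT_SECONDS // worst_case_shutdown_seconds(1), "processes")
```

Under these assumptions it would take about 240 processes, each timing out twice, to use up the full 4 hours. If the machine has far fewer dotnet processes than that, the connect timeouts alone probably do not explain the hang, which would point at the read after a successful connect instead.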

But who knows what is going on. To investigate, you should

set MSBUILDDEBUGCOMM=1
set MSBUILDDEBUGPATH=<some path to put the trace>

This will immediately show what it is doing.
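If the shutdown step is driven from a script, one possible way to collect the trace without tying up the workitem for 4 hours is to run the command with those variables set and a much shorter timeout. The Python sketch below is only an illustration: `run_with_trace` is a hypothetical helper, not part of the performance repo or the SDK, and it uses the spelling `MSBUILDDEBUGCOMM` for the first variable.

```python
import os
import subprocess


def run_with_trace(cmd, timeout_seconds, trace_dir):
    """Run cmd with MSBuild node-communication tracing enabled.

    Returns the process exit code, or None if the command did not
    finish within timeout_seconds.
    """
    env = dict(os.environ)
    env["MSBUILDDEBUGCOMM"] = "1"         # turn on communication tracing
    env["MSBUILDDEBUGPATH"] = trace_dir   # directory the trace is written to
    try:
        completed = subprocess.run(cmd, env=env, timeout=timeout_seconds)
        return completed.returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the direct child before raising, so the
        # caller can log the hang and move on. Grandchild processes are
        # not killed by this.
        return None
```

A call such as `run_with_trace(["dotnet", "build-server", "shutdown"], 300, trace_dir)` would then return `None` after five minutes if the shutdown hangs, leaving whatever trace was written in `trace_dir` for inspection. The 300-second value is an arbitrary choice.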