runtime: Mutex.TryOpenExisting intermittently throws IOException
Description
After introducing the .NET 7 RC1 SDK into Runtime CI, we started seeing the intermittent exception System.IO.IOException: Connection timed out : 'Global\msbuild-server-launch-{45_random-chars}'
in Runtime and Arcade on Linux CI agents and in Docker-based Linux builds.
Reproduction Steps
I tried hard to create a minimal repro, without success. I believe the easiest repro would be to rerun some of our CI runs where it was seen.
Expected behavior
new Mutex(initiallyOwned: true, name: @"Global\UniqueName", out bool createdNew)
and Mutex.TryOpenExisting
should never intermittently throw IOException.
Actual behavior
new Mutex(initiallyOwned: true, name: @"Global\UniqueName", out bool createdNew)
and Mutex.TryOpenExisting
sometimes throw:
System.IO.IOException: Connection timed out : 'Global\msbuild-server-launch-{45_random-chars}'
at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
at Microsoft.Build.Experimental.MSBuildClient.TryLaunchServer()
at Microsoft.Build.Experimental.MSBuildClient.Execute(CancellationToken cancellationToken)
at Microsoft.Build.CommandLine.MSBuildClientApp.Execute(String[] commandLine, String msbuildLocation, CancellationToken cancellationToken)
at Microsoft.Build.CommandLine.MSBuildApp.Main(String[] args)
at Microsoft.DotNet.Cli.Utils.MSBuildForwardingAppWithoutLogging.ExecuteInProc(String[] arguments)
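For anyone trying to capture more detail when this happens, the call at the top of the stack trace can be wrapped as in the sketch below. This is only an illustration, not a confirmed workaround: the `CreateOwned` helper, the attempt count, the back-off delay, and the mutex name are all hypothetical, and retrying is assumed (not known) to help with the intermittent failure.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper, not part of MSBuild or the runtime: retries named-mutex
// creation a few times and logs the raw HResult next to the message text.
static Mutex CreateOwned(string name, out bool createdNew, int maxAttempts = 3)
{
    for (int attempt = 1; ; attempt++)
    {
        try
        {
            // Same constructor overload that appears in the stack trace.
            return new Mutex(initiallyOwned: true, name: name, createdNew: out createdNew);
        }
        catch (IOException ex) when (attempt < maxAttempts)
        {
            // The final attempt is not caught, so a persistent failure still surfaces.
            Console.Error.WriteLine($"Attempt {attempt} failed: 0x{ex.HResult:X8} {ex.Message}");
            Thread.Sleep(50 * attempt); // arbitrary back-off, chosen for illustration
        }
    }
}

// Usage with a made-up unique name.
string name = @"Global\example-" + Guid.NewGuid().ToString("N");
Mutex mutex = CreateOwned(name, out bool createdNew);
Console.WriteLine($"createdNew = {createdNew}");
mutex.ReleaseMutex();
mutex.Dispose();
```

Logging the HResult rather than only the message is deliberate, since the message text by itself may not identify the real failure.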
Regression?
Unknown
Known Workarounds
Unknown.
Configuration
I have seen this mostly on:
- OSX_x64
- Linux_musl_x64
- Linux_x64
Other information
No response
About this issue
- State: open
- Created 2 years ago
- Comments: 15 (15 by maintainers)
Commits related to this issue
- Correct error messages for CoreCLR Win32 PAL uses in CoreLib Contributes to #76736 — committed to jkotas/runtime by jkotas 2 years ago
- Correct error messages for CoreCLR Win32 PAL uses in CoreLib Contributes to #76736 — committed to dotnet/runtime by jkotas 2 years ago
- Correct error messages for CoreCLR Win32 PAL uses in CoreLib (#76813) Contributes to #76736 Co-authored-by: Jan Kotas <jkotas@microsoft.com> — committed to dotnet/runtime by github-actions[bot] 2 years ago
- Correct error messages for CoreCLR Win32 PAL uses in CoreLib (#76768) Contributes to #76736 — committed to dotnet/runtime by jkotas 2 years ago
I guess nobody noticed that the changes in #70685 have an unintended interaction with the Win32 emulator PAL uses in CoreLib. It is very hard to keep in mind at all times that a few parts of CoreLib use the Win32 emulator PAL.
Kusto shows that this issue has happened about 50 times during the last 30 days, and it always occurs in the mono wasm legs. It seems the reason it occurs there is that for wasm, each test is compiled during the run of the test (when its generated .sh script is called).
The error is ERROR_OPEN_FAILED from the Win32 PAL. The ERROR_OPEN_FAILED code is 110, and the Linux message for error code 110 is “Connection timed out”.
@kouvel This was thrown from here: https://github.com/dotnet/runtime/blob/main/src/libraries/System.Private.CoreLib/src/System/Threading/Mutex.Windows.cs#L35 .
Note that the “Connection timed out” message may be bogus. This code is mixing and matching Windows and Unix error codes.
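To see how one number can produce an unrelated message, here is a small sketch that formats code 110 on whatever platform it runs on. It assumes that `Win32Exception` resolves the message from the host operating system's error table (the Win32 table on Windows, the errno table on Linux). The constant name is made up for the example, and this is not the code path the runtime itself uses.

```csharp
using System;
using System.ComponentModel;

// Numeric value of ERROR_OPEN_FAILED as quoted in this thread.
const int ErrorOpenFailed = 110;

// Assumption: Win32Exception looks the code up in the host platform's error
// table, so the text depends on the OS rather than on where the code came from.
string message = new Win32Exception(ErrorOpenFailed).Message;
Console.WriteLine($"{ErrorOpenFailed}: {message}");
```

On Linux this should print the errno text for 110 rather than anything about a failed open, which matches the message seen in the exception.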
The error comes from msbuild. msbuild always runs on CoreCLR. This is not a Mono problem.