runtime: IOException running NuGet-Migrations during tests in dotnet CLI first run
This is affecting 6.0, 7.0 and 8.0.
In https://github.com/dotnet/runtime/pull/80510#issuecomment-1382160934 we saw a failure because some NuGet migration code failed to run during the dotnet CLI first run experience:
+ dotnet /datadisks/disk1/work/ADFC09B1/p/xunit/xunit.console.dll JIT/HardwareIntrinsics/JIT.HardwareIntrinsics.XUnitWrapper.dll -parallel collections -nocolor -noshadow -xml testResults.xml -trait TestGroup=JIT.HardwareIntrinsics.Arm.ArmBase
Microsoft.DotNet.XUnitConsoleRunner v2.5.0 (64-bit .NET 7.0.2)
Discovering: JIT.HardwareIntrinsics.XUnitWrapper (method display = ClassAndMethod, method display options = None)
Discovered: JIT.HardwareIntrinsics.XUnitWrapper (found 6 of 362 test cases)
Starting: JIT.HardwareIntrinsics.XUnitWrapper (parallel test collections = on, max threads = 2)
JIT/HardwareIntrinsics/Arm/ArmBase.Arm64/ArmBase.Arm64_ro/ArmBase.Arm64_ro.sh [FAIL]
System.IO.IOException: The system cannot open the device or file specified. : 'NuGet-Migrations'
at System.Threading.Mutex.CreateMutexCore(Boolean initiallyOwned, String name, Boolean& createdNew)
at System.Threading.Mutex..ctor(Boolean initiallyOwned, String name)
at NuGet.Common.Migrations.MigrationRunner.Run()
at Microsoft.DotNet.Configurer.DotnetFirstTimeUseConfigurer.Configure()
at Microsoft.DotNet.Cli.Program.ConfigureDotNetForFirstTimeUse(IFirstTimeUseNoticeSentinel firstTimeUseNoticeSentinel, IAspNetCertificateSentinel aspNetCertificateSentinel, IFileSentinel toolPathSentinel, Boolean isDotnetBeingInvokedFromNativeInstaller, DotnetFirstRunConfiguration dotnetFirstRunConfiguration, IEnvironmentProvider environmentProvider, Dictionary`2 performanceMeasurements)
at Microsoft.DotNet.Cli.Program.ProcessArgs(String[] args, TimeSpan startupTime, ITelemetry telemetryClient)
at Microsoft.DotNet.Cli.Program.Main(String[] args)
Return code: 1
We should look into disabling the first run experience via the DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1 env var, it doesn’t make sense to run that for the tests.
Report
Summary
| 24-Hour Hit Count | 7-Day Hit Count | 1-Month Count |
|---|---|---|
| 0 | 1 | 15 |
Known Issue Error Message
Fill the error message using known issues guidance.
{
"ErrorPattern": "The system cannot open the device or file specified. : ('|')NuGet-Migrations('|')",
"BuildRetry": false,
"ExcludeConsoleLog": false
}
Known issue validation
Build: 🔎 Result validation: ⚠️ Validation could not be done without an Azure DevOps build URL on the issue. Please add it to the “Build: 🔎” line. Validation performed at: 6/28/2023 10:04:55 PM UTC
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 72 (65 by maintainers)
Commits related to this issue
- Try to workaround #80619 — committed to dotnet/runtime by akoeplinger a year ago
- [wasm] Try to work around NuGet-Migrations issue Works around a CI issue where we somehow can't access a named mutex and fails Contributes to https://github.com/dotnet/runtime/issues/80619 — committed to steveisok/runtime by deleted user a year ago
- [wasm] Try to work around NuGet-Migrations issue (#83833) Works around a CI issue where we somehow can't access a named mutex and fails Contributes to https://github.com/dotnet/runtime/issues/8061... — committed to dotnet/runtime by steveisok a year ago
- [wasm] Work around NuGet-Migrations issue Like https://github.com/dotnet/runtime/pull/83833 but also includes wasi jobs. See https://github.com/dotnet/runtime/issues/80619 — committed to dotnet/runtime by akoeplinger a year ago
- [wasm] Work around NuGet-Migrations issue (#83832) Like https://github.com/dotnet/runtime/pull/83833 but also includes wasi jobs. See https://github.com/dotnet/runtime/issues/80619 — committed to dotnet/runtime by akoeplinger a year ago
- Add NuGet-Migrations workaround to runtime tests too See https://github.com/dotnet/runtime/issues/80619 — committed to akoeplinger/runtime by akoeplinger a year ago
- Add NuGet-Migrations workaround to runtime tests too See https://github.com/dotnet/runtime/issues/80619 — committed to akoeplinger/runtime by akoeplinger a year ago
- Add NuGet-Migrations workaround to runtime tests too See https://github.com/dotnet/runtime/issues/80619 — committed to akoeplinger/runtime by akoeplinger a year ago
- Add NuGet-Migrations workaround to runtime tests too (#84129) See https://github.com/dotnet/runtime/issues/80619 — committed to dotnet/runtime by akoeplinger a year ago
- scripts,GitHubCI: prepare for release creation The APK would be uploaded as part of a GitHub release. The bump.fsx is taken from geewallet. The reason to put the sanity check before the builds is tw... — committed to siwatanejo/DG by siwatanejo a year ago
- Fix session-local named mutex compat issue (#90342) - A previous change that was serviced back changed session-local named mutexes to be user-specific by restricting the permissions of the session di... — committed to dotnet/runtime by kouvel a year ago
- Fix session-local named mutex compat issue (#90342) - A previous change that was serviced back changed session-local named mutexes to be user-specific by restricting the permissions of the session di... — committed to tarekgh/runtime by kouvel a year ago
This is most likely an issue with access rights to the /tmp/.dotnet/shm/ subdirectories. I have seen a similar issue with docker recently. The problem happens if a Mutex is created in a login session of a user with privileged access (like root) and not closed either by mistake in the application or due to its crash. Then later on, in case a user with less privileged access logs in and the session id ends up being the same and the user runs nuget, it will attempt to create the session local mutex “NuGet-Migrations”. Such mutex creates a file in /tmp/.dotnet/shm/session{ID} directory where {ID} is the session id. But in this case, the directory already exists, and it is owned by a privileged user. So, the mutex file cannot be created. The session id is the process id of the login shell.
The current plan after discussion is to revert the previous changes that made session-local mutexes user-specific, and offer user-specific mutexes as a new feature in a future release.
@am11 yep unfortunately the nuget migrations code runs before the NoLogo check so we can’t use that one: https://github.com/dotnet/sdk/blob/ec824c9797659318c772330cb176286643a25dfe/src/Cli/Microsoft.DotNet.Configurer/DotnetFirstTimeUseConfigurer.cs#L62-L64
fyi:
DOTNET_SKIP_FIRST_TIME_EXPERIENCE=1lookup was removed in https://github.com/dotnet/cli/commit/7efbf4770630fd370779d222ce9aba35ffe56100 and brought back asDOTNET_NOLOGO=1in https://github.com/dotnet/sdk/commit/a41c9135e5e23440e106930c7ce949ded2ab3d26. Former is unsupported in all active releases of SDK.@akoeplinger @steveisok Workarounds did not resolve the issue.
Yes but the xunit-based runtime tests also use the
IncludeDotNetClioption in sendtohelixhelp.proj, that seems to be the common thing between WBT and runtime tests that e.g. the libraries tests don’t have (afaik we use corerun there).@tte-schleupen https://github.com/dotnet/runtime/issues/80619#issuecomment-1465768181 should work for you.
If you look at the first line of the console log it will say something like this if the test was running in Docker:
In that case you’d need to use the
mcr.microsoft.com/dotnet-buildtools/prereqs:ubuntu-20.04-helix-wasm-amd64image.@kouvel if you need help setting up an environment, feel free to ping me. There’s a test https://dev.azure.com/dnceng-public/public/_build/results?buildId=200752&view=ms.vss-test-web.build-test-results-tab&runId=3764716&resultId=103713&paneView=debug
That fails directly on the VM. It’s easy enough to set up a VM with the same image and number of cores in the helix-repro lab. The image is ubuntu.1804.amd64 and it’s a 2 core D_v4 (D2a_v4 I believe).
So far, it’s not looking like that’s the issue. I got a repro in the CI with some logging and it looks like before the failing test runs, /tmp/.dotnet/ doesn’t exist, and /tmp/ appears to have the expected permissions.
However, it is possible that the current issue is caused by some other reason. If the issue is easily reproducible in the ci, it would make sense to create a testing PR with added listing of the directory tree and their permissions / users under the /tmp/.dotnet/shm and dumping the current session id. That would reveal if the issue is what I’ve mentioned or something different.
I think we’d want to put this setting into the HelixPreCommands command list so it runs before any of our tests try to execute dotnet. I think putting it into the test wrappers themselves is too late.
I think I can probably imagine some logic that would trigger this before running the actual tests, perhaps as a prolog of the generated legacy XUnit wrappers (for non-merged tests) and the new-style Roslyn-generated full wrappers for the merged tests but it’s unclear to me whether it would fix the problem or just make it less frequent. If it’s the latter I think we should rather try to get to the bottom of why this step is failing on the Helix machines and / or try to find consensus regarding its suppression for the Helix tests. It also doesn’t answer the concern raised by @lewing, namely that removal of the
DOTNET_SKIP_FIRST_TIME_EXPERIENCEflag apparently wasn’t fully synchronized with other repos that still set it in their scripts so we still need to figure out whether we can bring it back or whether we need to audit all places that are using it and somehow bring them in sync with its removal; /cc @jkoritzinsky for additional thoughts.This one is interesting but it’s not the root thing, it just reacts on the fact that the variable isn’t used anywhere:
https://github.com/dotnet/sdk/commit/27b6661ce31434055287d802929c5ba61c82ddb4
I continue looking what gutted this behavior.