sdk: Inconsistent NuGet push order in .NET release publishing
Describe the bug
When a new .NET servicing release is pushed, a lot of packages are pushed to the NuGet feed. Among those packages are the workload manifests and the workload packages referenced from those manifests. There is currently no enforced order in which these assets are uploaded, which can cause a manifest on the feed to refer to packages that do not exist yet. This causes temporary failures of the dotnet workload install command.
Example:
Run dotnet workload install macos ios
dotnet workload install macos ios
shell: /bin/bash -e {0}
env:
DOTNET_ROOT: /Users/teamcity/.dotnet
Skip NuGet package signing validation. NuGet signing validation is not available on Linux or macOS https://aka.ms/workloadskippackagevalidation .
Updated advertising manifest microsoft.net.sdk.tvos.
Updated advertising manifest microsoft.net.sdk.maui.
Updated advertising manifest microsoft.net.sdk.maccatalyst.
Updated advertising manifest microsoft.net.sdk.ios.
Updated advertising manifest microsoft.net.workload.emscripten.
Updated advertising manifest microsoft.net.sdk.android.
Updated advertising manifest microsoft.net.sdk.macos.
Updated advertising manifest microsoft.net.workload.mono.toolchain.
Installing pack Microsoft.NET.Runtime.MonoAOTCompiler.Task version 6.0.2...
Writing workload pack installation record for Microsoft.NET.Runtime.MonoAOTCompiler.Task version 6.0.2...
Installing pack Microsoft.NET.Runtime.MonoTargets.Sdk version 6.0.2...
Writing workload pack installation record for Microsoft.NET.Runtime.MonoTargets.Sdk version 6.0.2...
Installing pack Microsoft.NETCore.App.Runtime.AOT.Cross.ios-arm version 6.0.2...
Workload installation failed. Rolling back installed packs...
Rolling back pack Microsoft.NET.Runtime.MonoAOTCompiler.Task installation...
Uninstalling workload pack Microsoft.NET.Runtime.MonoAOTCompiler.Task version 6.0.2…
Rolling back pack Microsoft.NET.Runtime.MonoTargets.Sdk installation...
Uninstalling workload pack Microsoft.NET.Runtime.MonoTargets.Sdk version 6.0.2…
Rolling back pack Microsoft.NETCore.App.Runtime.AOT.Cross.ios-arm installation...
Workload installation failed: microsoft.netcore.app.runtime.aot.osx-x64.cross.ios-arm::6.0.2 is not found in NuGet feeds https://nuget.pkg.github.com/emclient/index.json;https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-eng/nuget/v3/index.json;https://api.nuget.org/v3/index.json;https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet7/nuget/v3/index.json".
Error: Process completed with exit code 1.
About this issue
- State: open
- Created 2 years ago
- Reactions: 2
- Comments: 37 (25 by maintainers)
Thanks for the detailed response @joelverhagen. Glad to know there’s a 2-minute validation window during which we can unlist.
I met with our release team, and we have a low-cost proposal for the next release. The plan is to publish all .NET packages except the manifest packages, poll for availability (which I understand they already do), and then publish the manifest packages as a separate step afterwards. The polling may add some delay, and there is still potential for customer impact, since the dozen or so manifests could still light up at different times for different customers, but it would shrink that window from 30+ minutes to a much smaller one.
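The two-phase plan above can be sketched roughly as follows. This is a minimal illustration, not the release team's actual tooling: push and is_available are hypothetical callables supplied by the caller, and recognizing manifests by a ".Manifest-" segment in the package ID is an assumption based on IDs like Microsoft.NET.Sdk.iOS.Manifest-7.0.100.

```python
import time
from typing import Callable, Iterable, Tuple

Package = Tuple[str, str]  # (package ID, version)


def is_manifest(package_id: str) -> bool:
    # Assumption: manifest package IDs contain ".Manifest-",
    # e.g. Microsoft.NET.Sdk.iOS.Manifest-7.0.100.
    return ".Manifest-" in package_id


def publish_in_two_phases(
    packages: Iterable[Package],
    push: Callable[[str, str], None],          # hypothetical: pushes one package
    is_available: Callable[[str, str], bool],  # hypothetical: True once the feed serves it
    poll_interval: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    packages = list(packages)
    manifests = [p for p in packages if is_manifest(p[0])]
    others = [p for p in packages if not is_manifest(p[0])]

    # Phase 1: push everything except the manifests.
    for package_id, version in others:
        push(package_id, version)

    # Wait until every non-manifest package is actually available on the feed.
    pending = set(others)
    for _ in range(max_polls):
        pending = {p for p in pending if not is_available(*p)}
        if not pending:
            break
        sleep(poll_interval)
    else:
        raise TimeoutError(f"packages still unavailable: {sorted(pending)}")

    # Phase 2: only now push the manifests that reference those packages.
    for package_id, version in manifests:
        push(package_id, version)
```

Because the manifests are pushed last, a manifest only becomes visible after the packs it points to have already been served by the feed; the remaining risk is the one noted above, namely that the manifests themselves still appear at slightly different times.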
If that goes well, we’ll continue with that option. If there are still issues, we can explore pushing the manifests unlisted and then listing them all together. I’ll be meeting with some MAUI folks later today and will suggest they follow the same pattern, since their publish is currently a separate process from the core .NET package publish.
This problem also shows up with version wildcards and transitive dependencies in our regular dev pipelines; NuGet really needs a transactional publish: https://github.com/dotnet/performance/issues/3164
Hitting this presently with the current rollout for the ios workload. On closer inspection, it appears that Microsoft.NET.Sdk.iOS.Manifest-7.0.100 version 16.2.1024 contains:
And it appears that Microsoft.iOS.Sdk version 16.2.1024 has been published, but version 16.2.19 has not (yet).
I suggest that there be some validation added such that a manifest can’t be published until all of its dependent packages have been published first.
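The validation suggested here could look roughly like the following sketch, which lists the packs a manifest references that are not yet available. It assumes the WorkloadManifest.json layout, where a top-level "packs" object maps each pack ID to an object with a "version" and an optional "alias-to" map from RID to the real package ID (the aliased ID in the error log above, microsoft.netcore.app.runtime.aot.osx-x64.cross.ios-arm, appears to be such a case). is_available is a hypothetical callable, for example a feed lookup.

```python
import json
from typing import Callable, List, Set, Tuple


def required_packages(manifest_json: str) -> Set[Tuple[str, str]]:
    """Collect (package ID, version) pairs referenced by a workload manifest."""
    manifest = json.loads(manifest_json)
    required = set()
    for pack_id, pack in manifest.get("packs", {}).items():
        version = pack["version"]
        aliases = pack.get("alias-to")
        if aliases:
            # Platform-specific packs: each alias target is a separate package ID.
            for target in aliases.values():
                required.add((target, version))
        else:
            required.add((pack_id, version))
    return required


def missing_packages(
    manifest_json: str, is_available: Callable[[str, str], bool]
) -> List[Tuple[str, str]]:
    """Return referenced packages that the feed does not serve yet (empty means OK to publish)."""
    return sorted(p for p in required_packages(manifest_json) if not is_available(*p))
```

A publish step would refuse to push the manifest while missing_packages returns a non-empty list.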
I agree that workloads are a special case here because of the way advertising manifests are automatically updated during restore.
Maybe we can skip a manifest during that automatic update until it is at least X hours old?
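A minimal sketch of the "skip until at least X hours old" idea, assuming the updater knows the publish time of each candidate manifest version. pick_manifest_version and min_age are hypothetical names (min_age stands in for X), publish time is used as a stand-in for version ordering, and this is not how the SDK's advertising-manifest update works today.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def pick_manifest_version(
    candidates: List[Tuple[str, datetime]],  # (version, publish time)
    now: datetime,
    min_age: timedelta,  # hypothetical "X hours" threshold
) -> Optional[str]:
    """Return the most recently published version that is at least min_age old, or None."""
    eligible = [(version, published) for version, published in candidates
                if now - published >= min_age]
    if not eligible:
        return None  # nothing old enough yet: keep the currently installed manifest
    return max(eligible, key=lambda c: c[1])[0]
```

The trade-off is that every user waits at least min_age before picking up a new manifest, in exchange for giving the feed time to serve all of its packs.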
As @baronfel mentioned, the availability order of packages on NuGet.org is not guaranteed, even if packages are pushed in some clever (e.g. reverse dependency) order. This is because our asynchronous validation pipeline (malware scanning, signing, and more) can take a different amount of time per package. Imagine there is a package in the middle of the dependency graph that takes a bit longer to malware scan, or is owned by another team that runs their push at a slightly different time (this happens more than you’d think). It’s sort of a messy problem.
It’s not currently feasible to fix the availability order unless the actor pushing the packages polls for availability before continuing with further pushes. Done naively, this would have terrible throughput (average validation time × total number of packages to push, so many hours for big .NET releases). It would be possible to do a topological sort to identify sets of packages that can be pushed together, but this is all a hack/workaround for the feature gap on NuGet.org.
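The topological-sort workaround mentioned above can be sketched as a layered (Kahn-style) sort: each batch contains only packages whose dependencies were all in earlier batches, so a pusher could push one batch, poll until it is available, and move on. The deps mapping (package ID to the IDs it depends on within the release) is an assumed input; building it from the actual nuspecs and manifests is out of scope here.

```python
from collections import defaultdict
from typing import Dict, List, Set


def push_batches(deps: Dict[str, Set[str]]) -> List[List[str]]:
    """Group packages so every package's dependencies land in an earlier batch."""
    nodes = set(deps) | {d for ds in deps.values() for d in ds}
    unmet = {n: set(deps.get(n, ())) for n in nodes}  # dependencies not yet pushed
    dependents = defaultdict(set)
    for n, ds in unmet.items():
        for d in ds:
            dependents[d].add(n)

    batches: List[List[str]] = []
    ready = sorted(n for n, ds in unmet.items() if not ds)
    done = 0
    while ready:
        batches.append(ready)
        done += len(ready)
        next_ready = set()
        for n in ready:
            for m in dependents[n]:
                unmet[m].discard(n)
                if not unmet[m]:
                    next_ready.add(m)
        ready = sorted(next_ready)

    if done != len(nodes):
        raise ValueError("dependency cycle detected")
    return batches
```

Polling once per batch costs roughly (number of batches) × validation time rather than (number of packages) × validation time, which is the throughput gain; as noted above, it is still a workaround rather than a fix.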
This general feature request is already tracked here: https://github.com/NuGet/NuGetGallery/issues/3931. Feel free to add additional comments or upvote. I think the described solution (“staging”) is the proper fix for the problem, but it’s a lot of work and needs some exploration with stakeholders to make sure the staging works as needed. I can say it’s not on our radar for the next 6 months given our other priorities and team capacity.
Responding to @marcpopMSFT’s comment (https://github.com/dotnet/sdk/issues/23820#issuecomment-1421757609): it is possible to publish packages as unlisted. This can be done by using the “unlist” gesture (nuget.exe, .NET CLI, API endpoint) immediately after pushing, while the package is still in the validating state. The package will always still be validating at that point, since validation takes 2+ minutes and you can unlist as soon as the push request completes. Then, after you have detected that the packages should be listed (i.e. the full dependency graph is available), you can use the “relist” gesture (no CLI support, but it has API support).
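A sketch of the unlist and relist API calls, based on our reading of the NuGet PackagePublish resource documentation: on nuget.org, a DELETE to the package URL unlists it and a POST to the same URL relists it, authenticated with the X-NuGet-ApiKey header. The hard-coded base URL is an assumption (a real client should discover the PackagePublish endpoint from the service index), and the helper names are hypothetical; verify against the current docs before relying on this.

```python
import urllib.request

# Assumption: nuget.org's PackagePublish (v2) endpoint.
NUGET_PUBLISH_BASE = "https://www.nuget.org/api/v2/package"


def _package_request(method: str, package_id: str, version: str, api_key: str,
                     base: str = NUGET_PUBLISH_BASE) -> urllib.request.Request:
    url = f"{base}/{package_id}/{version}"
    return urllib.request.Request(url, method=method, headers={"X-NuGet-ApiKey": api_key})


def unlist_request(package_id: str, version: str, api_key: str,
                   base: str = NUGET_PUBLISH_BASE) -> urllib.request.Request:
    # On nuget.org, DELETE unlists the version instead of removing it.
    return _package_request("DELETE", package_id, version, api_key, base)


def relist_request(package_id: str, version: str, api_key: str,
                   base: str = NUGET_PUBLISH_BASE) -> urllib.request.Request:
    # POST to the same URL relists a previously unlisted version.
    return _package_request("POST", package_id, version, api_key, base)
```

Usage would be urllib.request.urlopen(unlist_request(package_id, version, api_key)) right after each push, then the same with relist_request once the whole graph is available.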
Responding to @akoeplinger’s comment (https://github.com/dotnet/sdk/issues/23820#issuecomment-1424349420): yes, that’s right. As mentioned to Marc above, you can unlist immediately after the push, so this can work today. Feel free to open an issue about enhancing the push protocol to include a “listed = false” parameter. This would remove the additional round trip, eliminate any crazy race condition caused by a slow or failed unlist or a hyper-fast validation, and provide parity with the UI upload flow, which currently allows uploading a package as unlisted.
FYI, the best way to detect if a version is available is to use this API endpoint (a HEAD request should be fine): https://learn.microsoft.com/en-us/nuget/api/package-base-address-resource#download-package-content-nupkg. This endpoint has good performance and availability globally, and it properly caches 200s but not 404s.
Yeah, the underlying issue here is that workloads conceptually provide a mechanism to tie multiple packages together as a unit, and our NuGet publishing pipeline has no concept of that dependency at any level. To really handle this, we’d need something like a staged transaction to the NuGet database.
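The availability check recommended in the FYI comment above can be sketched as a HEAD request against the package content URL. The sketch assumes nuget.org's PackageBaseAddress is https://api.nuget.org/v3-flatcontainer (a real client should read it from the service index) and that the version string is already normalized; the URL scheme uses the lowercased ID and version, as the linked docs describe.

```python
import urllib.error
import urllib.request

# Assumption: nuget.org's PackageBaseAddress resource.
FLAT_CONTAINER = "https://api.nuget.org/v3-flatcontainer"


def nupkg_url(package_id: str, version: str, base: str = FLAT_CONTAINER) -> str:
    # IDs and versions are lowercased in these URLs. Assumption: the version is
    # already normalized (no build metadata, no superfluous trailing zeros).
    pid, ver = package_id.lower(), version.lower()
    return f"{base}/{pid}/{ver}/{pid}.{ver}.nupkg"


def is_available(package_id: str, version: str,
                 base: str = FLAT_CONTAINER, timeout: float = 30.0) -> bool:
    """HEAD the .nupkg URL: 200 means available, 404 means not yet."""
    req = urllib.request.Request(nupkg_url(package_id, version, base), method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status == 200
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return False
        raise  # other errors (5xx, auth) should not be treated as "missing"
```

Because 404s are not cached, repeatedly polling this URL should flip to 200 soon after validation completes rather than serving a stale "not found".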