sdk: Adding a `global.json` version causes `dotnet` to fail to run a tool on Linux

Description

Our ubuntu-22.04 Linux build started failing suddenly in the last couple of days (last known good: Jan 30, 2:18 PM PT).

As far as we can tell, it's related to having a `global.json` defined. It fails almost every time in our Linux environment and sometimes in our Windows Server 2022 environment in CI. (Our Linux runner image changed from 20230122.1 to 20230129.2, but the release notes don't indicate any changes to the .NET tooling. On the Windows Server 2022 images, one run succeeded on 20230129.1 but another failed on 20230123.1.)

As we can see, .NET is already installed in the environment:

[Screenshot: .NET SDK install step output]

The `dotnet` CLI itself confirms that the SDK is installed:

[Screenshot: `dotnet --info` output listing the installed SDK]

However, running a dotnet tool fails to find the installed SDK before the tool even executes:

[Screenshot: tool run failing to find the installed SDK]

Removing the `global.json` file seems to make things run, but then the newer SDK is used instead of the pinned version we expect. At first I thought the problem was specific to `rollForward`, but I have seen it fail either way, though it seems to happen more often when `rollForward` is specified.
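To reason about which SDK a given `global.json` should select, here is a simplified Python sketch of the roll-forward rules. It covers only the `disable`, `patch`, `latestPatch`, `latestFeature`, `latestMinor`, and `latestMajor` policies as described in the .NET docs. It is not the real host resolver: prerelease versions, `allowPrerelease`, and the `feature`/`minor`/`major` policies are deliberately left out. The function names are hypothetical.

```python
"""Simplified sketch of global.json SDK roll-forward selection.

Assumption: release versions only ("6.0.405"), and only the policies
handled below; this is not the actual hostfxr implementation.
"""
from typing import List, Optional, Tuple

# (major, minor, feature band, patch level); "6.0.405" -> (6, 0, 4, 5)
Version = Tuple[int, int, int, int]


def parse(version: str) -> Version:
    major, minor, patch = (int(p) for p in version.split("."))
    return (major, minor, patch // 100, patch % 100)


def select_sdk(requested: str, installed: List[str],
               roll_forward: str = "patch") -> Optional[str]:
    """Return the SDK the sketch would pick, or None if resolution fails.

    "patch" is used as the default because the docs describe it as the
    policy applied when only "version" is set.
    """
    req = parse(requested)
    # No policy here ever rolls backward below the requested version.
    candidates = [v for v in installed if parse(v) >= req]

    def matches(v: str, n: int) -> bool:
        # True if the first n components (major, minor, band) match.
        return parse(v)[:n] == req[:n]

    if roll_forward == "disable":
        pool = [v for v in candidates if parse(v) == req]
    elif roll_forward == "patch":
        exact = [v for v in candidates if parse(v) == req]
        if exact:
            return exact[0]
        pool = [v for v in candidates if matches(v, 3)]
    elif roll_forward == "latestPatch":
        pool = [v for v in candidates if matches(v, 3)]
    elif roll_forward == "latestFeature":
        pool = [v for v in candidates if matches(v, 2)]
    elif roll_forward == "latestMinor":
        pool = [v for v in candidates if matches(v, 1)]
    elif roll_forward == "latestMajor":
        pool = candidates
    else:
        raise ValueError(f"policy not modeled in this sketch: {roll_forward}")
    return max(pool, key=parse) if pool else None


if __name__ == "__main__":
    sdks = ["6.0.405", "6.0.406", "7.0.102"]
    print(select_sdk("6.0.405", sdks, "latestPatch"))
```

With no `global.json` at all, the CLI picks the latest installed SDK, which corresponds to `latestMajor` in this sketch. That matches the observation that removing the file makes the newer SDK run.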

Reproduction Steps

```yaml
# https://docs.github.com/actions/using-workflows/about-workflows
# https://docs.github.com/actions/using-workflows/workflow-syntax-for-github-actions

name: CI

# Controls when the action will run.
on:
  # Triggers the workflow on push or pull request events but only for the main branch
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]

  # Allows you to run this workflow manually from the Actions tab
  workflow_dispatch:


# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  wasm-linux:
    runs-on: ubuntu-latest

    steps:
      - name: Install .NET 6 SDK
        uses: actions/setup-dotnet@v3
        with:
          dotnet-version: '6.0.x'

      # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
      - name: Checkout Repository
        uses: actions/checkout@v3

      # Restore Tools from Manifest list in the Repository
      - name: Restore dotnet tools
        run: dotnet tool restore

      - name: List SDKS [Debug]
        run: dotnet --info

      - name: Echo DOTNET Env [Debug]
        run: echo ${{ env.DOTNET_ROOT }}

      - name: Echo env [Debug]
        run: env

      - name: Run slngen directly with diagnostics (which don't seem to output extra info)
        working-directory: ./
        run: dotnet -d slngen
```

Here we're trying to run the slngen tool from our local tool manifest:

```json
{
  "version": 1,
  "isRoot": true,
  "tools": {
    "uno.check": {
      "version": "1.10.0",
      "commands": [
        "uno-check"
      ]
    },
    "xamlstyler.console": {
      "version": "3.2206.4",
      "commands": [
        "xstyler"
      ]
    },
    "microsoft.visualstudio.slngen.tool": {
      "version": "9.5.1",
      "commands": [
        "slngen"
      ]
    }
  }
}
```
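When debugging which package a command such as `slngen` comes from, it can help to dump the command-to-package mapping from the manifest in a CI debug step. The following is a hypothetical Python helper (`commands_from_manifest` is not part of any .NET tooling). It assumes the `"version": 1` manifest layout shown above, where each package lists its `version` and `commands`.

```python
"""Hypothetical diagnostic helper: map local tool commands to packages.

Assumption: the manifest uses the "version": 1 layout, with each package
entry providing "version" and "commands".
"""
import json
import sys
from typing import Dict, Tuple


def commands_from_manifest(manifest_text: str) -> Dict[str, Tuple[str, str]]:
    """Return {command: (package id, package version)}."""
    manifest = json.loads(manifest_text)
    if manifest.get("version") != 1:
        raise ValueError("unsupported manifest version")
    mapping: Dict[str, Tuple[str, str]] = {}
    for package_id, entry in manifest.get("tools", {}).items():
        for command in entry.get("commands", []):
            mapping[command] = (package_id, entry["version"])
    return mapping


if __name__ == "__main__":
    # Usage: python dump_tools.py path/to/dotnet-tools.json
    with open(sys.argv[1], encoding="utf-8") as f:
        for command, (pkg, ver) in sorted(commands_from_manifest(f.read()).items()):
            print(f"{command} -> {pkg} {ver}")
```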

Expected behavior

`dotnet` finds the installed .NET SDK version and runs the tool.

Actual behavior

`dotnet` fails to find an SDK that is installed and aborts.

Regression?

No response

Known Workarounds

Still investigating, but I noticed that some of our other tools still ran fine in cases where this one didn't. The difference seemed to be calling `dotnet tool run slngen` versus using the shorter `dotnet slngen` syntax directly. Update: I've now seen it occur with both syntaxes, so that is not the root cause, which points us back to `global.json` alone.

We're still checking whether that holds consistently, since this is a non-deterministic issue. It may shed light on the root cause, though, as we haven't noticed problems when our XAML Styler or uno-check tools run using `dotnet tool run ...`.
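Because the failure is non-deterministic, one way to compare the two syntaxes is to run each many times and measure the failure rate. Here is a hypothetical Python harness. It assumes that each run is independent and that a non-zero exit code alone marks a failure. The `dotnet` command lines are only examples.

```python
"""Hypothetical repro harness: run a command repeatedly, report failure rate.

Assumption: runs are independent and a non-zero exit code means failure.
"""
import subprocess
import sys
from typing import List


def failure_rate(argv: List[str], runs: int = 20) -> float:
    """Run argv `runs` times and return the fraction of non-zero exits."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    failures = 0
    for _ in range(runs):
        # Capture output so repeated runs don't flood the CI log.
        result = subprocess.run(argv, capture_output=True)
        if result.returncode != 0:
            failures += 1
    return failures / runs


if __name__ == "__main__":
    # Usage: python repro.py dotnet slngen
    #        python repro.py dotnet tool run slngen
    argv = sys.argv[1:] or ["dotnet", "tool", "run", "slngen"]
    print(f"{' '.join(argv)}: {failure_rate(argv, runs=10):.0%} failed")
```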

Configuration

ubuntu-22.04 and Windows Server 2022, on GitHub Actions

.NET 6.0.405

Other information

Possibly related to dotnet/runtime#1374. It initially seemed similar, but the install paths all line up in our scenario, so I think that was something else.

Also note that we usually wrap these calls in PowerShell. However, I tested without it and still saw the issue, so PowerShell is not the cause.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 19 (8 by maintainers)

Most upvoted comments

Note that `dotnet.exe` is a singleton, so even if you use `global.json`, installing .NET 7 upgraded that one component. That's typically not an issue, since `dotnet.exe` should be fully backwards compatible. The exception is that we updated the `--info` output. That output isn't meant for machines, so the change was not considered breaking.

For example, the `dotnet --info` output in .NET 7 looks different from the output in .NET 6.

I just realized that this might explain the "suddenly started failing" behavior. If the VM comes with .NET 7 preinstalled, the output of `dotnet --info` will look different from that of a VM without .NET 7 preinstalled, and this might have caused the change in slngen's behavior.
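For a script that needs to discover installed SDKs, one option that avoids depending on the human-readable `--info` layout is to parse `dotnet --list-sdks`. Here is a minimal Python sketch. It assumes each output line has the form `<version> [<install dir>]`, for example `6.0.405 [/usr/share/dotnet/sdk]`. This is not a description of how slngen itself locates SDKs, and the function names are hypothetical.

```python
"""Sketch: discover installed SDKs from `dotnet --list-sdks`, not `--info`.

Assumption: each line looks like "6.0.405 [/usr/share/dotnet/sdk]".
"""
import re
import shutil
import subprocess
from typing import List, Tuple

# Version is the first whitespace-free token; the path is inside brackets
# and may contain spaces (e.g. "C:\Program Files\dotnet\sdk").
_LINE = re.compile(r"^(?P<version>\S+) \[(?P<path>.+)\]$")


def parse_list_sdks(output: str) -> List[Tuple[str, str]]:
    """Return (version, install dir) pairs, skipping unrecognized lines."""
    sdks = []
    for line in output.splitlines():
        match = _LINE.match(line.strip())
        if match:
            sdks.append((match.group("version"), match.group("path")))
    return sdks


def installed_sdks() -> List[Tuple[str, str]]:
    """Run `dotnet --list-sdks` if dotnet is on PATH; otherwise return []."""
    if shutil.which("dotnet") is None:
        return []
    result = subprocess.run(["dotnet", "--list-sdks"],
                            capture_output=True, text=True, check=True)
    return parse_list_sdks(result.stdout)


if __name__ == "__main__":
    for version, path in installed_sdks():
        print(version, path)
```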

This is looking more like an issue with not specifying `dotnet tool run` explicitly.

That is, if you use:

`dotnet slngen`

it fails most of the time on Linux (but not always), and rarely on Windows.

If you use:

`dotnet tool run slngen`

then it works??? 🤷‍♂️

I'll clean up all the diagnostic code in our build script and run it again in the morning to confirm.