bazel: Fatal error resolving a toolchain when using bzlmod

Description of the bug:

Using Bazel 6.2.1, when executing bazel test //... in a child workspace using rules_bazel_integration_test, I see a fatal error (see below). Interestingly, when I just cd into the child workspace and run bazel test //... from the command line, it does not fail.

Fatal error:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.IllegalStateException: Value for: 'ToolchainContextKey{configurationKey=BuildConfigurationKey[8e16d2cfa11e2db6ed6cf6f2d7b88e102d4e16bf0b47fc5f72d75369ff9270dc], toolchainTypes=[ToolchainTypeRequirement{toolchainType=@rules_swift_tidy~override//swiftformat:toolchain, mandatory=true}], execConstraintLabels=[], forceExecutionPlatform=Optional.empty, debugTarget=false}' was missing, this should never happen
	at com.google.devtools.build.lib.bugreport.BugReport.sendBugReport(BugReport.java:182)
	at com.google.devtools.build.lib.bugreport.BugReport.logUnexpected(BugReport.java:153)
	at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.computeUnloadedToolchainContexts(ConfiguredTargetFunction.java:736)
	at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.computeUnloadedToolchainContexts(ConfiguredTargetFunction.java:642)
	at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:296)
	at com.google.devtools.build.skyframe.ParallelEvaluator.bubbleErrorUp(ParallelEvaluator.java:427)
	at com.google.devtools.build.skyframe.ParallelEvaluator.waitForCompletionAndConstructResult(ParallelEvaluator.java:216)
	at com.google.devtools.build.skyframe.ParallelEvaluator.doMutatingEvaluation(ParallelEvaluator.java:182)
	at com.google.devtools.build.skyframe.ParallelEvaluator.eval(ParallelEvaluator.java:677)
	at com.google.devtools.build.skyframe.InMemoryMemoizingEvaluator.evaluate(InMemoryMemoizingEvaluator.java:203)
	at com.google.devtools.build.lib.skyframe.SkyframeExecutor.configureTargets(SkyframeExecutor.java:2217)
	at com.google.devtools.build.lib.skyframe.SkyframeBuildView.configureTargets(SkyframeBuildView.java:359)
	at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:394)
	at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.runAnalysisPhase(AnalysisPhaseRunner.java:233)
	at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.execute(AnalysisPhaseRunner.java:139)
	at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:180)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:494)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:462)
	at com.google.devtools.build.lib.runtime.commands.TestCommand.doTest(TestCommand.java:148)
	at com.google.devtools.build.lib.runtime.commands.TestCommand.exec(TestCommand.java:113)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:625)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:240)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
	at io.grpc.Context$1.run(Context.java:566)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Using Bazel 7.0.0-pre.20230517.4, it still fails, but with a different error:

FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.IllegalStateException: Unexpected exception: dep Dependency{label=//:swiftformat_fmt_main.swift, configuration=0747b6e24c727cd674aa08cd016281faf72dd24df799fa11a8640fa2ea8ed968, aspects=AspectCollection{[]}, transitionKeys=[], executionPlatformLabel=null} had null value, even though there were no values missing in the initial fetch. That means it had an unexpected exception type (not ConfiguredValueCreationException)
	at com.google.devtools.build.lib.bugreport.BugReport.sendBugReport(BugReport.java:183)
	at com.google.devtools.build.lib.bugreport.BugReport.logUnexpected(BugReport.java:154)
	at com.google.devtools.build.lib.skyframe.PrerequisiteProducer.resolveConfiguredTargetDependencies(PrerequisiteProducer.java:947)
	at com.google.devtools.build.lib.skyframe.PrerequisiteProducer.computeDependencies(PrerequisiteProducer.java:735)
	at com.google.devtools.build.lib.skyframe.PrerequisiteProducer.evaluate(PrerequisiteProducer.java:348)
	at com.google.devtools.build.lib.skyframe.ConfiguredTargetFunction.compute(ConfiguredTargetFunction.java:202)
	at com.google.devtools.build.skyframe.ParallelEvaluator.bubbleErrorUp(ParallelEvaluator.java:423)
	at com.google.devtools.build.skyframe.ParallelEvaluator.waitForCompletionAndConstructResult(ParallelEvaluator.java:212)
	at com.google.devtools.build.skyframe.ParallelEvaluator.doMutatingEvaluation(ParallelEvaluator.java:178)
	at com.google.devtools.build.skyframe.ParallelEvaluator.eval(ParallelEvaluator.java:676)
	at com.google.devtools.build.skyframe.InMemoryMemoizingEvaluator.evaluate(InMemoryMemoizingEvaluator.java:177)
	at com.google.devtools.build.lib.skyframe.SkyframeExecutor.configureTargets(SkyframeExecutor.java:2306)
	at com.google.devtools.build.lib.skyframe.SkyframeBuildView.configureTargets(SkyframeBuildView.java:344)
	at com.google.devtools.build.lib.analysis.BuildView.update(BuildView.java:445)
	at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.runAnalysisPhase(AnalysisPhaseRunner.java:247)
	at com.google.devtools.build.lib.buildtool.AnalysisPhaseRunner.execute(AnalysisPhaseRunner.java:140)
	at com.google.devtools.build.lib.buildtool.BuildTool.buildTargets(BuildTool.java:178)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:503)
	at com.google.devtools.build.lib.buildtool.BuildTool.processRequest(BuildTool.java:471)
	at com.google.devtools.build.lib.runtime.commands.TestCommand.doTest(TestCommand.java:148)
	at com.google.devtools.build.lib.runtime.commands.TestCommand.exec(TestCommand.java:113)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.execExclusively(BlazeCommandDispatcher.java:625)
	at com.google.devtools.build.lib.runtime.BlazeCommandDispatcher.exec(BlazeCommandDispatcher.java:240)
	at com.google.devtools.build.lib.server.GrpcServerImpl.executeCommand(GrpcServerImpl.java:550)
	at com.google.devtools.build.lib.server.GrpcServerImpl.lambda$run$1(GrpcServerImpl.java:614)
	at io.grpc.Context$1.run(Context.java:566)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.base/java.lang.Thread.run(Unknown Source)

Typically, when a build/test fails using rules_bazel_integration_test, it is related to a missing environment variable or configuration. Are there any external configuration values that might affect toolchain resolution?

What’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Steps

  1. Clone https://github.com/cgrindel/rules_swiftformat.
  2. Check out the toolchain_fatal_error_repro branch: git checkout toolchain_fatal_error_repro
  3. Run bazel test //examples:simple_test.

The test will fail with the fatal error.

Which operating system are you running Bazel on?

macOS Ventura 13.3.1

What is the output of bazel info release?

release 6.2.1

If bazel info release returns development version or (@non-git), tell us how you built Bazel.

NA

What’s the output of git remote get-url origin; git rev-parse master; git rev-parse HEAD ?

NA

Is this a regression? If yes, please try to identify the Bazel commit where the bug was introduced.

No response

Have you found anything relevant by searching the web?

No response

Any other information, logs, or outputs that you want to share?

No response

About this issue

  • State: open
  • Created a year ago
  • Comments: 21 (9 by maintainers)

Most upvoted comments

Still dealing with bzlmod error handling, but here’s the underlying error that’s getting lost in this failure:

Error loading '@rules_swift_tidy~override//swiftformat:extensions.bzl' for module extensions, requested by /usr/local/google/home/jcater/.cache/bazel/_bazel_jcater/8d79a2ce3b6733649bb72a30e4b9639b/execroot/_main/bazel-out/k8-fastbuild/bin/examples/simple_test.runfiles/_main/examples/simple/MODULE.bazel:22:33: at /usr/local/google/home/jcater/.cache/bazel/_bazel_jcater/8d79a2ce3b6733649bb72a30e4b9639b/execroot/_main/_tmp/0d74cd122ff726b298f6134e0fe5e1d0/_bazel_jcater/374a05da1afc2fe9d20700880e5fe999/external/rules_swift_tidy~override/swiftformat/extensions.bzl:4:5: Label '@rules_swift_tidy~override//swiftformat/bzlmod:swift_tidy_tools.bzl' is invalid because 'swiftformat/bzlmod' is not a package; perhaps you meant to put the colon here: '@rules_swift_tidy~override//swiftformat:bzlmod/swift_tidy_tools.bzl'?: at /usr/local/google/home/jcater/.cache/bazel/_bazel_jcater/8d79a2ce3b6733649bb72a30e4b9639b/execroot/_main/_tmp/0d74cd122ff726b298f6134e0fe5e1d0/_bazel_jcater/374a05da1afc2fe9d20700880e5fe999/external/rules_swift_tidy~override/swiftformat/extensions.bzl:4:5: Label '@rules_swift_tidy~override//swiftformat/bzlmod:swift_tidy_tools.bzl' is invalid because 'swiftformat/bzlmod' is not a package; perhaps you meant to put the colon here: '@rules_swift_tidy~override//swiftformat:bzlmod/swift_tidy_tools.bzl'?
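To make the "perhaps you meant to put the colon here" hint concrete, here is a minimal Python sketch of how a label splits into repository, package, and target, and how a suggestion like the one above could be derived. It is only an illustration, not Bazel's implementation: the function names and the known_packages set (standing in for "directories that contain a BUILD file") are hypothetical.

```python
def split_label(label):
    """Split '@repo//pkg/path:target' into (repo, package, target)."""
    repo, _, rest = label.partition("//")
    package, _, target = rest.partition(":")
    return repo, package, target


def suggest_colon_fix(label, known_packages):
    """Suggest a label whose package part is a known package, or None.

    known_packages is a hypothetical stand-in for the set of directories
    that contain a BUILD file; real Bazel checks the file system instead.
    """
    repo, package, target = split_label(label)
    if package in known_packages:
        return None  # The label already points at a real package.
    parts = package.split("/")
    # Walk up the directory tree until an enclosing package is found.
    for i in range(len(parts) - 1, -1, -1):
        parent = "/".join(parts[:i])
        if parent in known_packages:
            remainder = "/".join(parts[i:] + [target])
            return f"{repo}//{parent}:{remainder}"
    return None


label = "@rules_swift_tidy~override//swiftformat/bzlmod:swift_tidy_tools.bzl"
print(suggest_colon_fix(label, {"swiftformat"}))
```

Read this way, the message says that the inner Bazel does not see swiftformat/bzlmod as a package, while it does see swiftformat as one.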

Interestingly, when I just cd into the child workspace and run bazel test //... from the command line, it does not fail.

That’s because of the use of BugReport#logUnexpected at https://github.com/bazelbuild/bazel/blob/06992d2d0da3ab4628028cecbb3f3dc4965f9e88/src/main/java/com/google/devtools/build/lib/skyframe/PrerequisiteProducer.java#L877 (see the Javadoc and implementation).

There’s probably a reason Bazel is trying to detect itself running within itself (a log message mentions idle time, so maybe just about server lifetime?).

Another category of reason: detecting that it’s being run in an [integration] test so that it can make stronger assertions. In the specific example above, it’s upgrading a warning/debug log line to a crash.
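The "warn normally, crash under test" pattern described here can be sketched in a few lines of Python. This is an illustration of the pattern only, not the code behind BugReport#logUnexpected: the function names are hypothetical, and using TEST_TMPDIR as the test marker is an assumption borrowed from the startup_options.cc observation elsewhere in this thread.

```python
import logging
import os

logger = logging.getLogger("sketch")


def in_test(env=None):
    """Assumption: a test run is recognized by TEST_TMPDIR being set."""
    env = os.environ if env is None else env
    return bool(env.get("TEST_TMPDIR"))


def log_unexpected(message, env=None):
    """Raise when running under a test, otherwise only log a warning."""
    if in_test(env):
        # Under test, an unexpected state is treated as a hard failure.
        raise RuntimeError(message)
    # Outside of tests, the same state is recorded and execution continues.
    logger.warning("unexpected state: %s", message)
```

With a pattern like this, the same unexpected state would be fatal inside the integration test and close to invisible when the command is run by hand in the child workspace, which matches the behavior reported in the issue.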

Okay, with some help I can now debug this.

The underlying issue is that something is causing ExternalDepsException to be thrown during ConfiguredTargetFunction evaluation, but it’s not being handled, which leads to this sort of crash.

So, open questions:

  1. What’s throwing that exception, and where should it have been handled?
  2. Why is that exception being thrown in this case, anyway?

I suspect that once 1 is answered, 2 will be easier to answer.

katre asked me to repro this, but I no longer can. Super weird. @cgrindel are you able to repro still?

I ran into this today, too. My error is similar to the first posted error about the toolchain key (except mine is about the Python toolchain, since that’s what I’m working with).

From my debugging, I’m not sure if this is toolchain-specific or a more general bzlmod issue. So cc @Wyverald.

  • OK: When bzlmod is disabled on the inner Bazel
  • OK: When TEST_TMPDIR is unset before running the inner Bazel
  • OK: Using rules_python 0.21.0 or 0.22.0, together with the register_toolchains() call they require; note that these two versions don’t use the hub-spoke model
  • OK: register_toolchains("@bazel_tools//tools/python:autodetecting_toolchain")
  • ERROR: Using rules_python at head (unreleased), which uses the hub-spoke model (and thus has a register_toolchains call within it). Removing that register_toolchains call avoids the error. Moving it elsewhere (i.e. into the root module) still reproduces the error. Unsetting TEST_TMPDIR avoids the error.
  • ERROR: Setting TEST_TMPDIR to another directory entirely

So, to me, in combination with the different error with Bazel 7 posted, this seems to point to a bad interaction between bzlmod and bazel-in-bazel testing.

Note 1: In startup_options.cc, there’s special logic that uses TEST_TMPDIR as the indicator that it’s a bazel-in-bazel test, and then purposefully sets the output root to that. So unsetting TEST_TMPDIR might work, but I’m not really sure it’s entirely kosher. There’s probably a reason Bazel is trying to detect itself running within itself (a log message mentions idle time, so maybe just about server lifetime?).

Note 2: there are quite a few env vars set in a test invocation. Another thought I had was that maybe TEST_TMPDIR is just acting as the bazel-in-bazel marker, and it’s actually another env var that has a value bzlmod doesn’t like? I dunno; I didn’t exhaustively go through all the env vars.
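One way to go through the env vars exhaustively is to rerun the inner build once per variable, each time with that single variable removed, and record which removals make the failure go away. The Python sketch below shows the idea; the helper names, the candidate list, and the examples/simple workspace path are all assumptions for illustration, not part of rules_bazel_integration_test.

```python
import subprocess


def find_suspect_vars(base_env, candidates, run):
    """Return the candidate names whose removal makes run(env) succeed.

    run is a caller-supplied function taking an environment dict and
    returning True on success.
    """
    suspects = []
    for name in sorted(candidates):
        if name not in base_env:
            continue
        # Copy the environment with exactly one variable dropped.
        trial_env = {k: v for k, v in base_env.items() if k != name}
        if run(trial_env):
            suspects.append(name)
    return suspects


def run_inner_bazel(env, workspace="examples/simple"):
    """Hypothetical runner: invoke the inner build with the given env."""
    result = subprocess.run(["bazel", "test", "//..."], cwd=workspace, env=env)
    return result.returncode == 0
```

From inside the test, this could be called as find_suspect_vars(dict(os.environ), [n for n in os.environ if n.startswith("TEST_")], run_inner_bazel), after importing os. A pass with TEST_TMPDIR removed needs care when interpreting it: per Note 1, removing it also changes the output root, so it may only show that the bazel-in-bazel mode was switched off.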