bazel: bazel 5.0.0rc3 crashes with local and remote actions being done

Description of the problem / feature request:

I’m trying out 5.0.0rc3 in our CI environment and saw the following crash:

(11:09:44) INFO: Caught InterruptedException from ExecException for remote branch of sensors/tools/linear_range/_objs/LinearRangeControllerModule_static/linear_range_controller_module.pic.o, which may cause a crash.
--
  | (11:10:01) FATAL: bazel crashed due to an internal error. Printing stack trace:
  | java.lang.AssertionError: Neither branch of sensors/tools/linear_range/_objs/LinearRangeControllerModule_static/linear_range_controller_module.pic.o completed. Local was cancelled and done and remote was not cancelled and done.
  | at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.waitBranches(DynamicSpawnStrategy.java:345)
  | at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.exec(DynamicSpawnStrategy.java:733)
  | at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
  | at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
  | at com.google.devtools.build.lib.rules.cpp.CppCompileAction.beginExecution(CppCompileAction.java:1430)
  | at com.google.devtools.build.lib.actions.Action.execute(Action.java:133)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:907)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1076)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1031)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
  | at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:492)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
  | at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
  | at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
  | at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
  | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
  | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
  | at java.base/java.lang.Thread.run(Unknown Source)
  | (11:10:03) Failed with return code 37.
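
For readers puzzling over the wording of the assertion, here is a minimal sketch of how a branch can be "not cancelled and done" without having completed successfully. It is not Bazel's DynamicSpawnStrategy code. The class name, method names, and the two tasks are hypothetical, and it assumes only that the branches behave like ordinary java.util.concurrent futures. A future whose task throws (here, a simulated InterruptedException) reports isDone() as true and isCancelled() as false, yet yields no result.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration only; none of these names come from Bazel.
public class BranchStateSketch {
    // Formats a future's state using the same wording as the assertion message.
    static String describe(Future<?> f) {
        return (f.isCancelled() ? "cancelled" : "not cancelled")
            + " and " + (f.isDone() ? "done" : "not done");
    }

    // Runs two stand-in branches and returns {local state, remote state}.
    static String[] simulate() throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            Callable<String> localWork = () -> {
                Thread.sleep(60_000); // stands in for a slow local action
                return "local result";
            };
            Callable<String> remoteWork = () -> {
                // The task ends by throwing, so the future completes exceptionally.
                throw new InterruptedException("simulated remote interruption");
            };
            Future<String> local = pool.submit(localWork);
            Future<String> remote = pool.submit(remoteWork);

            // Cancel the local branch, as if the remote branch were expected to win.
            local.cancel(true);

            try {
                remote.get(); // waits for the remote task to finish
            } catch (ExecutionException e) {
                // Expected: the remote task threw, so there is no result to use.
            }
            return new String[] {describe(local), describe(remote)};
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        String[] states = simulate();
        System.out.println(
            "Local was " + states[0] + " and remote was " + states[1] + ".");
    }
}
```

Running main prints a state description matching the one in the crash: the local future is cancelled and done, while the remote future is done but ended with an exception. Whether this is what actually happened inside Bazel is not established by the log alone.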

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

To reproduce, I set .bazelversion to 5.0.0rc3 and ran our entire build, which uses a remote buildfarm cluster. I don’t know of a “simple” way to reproduce this.

What operating system are you running Bazel on?

Everything’s on x86_64 Linux.

What’s the output of bazel info release?

$ bazel info release
Starting local Bazel server and connecting to it...
release 5.0.0rc3

Have you found anything relevant by searching the web?

I couldn’t find anything pertinent.

Any other information, logs, or outputs that you want to share?

Not at this time.

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 40 (38 by maintainers)

Most upvoted comments

Found the root cause! Working on the fix.

@bazel-io fork 5.1

Is this caused by manual interruptions during the build?

I don’t see anything in our logs that would indicate a manual interruption during the build.

EDIT: It’s certainly possible that this is a fluke (or maybe a bug in our CI system), but I figured it’s worth posting in case it isn’t. I’ve never encountered this error before in our CI system.