bazel: bazel 5.0.0rc3 crashes with local and remote actions being done
Description of the problem / feature request:
I’m trying out 5.0.0rc3 in our CI environment and saw the following crash:
(11:09:44) INFO: Caught InterruptedException from ExecException for remote branch of sensors/tools/linear_range/_objs/LinearRangeControllerModule_static/linear_range_controller_module.pic.o, which may cause a crash.
(11:10:01) FATAL: bazel crashed due to an internal error. Printing stack trace:
java.lang.AssertionError: Neither branch of sensors/tools/linear_range/_objs/LinearRangeControllerModule_static/linear_range_controller_module.pic.o completed. Local was cancelled and done and remote was not cancelled and done.
    at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.waitBranches(DynamicSpawnStrategy.java:345)
    at com.google.devtools.build.lib.dynamic.DynamicSpawnStrategy.exec(DynamicSpawnStrategy.java:733)
    at com.google.devtools.build.lib.actions.SpawnStrategy.beginExecution(SpawnStrategy.java:47)
    at com.google.devtools.build.lib.exec.SpawnStrategyResolver.beginExecution(SpawnStrategyResolver.java:68)
    at com.google.devtools.build.lib.rules.cpp.CppCompileAction.beginExecution(CppCompileAction.java:1430)
    at com.google.devtools.build.lib.actions.Action.execute(Action.java:133)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:907)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:1076)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:1031)
    at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:152)
    at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:91)
    at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:492)
    at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:856)
    at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.computeInternal(ActionExecutionFunction.java:349)
    at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:169)
    at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:590)
    at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:382)
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.base/java.lang.Thread.run(Unknown Source)
(11:10:03) Failed with return code 37.
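The return code 37 in the last log line matches the exit code that Bazel documents for an unhandled exception or internal error, as opposed to 1 for an ordinary build failure. A CI wrapper could use that to tell a crash like this one apart from a genuine build failure. The following is only a minimal sketch: the function name `classify_bazel_exit` is hypothetical, and only a few of the documented exit codes are mapped.

```shell
#!/bin/sh
# Hypothetical helper: map a Bazel exit code to a short label.
# Only a handful of the documented codes are covered here.
classify_bazel_exit() {
    case "$1" in
        0)  echo "success" ;;
        1)  echo "build failed" ;;
        8)  echo "interrupted" ;;
        37) echo "internal error" ;;
        *)  echo "other" ;;
    esac
}

# Label the code seen in the log above.
classify_bazel_exit 37
```

In a CI script this would typically be called right after the Bazel invocation, passing `$?` as the argument.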
Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
This is just setting `.bazelversion` to `5.0.0rc3` and running it on our entire build. This involves a remote buildfarm cluster.
I don’t know of a “simple” way to reproduce this.
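For anyone trying the same setup, the version pin itself can be sketched as below. This assumes the build is launched through Bazelisk or another wrapper that reads `.bazelversion`; the report does not say which launcher is in use.

```shell
# Run from the workspace root.
# Pin the Bazel version for launchers that read .bazelversion (e.g. Bazelisk).
printf '%s\n' "5.0.0rc3" > .bazelversion

# Confirm what was written.
cat .bazelversion
```

The build additionally needs the project's own remote and dynamic execution flags, which are not listed in the report, so they are left out here.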
What operating system are you running Bazel on?
Everything’s on x86_64 Linux.
What’s the output of `bazel info release`?
$ bazel info release
Starting local Bazel server and connecting to it...
release 5.0.0rc3
Have you found anything relevant by searching the web?
I couldn’t find anything pertinent.
Any other information, logs, or outputs that you want to share?
Not at this time.
About this issue
- State: closed
- Created 3 years ago
- Comments: 40 (38 by maintainers)
Commits related to this issue
- Remote: Fix crashes by InterruptedException when dynamic execution is enabled. Fixes #14433. The root cause is, inside `RemoteExecutionCache`, the result of `FindMissingDigests` is shared with other... — committed to coeuvre/bazel by coeuvre 2 years ago
- Remote: Fix crashes by InterruptedException when dynamic execution is enabled. (#15091) Fixes #14433. The root cause is, inside `RemoteExecutionCache`, the result of `FindMissingDigests` is shared... — committed to bazelbuild/bazel by coeuvre 2 years ago
Found the root cause! Working on the fix.
@bazel-io fork 5.1
I don’t see anything in our logs that would indicate a manual interruption during the build.
EDIT: It’s certainly possible that this is a fluke (or maybe a bug in our CI system), but I figured it’s worth posting in case it isn’t. I’ve never encountered this error before in our CI system.