bazel: Actions fetched from remote cache / executed remotely should not count towards parallelism limit

When a fairly large application is built using the remote cache and all of its actions are fully cached, Bazel still caps parallelism at the number of cores in the machine, even though most actions are just waiting on network I/O.

With the default settings on an 8-core machine, I get:

$ time bazel build //...
## output elided
INFO: Elapsed time: 97.567s, Critical Path: 26.44s
INFO: 2000 processes: 2000 remote cache hit.
INFO: Build completed successfully, 2240 total actions

real	1m38.518s
user	0m0.051s
sys	0m0.061s

But if I bump up the number of jobs to a crazy number:

$ time bazel build --jobs 2000 //...
## output elided
INFO: Elapsed time: 39.535s, Critical Path: 31.33s
INFO: 2000 processes: 2000 remote cache hit.
INFO: Build completed successfully, 2240 total actions

real	0m40.483s
user	0m0.048s
sys	0m0.058s

I think the --jobs option should only apply to local actions.
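
In the meantime, here is a sketch of a .bazelrc entry that makes the higher job count persistent; the config name "remote" and the value 200 are illustrative, not tuned recommendations:

# Hypothetical .bazelrc: raise Bazel's thread count only when building
# against the remote cache, so cached actions can overlap on network I/O.
build:remote --jobs=200

$ bazel build --config=remote //...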

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 2
  • Comments: 26 (18 by maintainers)

Most upvoted comments

We are working on this again, in a different way. The next step is to upgrade to a modern JDK, which will allow us to use Loom/virtual threads.

At this time, --jobs determines how many threads Bazel creates internally, and --local_cpu_resources determines how many subprocesses Bazel is allowed to run concurrently. However, Bazel threads block on local and remote subprocesses, so if --jobs is less than --local_cpu_resources, then --local_cpu_resources is effectively ignored and Bazel runs at most --jobs subprocesses.
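
A quick way to observe this interaction (the flag values are arbitrary, and //... stands for whatever you normally build):

# --jobs=4 caps Bazel at 4 internal threads, so at most 4 subprocesses
# run at once; the 8 CPUs granted by --local_cpu_resources never get used.
$ bazel build --jobs=4 --local_cpu_resources=8 //...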

For remote builds, however, --jobs determines how many remote processes can run in parallel, whereas --local_cpu_resources is ignored. That means if you use remote caching or remote execution, you must increase --jobs to get a speedup.
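
For example, with remote caching enabled (grpc://cache.example.com is a placeholder endpoint, not a real service), something along these lines lets the cache lookups overlap:

# Each Bazel thread mostly waits on network I/O during a remote cache hit,
# so --jobs must be raised well past the local core count to keep the
# connection busy; 500 is an illustrative value, not a recommendation.
$ bazel build --remote_cache=grpc://cache.example.com --jobs=500 //...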

However, changes are afoot, although I suspect they won't be finished before the end of the year and might slip into next year. Specifically, we're working on decoupling --jobs: the plan is for Bazel to manage both local and remote execution without blocking threads. --jobs would then no longer implicitly limit the number of local subprocesses; --local_cpu_resources would control that instead, and the same applies on the remote side. That should avoid the need to tweak --jobs when you use remote execution, improve scaling when you have a lot of remote executors, and still let you limit Bazel's local CPU consumption.

I assume you mean Bazel's memory consumption, not the remote execution system's. Let's first see how it goes. It's not clear at this point that async execution will increase Bazel's memory consumption; there is reason to believe the current code is not ideal with respect to memory, retaining data that could otherwise be garbage collected.

Thanks @ob! A colleague of mine has a prototype of this; however, I still expect it to be at least 3-6 months before it lands in a released Bazel version!