bazel: Actions fetched from remote cache / executed remotely should not count towards parallelism limit

When a fairly large application is built using the remote cache and all of its actions are fully cached, Bazel still caps parallelism at the number of cores in the machine, even though most actions are just waiting on network I/O.

With the default settings on an 8-core machine, I get:

$ time bazel build //...
## output elided
INFO: Elapsed time: 97.567s, Critical Path: 26.44s
INFO: 2000 processes: 2000 remote cache hit.
INFO: Build completed successfully, 2240 total actions

real	1m38.518s
user	0m0.051s
sys	0m0.061s

But if I bump up the number of jobs to a crazy number:

$ time bazel build --jobs 2000 //...
## output elided
INFO: Elapsed time: 39.535s, Critical Path: 31.33s
INFO: 2000 processes: 2000 remote cache hit.
INFO: Build completed successfully, 2240 total actions

real	0m40.483s
user	0m0.048s
sys	0m0.058s

I think the --jobs option should only apply to local actions.
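
In the meantime, here is a sketch of a .bazelrc entry that makes the higher job count persistent; the config name "remote" and the value 200 are illustrative, not tuned recommendations:

# Hypothetical .bazelrc: raise Bazel's thread count only when building
# against the remote cache, so cached actions can overlap on network I/O.
build:remote --jobs=200

$ bazel build --config=remote //...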

About this issue

  • State: open
  • Created 6 years ago
  • Reactions: 2
  • Comments: 26 (18 by maintainers)

Most upvoted comments

We are working on this again, in a different way. The next step is to upgrade to a modern JDK, which will allow us to use Loom/virtual threads.

At this time, --jobs determines how many threads Bazel creates internally, and --local_cpu_resources determines how many subprocesses Bazel is allowed to run concurrently. However, Bazel threads block on local and remote subprocesses, so if --jobs is less than --local_cpu_resources, then --local_cpu_resources is effectively ignored and Bazel runs at most --jobs subprocesses.
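
A quick way to observe this interaction (the flag values are arbitrary, and //... stands for whatever you normally build):

# --jobs=4 caps Bazel at 4 internal threads, so at most 4 subprocesses
# run at once; the 8 CPUs granted by --local_cpu_resources never get used.
$ bazel build --jobs=4 --local_cpu_resources=8 //...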

For remote builds, however, --jobs determines how many remote processes can run in parallel, whereas --local_cpu_resources is ignored. That means if you use remote caching or remote execution, you must increase --jobs to get a speedup.
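
For example, with remote caching enabled (grpc://cache.example.com is a placeholder endpoint, not a real service), something along these lines lets the cache lookups overlap:

# Each Bazel thread mostly waits on network I/O during a remote cache hit,
# so --jobs must be raised well past the local core count to keep the
# connection busy; 500 is an illustrative value, not a recommendation.
$ bazel build --remote_cache=grpc://cache.example.com --jobs=500 //...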

However, changes are afoot, although I suspect they won't be finished before the end of the year and might slip into next year. Specifically, we're working on decoupling --jobs: the plan is for Bazel to manage both local and remote execution without blocking threads. --jobs would then no longer implicitly limit the number of local subprocesses; --local_cpu_resources would control that instead, and the same applies on the remote side. That should avoid the need to tweak --jobs when you use remote execution, improve scaling when you have a lot of remote executors, and still let you limit Bazel's local CPU consumption.

I assume you mean Bazel's memory consumption, not the remote execution system's. Let's first see how it goes. It's not clear at this point that async execution will increase Bazel's memory consumption; there is reason to believe the current code is not ideal with respect to memory, retaining data that could otherwise be garbage collected.

Thanks @ob! A colleague of mine has a prototype of this; however, I still expect it to be at least 3-6 months before it lands in a released Bazel version!