bazel: Bazel remote cache is not a clear win

Description of the feature request:

Applying a Bazel HTTP remote cache can make the build slower, depending on the project and artefacts being built.

I tried a few projects:

  • Cartographer ~2.8x faster with cache
  • Abseil ~1.5x faster with cache
  • cppitertools ~2x slower with cache
  • OpenTracing ~2x slower with cache

For the cache server, I tried both bazel-remote and my own Node.js server that I cobbled together. Both yielded similar results.

The cache was hosted on a reasonably specced DigitalOcean box in the same city:

$ less /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 85
model name      : Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz
stepping        : 4
microcode       : 0x1
cpu MHz         : 2294.608
...

$ free -m
              total        used        free      shared  buff/cache   available
Mem:           3944         119         153           0        3672        3544
Swap:             0           0           0

The client's internet connection was around 10 Mbps, with ~20 ms latency. Not the fastest, but Bazel should be able to adapt to this.

The Bazel client was running on a fairly high-end laptop:

$ less /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 6
model           : 142
model name      : Intel(R) Core(TM) i7-8550U CPU @ 1.80GHz
stepping        : 10
microcode       : 0x9a
cpu MHz         : 700.060
cache size      : 8192 KB
...

$ free -m
              total        used        free      shared  buff/cache   available
Mem:          15806        4246        7541         755        4018       10582
Swap:         32767           0       32767

Suggestion

Perhaps the HTTP cache should record the time it took to build an artefact (according to the client). This would give Bazel enough information to decide if it is better to build or fetch.

Relevant variables:

  • Connection speed
  • Size of artefact
  • Estimated time to build artefact
  • Current build workload

Currently the server receives minimal metadata from Bazel.
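
To make the trade-off concrete, here is a rough sketch of the decision the variables above would feed into. It is not how Bazel works today: the `Artifact` record, the `should_fetch` function, and the `local_queue_seconds` parameter (a stand-in for "current build workload") are all hypothetical names made up for illustration, and the cost model is deliberately naive (one round trip plus transfer time at the measured bandwidth).

```python
from dataclasses import dataclass


@dataclass
class Artifact:
    """Hypothetical cache metadata; Bazel does not record build time today."""
    size_bytes: int        # size of the cached output
    build_seconds: float   # build time reported by the client that uploaded it


def estimated_fetch_seconds(size_bytes: int,
                            bandwidth_bits_per_s: float,
                            latency_s: float) -> float:
    """Naive model: one network round trip plus the raw transfer time."""
    return latency_s + (size_bytes * 8) / bandwidth_bits_per_s


def should_fetch(artifact: Artifact,
                 bandwidth_bits_per_s: float,
                 latency_s: float,
                 local_queue_seconds: float = 0.0) -> bool:
    """Return True if downloading is expected to beat building locally.

    local_queue_seconds is an assumed estimate of how long the action would
    wait for a free local executor under the current build workload.
    """
    fetch = estimated_fetch_seconds(
        artifact.size_bytes, bandwidth_bits_per_s, latency_s)
    build = local_queue_seconds + artifact.build_seconds
    return fetch < build


# The connection described in this issue: ~10 Mbps, ~20 ms latency.
BANDWIDTH = 10_000_000
LATENCY = 0.020

# A large output that is cheap to build: fetching loses.
big_and_cheap = Artifact(size_bytes=5_000_000, build_seconds=2.0)
# A small output that is expensive to build: fetching wins.
small_and_costly = Artifact(size_bytes=500_000, build_seconds=30.0)

print(should_fetch(big_and_cheap, BANDWIDTH, LATENCY))     # False
print(should_fetch(small_and_costly, BANDWIDTH, LATENCY))  # True
```

With these made-up numbers, a 5 MB output takes about 4 s to download but only 2 s to rebuild, which would be consistent with projects made of many small, quick compile actions coming out slower with the cache enabled.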

What operating system are you running Bazel on?

Ubuntu 18.10

What’s the output of bazel info release?

release 0.23.1

Have you found anything relevant by searching the web?

Related discussion: https://github.com/bazelbuild/bazel/issues/6091

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 8
  • Comments: 26 (21 by maintainers)

Most upvoted comments

Will pick this up soon!

Same here. Huge +1 if we could get a dynamic strategy for cache lookup/download. Right now developers have to flip their remote cache on and off based on their download speed. For some, it changes with the time of day because of shared internet resources.

it should be possible to make the lookup async and cancel the action if the lookup is faster (or cancel the lookup if the action is faster). This is similar to how the dynamic strategy works, and would hopefully reuse as much of the infrastructure as possible.

Big +1 we would love to see this. I asked @jmmv about this very thing on twitter and he replied:

No plans on that unfortunately, at least from my team at this point… I think this would take a very different implementation than the current dynamic scheduler though.

I’m not sure that we have enough information here to decide on a course of action. What’s the reason for the cached case to be slower? Is that something that can be fixed?

If it’s “just” the network round-trip time, then it should be possible to make the lookup async and cancel the action if the lookup is faster (or cancel the lookup if the action is faster). This is similar to how the dynamic strategy works, and would hopefully reuse as much of the infrastructure as possible.
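
A minimal sketch of that race, using Python's asyncio purely as an illustration. This is not Bazel code and does not reflect how the dynamic strategy is implemented; `race_cache_against_build` and `fake` are hypothetical names, and the lookup is assumed to return `None` on a cache miss. Note that a miss must not cancel the local action, only a hit should.

```python
import asyncio


async def race_cache_against_build(lookup, build):
    """Start a cache lookup and a local build together; keep whichever wins.

    lookup and build are zero-argument callables returning coroutines.
    lookup is assumed to resolve to None on a cache miss.
    Returns a (source, result) tuple, where source is "cache" or "local".
    """
    lookup_task = asyncio.create_task(lookup())
    build_task = asyncio.create_task(build())

    done, _ = await asyncio.wait(
        {lookup_task, build_task}, return_when=asyncio.FIRST_COMPLETED)

    if build_task in done:
        # The local action finished first: abandon the download.
        lookup_task.cancel()
        await asyncio.gather(lookup_task, return_exceptions=True)
        return "local", build_task.result()

    hit = lookup_task.result()
    if hit is not None:
        # Cache hit arrived first: cancel the local action.
        build_task.cancel()
        await asyncio.gather(build_task, return_exceptions=True)
        return "cache", hit

    # Cache miss: nothing to cancel, just wait for the local action.
    return "local", await build_task


async def fake(delay, value):
    """Stand-in for a real lookup or build; sleeps, then returns value."""
    await asyncio.sleep(delay)
    return value


def run(lookup_delay, lookup_value, build_delay):
    """Helper for trying out different timings."""
    return asyncio.run(race_cache_against_build(
        lambda: fake(lookup_delay, lookup_value),
        lambda: fake(build_delay, "built")))


print(run(0.01, "cached", 0.2))   # fast hit beats slow build
print(run(0.2, "cached", 0.01))   # fast build beats slow download
print(run(0.01, None, 0.05))      # miss falls back to the local build
```

The sketch only covers reads. It says nothing about uploads, which is where the error-reporting concern comes in.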

However, I’m not sure how to handle cache writes. It is technically possible to make them async, but that’ll make it difficult to report errors.

Also, https://github.com/bazelbuild/bazel/commit/76370d5453110c494b8066d0006e1986b8b039fa has landed, which could help in such cases.

@bazelbuild/triage, I think this is still relevant… can we keep it open?

Ideally that’d use the same or similar logic as the dynamic scheduling used for remote execution: https://blog.bazel.build/2019/02/01/dynamic-spawn-scheduler.html