bazel: Jobs generating large numbers of files drastically slow down building with a remote cache.
Description of the problem / feature request:
Building the go standard library with remote caching enabled is considerably slower than building locally.
An in-person discussion with @buchgr led to the conclusion that the contents of the standard library are uploaded serially within Bazel, which causes a major performance hit. Some concurrency is present for the rest of the build, since Bazel executes multiple jobs at once and their uploads overlap; but because the Go standard library counts as a single job, its outputs are uploaded serially.
Feature requests: what underlying problem are you trying to solve with this feature?
Decrease build time when populating empty remote caches by allowing concurrent uploads of files in individual actions.
Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
Build any Go project using an empty remote cache; on a typical internet connection, this can take a long time as the Go standard library files are uploaded one by one.
Example timings, running bazel test on one of our projects with a clean local cache:
With remote caching:
bazel test --remote_http_cache=https://storage.googleapis.com/<our cache>
0.07s user 0.15s system 0% cpu 14:07.77 total
No remote caching:
bazel test
0.05s user 0.08s system 0% cpu 2:46.71 total
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Reactions: 8
- Comments: 18 (13 by maintainers)
Commits related to this issue
- HttpCacheClient: make upload async This matches the behavior for download, and improves performance for actions with a lot of large outputs. The code was already using futures in the API and in the ... — committed to ulfjack/bazel by ulfjack 4 years ago
- HttpCacheClient: make upload async This matches the behavior for download, and improves performance for actions with a lot of outputs. The code was already using futures in the API and in the implem... — committed to Yannic/bazel by ulfjack 4 years ago
Hm, so upon running
bazel build //...
in rules_haskell many more times (with remote caching on GCS turned on), it looks like my first number was an aberration. I now use --remote_upload_local_results=false to make sure we’re not measuring any uploading (only downloading). I consistently get ~225 seconds on my connection. Only a 30% improvement on total build time, but at least caching is now an improvement, not a pessimization.

I was wondering - what’s the algorithm for fetching from the cache? In my use case, I have 329MB of cache data to transfer to my build machine. Using ntop, I notice that the network fetches are very bursty: sustained 50MB/s for the first few seconds, then averaging about 1MB/s until the end of the build (with some big bursts). Hence it takes 200+ seconds to download the full 329MB, which seems slow. Since the dependency graph is fully known by the end of the analysis phase, couldn’t we in principle fetch all action outputs for a given target in one go? And if so, shouldn’t I expect the build to be fully network-bandwidth limited if the cache hit rate is 100%, especially if all fetches are properly pipelined (i.e. async)?

Ok, I sent a patch.
Err… so the API uses futures, and the HTTP client uses futures, but apparently the glue code between the two is blocking. That is … unfortunate, but luckily easy to fix.
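For illustration, here is a minimal Java sketch of that difference; the UploadClient interface and uploadFile method are hypothetical stand-ins, not Bazel’s actual HttpCacheClient API. Blocking on each per-file future inside a loop serializes an action’s uploads even though every layer is future-based, while composing the futures with Guava’s Futures.allAsList lets the transfers overlap.

```java
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.ListenableFuture;
import com.google.common.util.concurrent.MoreExecutors;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class UploadGlueSketch {
  // Hypothetical client that already returns a future per file upload.
  interface UploadClient {
    ListenableFuture<Void> uploadFile(Path file);
  }

  // Blocking glue: each upload returns a future, but waiting on it inside the
  // loop means the next upload only starts after the previous one finishes.
  static void uploadBlocking(UploadClient client, List<Path> outputs) throws Exception {
    for (Path output : outputs) {
      client.uploadFile(output).get();
    }
  }

  // Async glue: start all uploads first, then return a single future that
  // completes once every transfer is done, so an action's outputs go up in parallel.
  static ListenableFuture<Void> uploadAsync(UploadClient client, List<Path> outputs) {
    List<ListenableFuture<Void>> uploads = new ArrayList<>();
    for (Path output : outputs) {
      uploads.add(client.uploadFile(output));
    }
    return Futures.transform(
        Futures.allAsList(uploads), unused -> null, MoreExecutors.directExecutor());
  }
}
```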
Yes, the Go standard library mentioned at the top can take 20 minutes to build with remote caching enabled, when there is a remote cache miss. By contrast, it takes about 30s to build from scratch without remote cache involved. The problem is with the upload, not the download.
ff008f445905bf6f4601a368782b620f7899d322 implemented parallel downloads, but output upload in SimpleBlobStoreActionCache is still serial.

Yes, I believe Jakob’s parallel uploads change made this much better. I also have a change in flight that adds batched uploads; thank you for reminding me to test it on this use case - I believe it should additionally improve it.