bazel: Sandbox slowness on OSX
Description of the problem / feature request:
Building has been extremely slow with the default darwin-sandbox strategy.
Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
A mini repro can be found at https://github.com/alexeagle/rules_sass_repro. The repro contains 40 empty sass files. Running the sass compiler on them should be fast.
bazel build :all takes ~60s on my mac
bazel build --strategy=SassCompiler=local :all takes ~4s
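To make the comparison above repeatable, here is a minimal timing sketch. It only wraps the two commands from the report; the helper names (`time_command`, `compare`, `COMMANDS`) are made up for illustration, and running `bazel clean` before each measurement is an assumption about how to keep the second build from reusing results of the first.

```python
import subprocess
import time

# The two invocations from the report, labelled for the output.
COMMANDS = {
    "default (darwin-sandbox)": ["bazel", "build", ":all"],
    "local": ["bazel", "build", "--strategy=SassCompiler=local", ":all"],
}

# Assumption: clean before each run so every measurement is a cold build.
SETUP = ["bazel", "clean"]


def time_command(cmd, setup=None):
    """Run cmd (an argv list) and return its wall-clock time in seconds.

    If setup is given, it is run first and is not included in the timing.
    """
    if setup is not None:
        subprocess.run(setup, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    start = time.perf_counter()
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


def compare(commands, setup=None):
    """Time each labelled command once and return {label: seconds}."""
    return {label: time_command(cmd, setup) for label, cmd in commands.items()}
```

Calling `compare(COMMANDS, SETUP)` from inside the repro workspace returns one wall-clock number per strategy. A single run is noisy, so repeating it a few times gives a fairer picture.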
What operating system are you running Bazel on?
Mac OS 10.14.4
What’s the output of bazel info release?
release 0.25.0
Have you found anything relevant by searching the web?
I found these issues: https://github.com/bazelbuild/bazel/issues/902 and https://github.com/bazelbuild/bazel/issues/1836 but they all seem obsolete.
JSON profile
According to https://docs.bazel.build/versions/master/skylark/performance.html#json-profile, I grabbed profiles for different strategies:
About this issue
- State: open
- Created 5 years ago
- Reactions: 1
- Comments: 98 (64 by maintainers)
Commits related to this issue
- fix(create): disable sandbox on MacOS by default in new projects This trades-off performance for correctness, and might not be wise because missing dependencies give non-hermetic behavior. See https:... — committed to alexeagle/rules_nodejs by deleted user 4 years ago
- Add experimental reuse of non-worker sandboxes. This uses the same functionality as the worker sandboxing to reuse existing sandboxing. Where the worker sandboxes just stay in place, for non-worker s... — committed to bazelbuild/bazel by larsrc-google 3 years ago
- bazel: disable sandboxing and avoid stamping - Sandboxing on MacOS have measurable overhead. See https://github.com/bazelbuild/bazel/issues/8230; it comes from symlinks being much slower to creat... — committed to irfansharif/cockroach by irfansharif 2 years ago
- Merge #79360 79360: bazel: disable sandboxing and avoid stamping r=irfansharif a=irfansharif - Sandboxing on MacOS have measurable overhead. See https://github.com/bazelbuild/bazel/issues/8230; i... — committed to cockroachdb/cockroach by deleted user 2 years ago
- Optimize sandbox performance Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build times on macOS, where the sandbox is known... — committed to figma/emsdk by jfirebaugh 2 years ago
- Optimize sandbox performance (#1045) * Optimize sandbox performance Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build ... — committed to emscripten-core/emsdk by jfirebaugh 2 years ago
- Optimize sandbox performance (#1045) * Optimize sandbox performance Link just the files needed to compile, link, or archive, rather than the entire directory tree. This drastically improves build ... — committed to radekdoulik/emsdk by jfirebaugh 2 years ago
- Bump emscripten to 3.1.30 (#282) * [bazel] Set CLOSURE_COMPILER to workaround RBE+symlinks issue (#1037) * [bazel] Set CLOSURE_COMPILER to workaround RBE+symlinks issue * space * specify nod... — committed to dotnet/emsdk by radekdoulik a year ago
@burdiyan, I’ve fallen for something similar before, where clang was super-slow in sandboxed mode (20x slower), and it was because the module cache was being recreated all the time, not due to any particular slowness of sandbox-exec. Like @jmmv said, sandbox-exec is kind of an easy target, being unsupported and all, but actual benchmarks have never shown it to be particularly slow.
Just for kicks I tried building a small part of the LLVM compiler and benchmarked the following scenarios:
- sandbox-exec that just calls execvp(3)
- CopyingSandboxedSpawn mentioned by @larsrc-google in a comment above.
The code is pushed to the darwin-fake-sandbox branch here. I didn’t bother plugging it into the build, so if you want to try it you need to manually compile the fake-sandbox-exec program and change the path in the code.
These are the results:
As you can see, removing the sandbox-exec command is a wash and copying is slower than symlinking (news at 11).
I don’t see any great advantage to such a strategy, certainly not strong enough to merit the maintenance and extra complexity. We have too many sandboxing strategies already.
In this example, since you’re running golang outside of a Bazel rule, it’s likely generating its own cache, which it is blocked from reading or writing when using the sandbox.
I think there is indeed some lower-hanging fruit for improving the situation of Bazel on macOS.
First of all, I’m going to assume (and I might be ridiculously wrong about it) that when a lot of people talk about sandboxed builds, they mostly care about input file isolation rather than network isolation and the other things a true sandbox gives you. This is true for me as well: I don’t care much about rules being able to access the network, because I know what the cost of it is and I’m not going to do it, but I do want input file isolation, so that I know I’m not forgetting to specify any inputs when building a target.
If that is true for many people (at least on macOS), then Bazel has several options to improve their lives (I know nothing about how complex implementing any of them would actually be):
1. Make the darwin-sandbox strategy only care about input files. So stop using sandbox-exec, and simply assemble execroots with input files only for each target. We would need to measure whether copying or symlinking works better here, and maybe this could even be a flag to choose. I don’t know if this could really be called a “sandbox” in this case, though.
2. Change what the local strategy is. Right now local targets have access to all the workspace files, and that is probably not something most people want. Maybe Bazel could do the same as Please for its local strategy, i.e. no isolation other than for input files. Again, copy vs. symlink could be benchmarked, or even be configurable.
3. Add a new strategy, local-isolated or whatever.
Basically options 1-3 are all the same, but named differently, and might have different levels of impact in terms of implementation.
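The input-only isolation proposed in these options can be sketched as follows. This is not how Bazel builds its execroots. It is a small illustration, with hypothetical helper names, of assembling a directory that contains only the declared inputs, by symlink or by copy, so the two approaches can be timed against each other.

```python
import os
import shutil
import tempfile
import time


def assemble_execroot(workspace, inputs, execroot, mode="symlink"):
    """Populate execroot with only the declared inputs.

    inputs are paths relative to workspace; mode is "symlink" or "copy".
    Anything in the workspace that is not listed stays invisible.
    """
    for rel in inputs:
        src = os.path.abspath(os.path.join(workspace, rel))
        dst = os.path.join(execroot, rel)
        os.makedirs(os.path.dirname(dst), exist_ok=True)
        if mode == "symlink":
            os.symlink(src, dst)
        elif mode == "copy":
            shutil.copy2(src, dst)
        else:
            raise ValueError(f"unknown mode: {mode}")


def time_assembly(workspace, inputs, mode):
    """Return the seconds taken to build and tear down one execroot."""
    start = time.perf_counter()
    with tempfile.TemporaryDirectory() as execroot:
        assemble_execroot(workspace, inputs, execroot, mode)
    return time.perf_counter() - start
```

Comparing `time_assembly(ws, inputs, "symlink")` with `time_assembly(ws, inputs, "copy")` on a realistic input list gives a rough answer to the copy vs. symlink question on a given machine. Note that a tool running in such a directory can still reach files outside it through absolute paths, which is why this is file isolation rather than a true sandbox.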
I also got a trace profile:
As you can see, the sandbox setup/teardown is where most of the time is spent. This is without --experimental_reuse_sandbox_directories.
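For anyone who wants to check the same thing in their own JSON profile without a trace viewer, here is a rough sketch. It assumes the profile follows the Chrome trace event format (a `traceEvents` list whose complete events have `ph == "X"` and a `dur` in microseconds); the function names are made up for illustration.

```python
import gzip
import json
from collections import defaultdict


def load_profile(path):
    """Load a JSON trace profile, transparently handling .gz files."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def time_by_name(profile, top=10):
    """Sum the duration of complete ("X") events per event name.

    Returns up to `top` pairs of (name, total_milliseconds), largest first.
    """
    # The trace format allows either an object with "traceEvents" or a bare list.
    events = profile.get("traceEvents", []) if isinstance(profile, dict) else profile
    totals = defaultdict(int)
    for event in events:
        if event.get("ph") == "X":
            totals[event.get("name", "?")] += event.get("dur", 0)  # microseconds
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(name, micros / 1000) for name, micros in ranked[:top]]
```

Printing `time_by_name(load_profile(path))` lists the event names that account for the most time. Events on different threads overlap, so the totals are summed durations, not wall-clock time.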
I don’t think there’s really a way out of this in the short term. I would suggest disabling sandboxing for local dev for performance, and enabling it for CI release builds in case anything slips through. This is what most folks are doing today.
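One way to apply that split is a small wrapper that decides per environment. The sketch below is only an illustration: it assumes CI is detected through a `CI` environment variable (common, but not universal), and it uses `--spawn_strategy=local` to turn sandboxing off for every action rather than for a single mnemonic.

```python
import os


def bazel_build_command(targets, environ=None):
    """Return a `bazel build` argv that is sandboxed only when running in CI."""
    environ = os.environ if environ is None else environ
    # Assumption: the CI system exports CI=true (or CI=1).
    in_ci = environ.get("CI", "").lower() in ("1", "true")
    cmd = ["bazel", "build"]
    if not in_ci:
        # Local dev: skip sandbox setup/teardown and accept that a missing
        # input may go unnoticed until the sandboxed CI build runs.
        cmd.append("--spawn_strategy=local")
    cmd.extend(targets)
    return cmd
```

The same choice can live in a checked-in rc file instead of a wrapper; the point is only that the sandboxed configuration keeps running somewhere, so missing inputs are still caught.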
Any updates on this? My team is waiting on this before moving to bazel
I think this problem is especially noticeable under nodejs rules, where the number of inputs is easily an order of magnitude more than most other ecosystems due to lack of archive files (every file in the package is a separate input) and the dependency problem in JS (hundreds of transitive dependencies for common tools like react-scripts). We’re working on a fix in rules_nodejs to provide each package as a directory (TreeArtifact) instead, though it’s a breaking change (you can no longer reference individual files with labels)
There is another performance issue: https://github.com/bazelbuild/bazel/issues/20584
I will take care of this one. After it’s checked in I will profile again and see what else we can do.
Thank you for contributing to the Bazel repository! This issue has been marked as stale since it has not had any activity in the last 1+ years. It will be closed in the next 90 days unless any other activity occurs or one of the following labels is added: “not stale”, “awaiting-bazeler”. Please reach out to the triage team (@bazelbuild/triage) if you think this issue is still relevant or you are interested in getting the issue resolved.
Thanks for the details. @larsrc-google is the right person to look into this (after the holidays) and perhaps spawn a separate issue for sandbox-exec.
@jakeleventhal We would love to make it faster, but I think this would require a fundamental redesign of the implementation on macOS. So far we’ve struggled to find a better mechanism in macOS than sandbox-exec with which we could implement faster and reliable sandboxing. The system just doesn’t seem to provide any good APIs for this. If you know of any, or of tools which implement sandboxing on macOS, please send us pointers!
The most advanced sandboxing engine I know of is part of Microsoft’s BuildXL: https://github.com/microsoft/BuildXL/blob/master/Documentation/Specs/Sandboxing.md#macos-sandboxing but considering its complexity, so far no one has dared to look into if / how we could use it for Bazel.
We have been working to speed up sandboxing for our TypeScript Bazel build, which was originally often timing out at a 90 minute limit and frequently running at ~60 minutes (currently running without caching or remote build execution). We had identified the primary culprit as sandboxing slowness, which we observed both on Mac OS (our laptops) and Linux (our CI machines).
We had previously only enabled the new rules_nodejs exports_directories_only in our yarn_install, which dropped our TypeScript build down to 33-38 minutes, with occasional spikes of 55 minutes.
Yesterday, I tried adding both --experimental_reuse_sandbox_directories and --experimental_sandbox_async_tree_delete_idle_threads=1 in our build and it seems to have a good impact on top of the rules_nodejs exports_directories_only option. For our TypeScript build, this brings things down from 33-38 minutes with spikes of ~55 minutes to about 22-24 minutes steady (so far).
Zooming out for context, here are the max and average runtime trends for these jobs over the past 13 weeks, which includes all of these changes:
@fenghaolw Could you try running with the --experimental_reuse_sandbox_directories flag and see if that speeds up the sandboxing sufficiently?
Some tests I ran with this reproduction:
And corresponding observations:
- --sandbox_debug makes quite a bit of a difference. Deleting all the symlink trees is expensive, and this flag has the side-effect of not deleting them. But creating them is also quite expensive.
- --experimental_sandbox_async_tree_delete_idle_threads=auto helps approximate the behavior of --sandbox_debug and seems like a significant improvement over the current behavior. We should enable this new feature by default, but I remember seeing a crash recently that needs investigation…
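The reason deferring deletion helps can be shown with a generic sketch. This is not Bazel’s implementation, and the function name and trash-directory layout are hypothetical. The idea is that renaming a tree out of the way is a single cheap operation, while unlinking every symlink in it is the slow part and can happen off the critical path.

```python
import os
import shutil
import tempfile
import threading


def delete_tree_async(tree, trash_dir):
    """Move tree aside immediately and delete it in a background thread.

    trash_dir should be on the same filesystem as tree so the rename is
    cheap. Returns the thread, so the caller can join() it if needed.
    """
    os.makedirs(trash_dir, exist_ok=True)
    # A unique holding directory avoids name clashes between deleted trees.
    doomed = tempfile.mkdtemp(dir=trash_dir)
    moved = os.path.join(doomed, os.path.basename(os.path.normpath(tree)))
    os.rename(tree, moved)  # fast: the caller can reuse the path right away
    worker = threading.Thread(target=shutil.rmtree, args=(doomed,))
    worker.start()  # slow: the per-file unlinks run here
    return worker
```

After the call returns, the original path is free for the next action, which is roughly the effect that not deleting the trees at all (as with --sandbox_debug) has on the critical path.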