bazel: Java crashes due to hsperfdata file conflicts across sandboxes
Running `bazel build` on a large Java/Scala project (several thousand targets) fails on Debian Linux with user namespaces enabled.
Issue
Trying to run `bazel build` with user namespaces enabled:
$ sysctl kernel.unprivileged_userns_clone=1
The build runs fine for a while, but at some point it crashes with a memory error (SIGBUS in the JVM):
ERROR: <target-path>/BUILD:35:1: error executing shell command: '
rm -rf bazel-out/local-fastbuild/bin/<package>/<target>.jar_temp_resources_dir
set -e
mkdir -p bazel-out/local-fastbuild/bin/<target>' failed: Process terminated by signal 6 [sandboxed].
#
# A fatal error has been detected by the Java Runtime Environment:
#
# SIGBUS (0x7) at pc=0x00007f094606874b, pid=5, tid=0x00007f09472e0700
#
# JRE version: (8.0_131-b11) (build )
# Java VM: Java HotSpot(TM) 64-Bit Server VM (25.131-b11 mixed mode linux-amd64 compressed oops)
# Problematic frame:
# V [libjvm.so+0x96874b] PerfMemory::alloc(unsigned long)+0x7b
#
# Core dump written. Default location: /home/builduser/.cache/bazel/_bazel_builduser/bc0e462ab01ac9379d22ad058ca1cb1f/bazel-sandbox/4864102460254154064/execroot/__main__/core or core.5
#
# An error report file with more information is saved as:
# /home/builduser/.cache/bazel/_bazel_builduser/bc0e462ab01ac9379d22ad058ca1cb1f/bazel-sandbox/4864102460254154064/execroot/__main__/hs_err_pid5.log
#
# If you would like to submit a bug report, please visit:
# http://bugreport.java.com/bugreport/crash.jsp
#
Environment info
The machine is a Docker container based on a Debian image.
$ uname -a
Linux 167-docker99 3.16.0-4-amd64 #1 SMP Debian 3.16.43-2 (2017-04-30) x86_64 GNU/Linux
builduser@167-docker99:~/ws/bazel-port-isolation$ cat /etc/*-release
PRETTY_NAME="Debian GNU/Linux 8 (jessie)"
NAME="Debian GNU/Linux"
VERSION_ID="8"
VERSION="8 (jessie)"
ID=debian
HOME_URL="http://www.debian.org/"
SUPPORT_URL="http://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Bazel version
- The project is fat (several thousands of java / scala targets)
- Bazel was built from a81264e1043dd90e984d9fcef5ce9962dce90d1d
- rules_scala - https://github.com/wix/rules_scala/commit/d66c9d7506ecc6a1b454055b35bd10ef064b9d98 (basically https://github.com/bazelbuild/rules_scala/commit/5d6ff512652b8b55f5d26f6ea69e05d86582d996 with small changes around `specs2` versions and test runner env preparation)
Additional information
- The issue does not happen when `unprivileged_userns_clone=0` (but clearly that’s not a solution).
- With user namespaces enabled, Bazel 0.5.1 showed this issue. May also be related to #3064.
About this issue
- State: closed
- Created 7 years ago
- Comments: 67 (56 by maintainers)
Commits related to this issue
- Add sandbox_tmpfs_path parameter by defaul in .bazelrc file. This is to avoid sporadic SIGBUS JVM crashes on highly parallel builds (like our CI). More details at https://github.com/bazelbuild/bazel/i... — committed to googleapis/googleapis by a-googler 4 years ago
- Squashed 'third_party/googleapis/' changes from 96d5a051..cd3ce265 cd3ce265 Dialogflow weekly v2 library update: - Minor comment updates. e94ad376 fix: point artman at gapic v1 for monitoring/v3 API ... — committed to nolanmar511/cloud-profiler-java by nolanmar511 4 years ago
- Add sandbox_tmpfs_path parameter by defaul in .bazelrc file. This is to avoid sporadic SIGBUS JVM crashes on highly parallel builds (like our CI). More details at https://github.com/bazelbuild/bazel/i... — committed to googleapis/python-bigquery-reservation by yoshi-automation 4 years ago
- Add sandbox_tmpfs_path parameter by defaul in .bazelrc file. This is to avoid sporadic SIGBUS JVM crashes on highly parallel builds (like our CI). More details at https://github.com/bazelbuild/bazel/i... — committed to googleapis/python-container by yoshi-automation 4 years ago
- Add sandbox_tmpfs_path parameter by defaul in .bazelrc file. This is to avoid sporadic SIGBUS JVM crashes on highly parallel builds (like our CI). More details at https://github.com/bazelbuild/bazel/i... — committed to andy0937/googleapis by a-googler 4 years ago
- Add sandbox_tmpfs_path parameter by defaul in .bazelrc file. This is to avoid sporadic SIGBUS JVM crashes on highly parallel builds (like our CI). More details at https://github.com/bazelbuild/bazel/i... — committed to googleapis/python-speech by yoshi-automation 4 years ago
- Set --sandbox_tmpfs_path=/tmp to avoid SIGBUS This workaround was suggested by philwo. The root cause is https://github.com/bazelbuild/bazel/issues/3236 Fixes #1116 — committed to fweikert/continuous-integration by fweikert 3 years ago
- Set --sandbox_tmpfs_path=/tmp to avoid SIGBUS (#1117) This workaround was suggested by philwo. The root cause is https://github.com/bazelbuild/bazel/issues/3236 Fixes #1116 — committed to bazelbuild/continuous-integration by fweikert 3 years ago
- Apply suggestions from code review Attempt to use the solution described in: https://github.com/bazelbuild/bazel/issues/3236#issuecomment-587752609. — committed to oppia/oppia-android by BenHenning 3 years ago
- [RunAllTests] Fix part of #2844: work around Bazel-specific CI issue that's causing an OpenJDK crash/flake (#2846) * Update unit_tests.yml Update Bazel to use JDK 9 in CI for building per https://... — committed to oppia/oppia-android by BenHenning 3 years ago
- WIP: Add --incompatible_sandbox_hermetic_tmp Work towards #3236 — committed to fmeum/bazel by fmeum 2 years ago
- Add --incompatible_sandbox_hermetic_tmp With the new flag, each Linux sandbox will have its own dedicated empty directory mounted as `/tmp` rather than sharing `/tmp` with the host filesystem. This ... — committed to bazelbuild/bazel by fmeum 2 years ago
- Add --incompatible_sandbox_hermetic_tmp With the new flag, each Linux sandbox will have its own dedicated empty directory mounted as `/tmp` rather than sharing `/tmp` with the host filesystem. This ... — committed to EdSchouten/bazel by fmeum 2 years ago
- Prevent random Java 11 crashes during Bazel executions Bazel and Java 11 can crash randomly due to the overload of files generated in the in-memory temporary directory. Use the well-known workaround ... — committed to GerritCodeReview/gerrit-ci-scripts by lucamilanesio 2 years ago
- Replace Path with RootedPath in SandboxHelpers and maintain a set of roots in SandboxHelpers. This will make it possible to replace the root of inputs easily. Work towards #3236. RELNOTES: None. Pi... — committed to bazelbuild/bazel by lberki 2 years ago
- Make the Linux sandbox work with ActionInputs with absolute "exec paths". Progress towards #3236. RELNOTES: None. PiperOrigin-RevId: 493542957 Change-Id: Iff396e77d7624bdb033b198068aa137397495db0 — committed to bazelbuild/bazel by lberki 2 years ago
- This change makes it possible to use the Linux sandbox when either the execroot, some package path entries, or both are under /tmp. This is achieved by a reshuffling of the sandbox directory layout i... — committed to bazelbuild/bazel by lberki 2 years ago
- Flip `--incompatible_hermetic_sandbox_tmp` for Bazel Fixes spurious Java compile action failures when the action is sandboxed. Work towards #3236 — committed to fmeum/bazel by fmeum a year ago
- Flip `--incompatible_sandbox_hermetic_tmp` for Bazel Fixes spurious Java compile action failures when the action is sandboxed. Work towards #3236 — committed to fmeum/bazel by fmeum a year ago
Any updates on this? I started hitting this issue very often on all linux machines in CI.
My current workaround is to add the following to my `.bazelrc`
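A minimal sketch of such a `.bazelrc` entry, assuming the workaround meant is the `--sandbox_tmpfs_path=/tmp` flag named in the commits listed above (placing it on the `build` line is an assumption; `test` and `run` inherit options from `build`):

```
# Give every sandboxed action its own empty, writable /tmp
# instead of the /tmp shared with the host and other sandboxes.
build --sandbox_tmpfs_path=/tmp
```

With a private `/tmp` per sandbox, JVMs in different sandboxes should no longer write their `hsperfdata` files into the same directory.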
Yeah, in fact, I have to stop myself from campaigning for the flag flip to go into Bazel 6.0 😃
Emotionally, I feel like “Bazel can’t reliably run a JVM in an action” is quite an embarrassing issue, although the fact that this bug has been open for more than five years seems to imply that it’s a less serious issue than my feelings say.
Unfortunately, fixing the problem was accompanied by a warning message printed to STDOUT, which is breaking some of our build actions that write to STDOUT (https://github.com/google/google-java-format) and filling our build logs with hundreds of those warnings for all other JVM-tool actions. I’m not sure why we were not affected by the crash but are now affected by the logging, but hopefully `--incompatible_sandbox_hermetic_tmp` will fix that new problem for us. We’re still on Bazel 5.3, but we’ll be sure to try this flag when we can.
Update: running the JVM in the sandbox should now be stable with the `--incompatible_sandbox_hermetic_tmp` command line option with Bazel@HEAD (after 8e32f44).
I’d appreciate it if you gave it a try; we are planning to flip that flag eventually (right, @larsrc-google ?), so the more testing, and the more diverse the environments, the better.
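For anyone wanting to try it, a minimal `.bazelrc` sketch, assuming a Bazel build that already contains the `--incompatible_sandbox_hermetic_tmp` flag added in the commits listed above (putting it on the `build` line is an assumption):

```
# Mount a dedicated empty directory as /tmp in each Linux sandbox
# rather than sharing /tmp with the host filesystem.
build --incompatible_sandbox_hermetic_tmp
```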
@philwo thanks, now I get it; for some reason I thought that the sandbox individually bind mounts the input files instead of symlinking.
I’d take special-casing Java in the sandbox code over Java randomly crashing in actions any day; that way, at least it’s only us who get to see the ugliness.
I was wondering if we could get away with a more-complicated-than-seems-necessary solution:
- Mount the workspace, `bazel-out/` and `$OUTPUT_BASE/external/` to well-known locations in the sandbox (`/bazel-workspace`, `/bazel-out`, `/bazel-external` or something, doesn’t really matter)
- Mount `/tmp` to an empty directory as above
Then the sandbox would see consistent output paths, it would work if the output base or the workspace is under `/tmp`, and it wouldn’t clash with anything on the “real” file system except with these `/bazel-*` paths if someone is mad enough to have those on their file system, or maybe in nested Bazel invocations (but even then, one could add a unique string per action to the path).
However, IIRC runfiles trees contain absolute symlinks to their contents, so they would break if they are symlinked “naively”. It’s not an unsolvable problem because the remote execution strategy solves it, but it does require some extra thinking.
Why do we have a PID namespace in the sandbox?
The reason why this bug exists is that actions have separate PID namespaces but a mostly shared file system, which strikes me as odd: we either try to isolate actions as fully as possible (but then how come they share `/tmp`?) or we only try to protect against mostly-accidental hermeticity violations (but then why the PID namespace?)
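To make the interaction concrete: HotSpot keeps its performance counters in a memory-mapped file at `/tmp/hsperfdata_<user>/<pid>`, and the crash above is in `PerfMemory::alloc` with `pid=5`. The sketch below is a toy model, not Bazel or JVM code. The function names, action names and the `/sandbox/...` directories are hypothetical. It only shows that two JVMs with the same PID (possible because each sandbox has its own PID namespace) claim the same file when `/tmp` is shared, and different files when each sandbox has its own `/tmp`.

```python
import posixpath


def hsperfdata_host_path(tmp_root, user, pid):
    """Host-side path of the perf data file for a JVM that sees `pid`
    and whose /tmp is backed by `tmp_root` on the host."""
    return posixpath.join(tmp_root, f"hsperfdata_{user}", str(pid))


def find_conflicts(jvms):
    """Return the host paths that more than one JVM would use."""
    claimed = {}
    for action, tmp_root, user, pid in jvms:
        path = hsperfdata_host_path(tmp_root, user, pid)
        claimed.setdefault(path, []).append(action)
    return {path: actions for path, actions in claimed.items() if len(actions) > 1}


# Separate PID namespaces: both JVMs can be pid 5 at the same time.
# Shared /tmp: both sandboxes see the host's /tmp.
shared = [
    ("action-a", "/tmp", "builduser", 5),
    ("action-b", "/tmp", "builduser", 5),
]

# Per-sandbox /tmp: each sandbox's /tmp is backed by its own directory
# (hypothetical host locations).
hermetic = [
    ("action-a", "/sandbox/a/tmp", "builduser", 5),
    ("action-b", "/sandbox/b/tmp", "builduser", 5),
]

print(find_conflicts(shared))    # both actions map to the same file
print(find_conflicts(hermetic))  # no shared file
```

In the shared case both actions resolve to `/tmp/hsperfdata_builduser/5`, so one JVM can truncate or delete a file the other has mapped. In the per-sandbox case the paths differ on the host even though each JVM still sees `/tmp/hsperfdata_builduser/5` from inside its sandbox.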