bazel: Bazel clean --expunge or Bazel shutdown unable to kill stale bazel processes

Description of the problem:

bazel build or bazel query leaves a stale bazel process behind even after the build/query has completed. This prevents any subsequent bazel command from running.

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

In a Tekton task we run the following steps:

  1. bazel query //... (or a list of targets)
  2. Once the query has completed, a bazel process and its child processes are still running:
    jenkins    2064      1 47 04:49 ?        00:03:07 bazel(directory) -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8 --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -Xverify:none -Djava.util.logging.config.file=/home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8/javalog.properties -Dcom.google.devtools.build.lib.util.LogHandlerQuerier.class=com.google.devtools.build.lib.util.SimpleLogHandler$HandlerQuerier -XX:-MaxFDLimit -Djava.library.path=/home/jenkins/.cache/bazel/_bazel_jenkins/install/ba7765e6f39a679257358196b530585b/embedded_tools/jdk/lib/jli:/home/jenkins/.cache/bazel/_bazel_jenkins/install/ba7765e6f39a679257358196b530585b/embedded_tools/jdk/lib:/home/jenkins/.cache/bazel/_bazel_jenkins/install/ba7765e6f39a679257358196b530585b/embedded_tools/jdk/lib/server:/home/jenkins/.cache/bazel/_bazel_jenkins/install/ba7765e6f39a679257358196b530585b/ -Dfile.encoding=ISO-8859-1 -jar /home/jenkins/.cache/bazel/_bazel_jenkins/install/ba7765e6f39a679257358196b530585b/A-server.jar --max_idle_secs=10800 --noshutdown_on_low_sys_mem --connect_timeout_secs=120 --output_user_root=/home/jenkins/.cache/bazel/_bazel_jenkins --install_base=/home/jenkins/.cache/bazel/_bazel_jenkins/install/ba7765e6f39a679257358196b530585b --install_md5=ba7765e6f39a679257358196b530585b --output_base=/home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8 --workspace_directory=/home/jenkins/13518/directory --default_system_javabase=/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.262.b10-0.el7_8.x86_64 --failure_detail_out=/home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8/failure_detail.rawproto --deep_execroot --expand_configs_in_place --idle_server_tasks --write_command_log --nowatchfs --nofatal_event_bus_exceptions --nowindows_enable_symlinks --client_debug=false --product_name=Bazel --noincompatible_enable_execution_transition 
--option_sources=connect_Utimeout_Usecs:/home/jenkins/13518/directory/.bazelrc:max_Uidle_Usecs:/home/jenkins/13518/directory/.bazelrc
jenkins   11863   2064 62 04:52 ?        00:02:13 /home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8/execroot/com_ibm_monorepo/external/remotejdk11_linux/bin/java -XX:+UseParallelOldGC -XX:-CompactStrings --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --patch-module=java.compiler=external/remote_java_tools_linux/java_tools/java_compiler.jar --patch-module=jdk.compiler=external/remote_java_tools_linux/java_tools/jdk_compiler.jar --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -jar external/remote_java_tools_linux/java_tools/JavaBuilder_deploy.jar --persistent_worker
jenkins   11866   2064 51 04:52 ?        00:01:50 /home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8/execroot/com_ibm_monorepo/external/remotejdk11_linux/bin/java -XX:+UseParallelOldGC -XX:-CompactStrings --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --patch-module=java.compiler=external/remote_java_tools_linux/java_tools/java_compiler.jar --patch-module=jdk.compiler=external/remote_java_tools_linux/java_tools/jdk_compiler.jar --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -jar external/remote_java_tools_linux/java_tools/JavaBuilder_deploy.jar --persistent_worker
jenkins   11877   2064 66 04:52 ?        00:02:21 /home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8/execroot/com_ibm_monorepo/external/remotejdk11_linux/bin/java -XX:+UseParallelOldGC -XX:-CompactStrings --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --patch-module=java.compiler=external/remote_java_tools_linux/java_tools/java_compiler.jar --patch-module=jdk.compiler=external/remote_java_tools_linux/java_tools/jdk_compiler.jar --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -jar external/remote_java_tools_linux/java_tools/JavaBuilder_deploy.jar --persistent_worker
jenkins   11879   2064 59 04:52 ?        00:02:07 /home/jenkins/.cache/bazel/_bazel_jenkins/41b4626fb6512837d24f630cb1632ba8/execroot/com_ibm_monorepo/external/remotejdk11_linux/bin/java -XX:+UseParallelOldGC -XX:-CompactStrings --add-exports=jdk.compiler/com.sun.tools.javac.api=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.code=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.comp=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.main=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.tree=ALL-UNNAMED --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED --add-opens=jdk.compiler/com.sun.tools.javac.file=ALL-UNNAMED --patch-module=java.compiler=external/remote_java_tools_linux/java_tools/java_compiler.jar --patch-module=jdk.compiler=external/remote_java_tools_linux/java_tools/jdk_compiler.jar --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.lang=ALL-UNNAMED -jar external/remote_java_tools_linux/java_tools/JavaBuilder_deploy.jar --persistent_worker
jenkins   16288   1993  0 04:56 ?        00:00:00 grep bazel

We are unable to stop these processes. As per this, we added:

bazel shutdown

That didn’t shut down any of them. We got this error:

WARNING: Waiting for server process to terminate (waited 5 seconds, waiting at most 60)
WARNING: Waiting for server process to terminate (waited 10 seconds, waiting at most 60)
WARNING: Waiting for server process to terminate (waited 30 seconds, waiting at most 60)
INFO: Waited 60 seconds for server process (pid=2064) to terminate.
WARNING: Waiting for server process to terminate (waited 5 seconds, waiting at most 10)
WARNING: Waiting for server process to terminate (waited 10 seconds, waiting at most 10)
INFO: Waited 10 seconds for server process (pid=2064) to terminate.
FATAL: Attempted to kill stale server process (pid=2064) using SIGKILL, but it did not die in a timely fashion.

bazel clean --expunge fails with the same error.
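One way to narrow this down is to check what state the server PID is actually in. The sketch below assumes a Linux /proc filesystem inside the container; process_state is a hypothetical helper name, not part of Bazel. It reads the one-letter state from /proc/<pid>/stat, where Z means the process is already dead but has not been reaped by its parent.

```python
import os
from typing import Optional


def process_state(pid: int) -> Optional[str]:
    """Return the one-letter state from /proc/<pid>/stat, or None if the PID is gone."""
    try:
        with open(f"/proc/{pid}/stat") as f:
            data = f.read()
    except (FileNotFoundError, ProcessLookupError):
        return None
    # The command name is wrapped in parentheses and may itself contain
    # spaces or parentheses, so split after the last ')'.
    return data.rsplit(")", 1)[1].split()[0]


if __name__ == "__main__":
    # Inspect this script's own process as a harmless example.
    print(process_state(os.getpid()))
```

Calling process_state(2064) for the server PID from the log above would distinguish a zombie (Z) from a process that is still running or blocked in the kernel.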

What operating system are you running Bazel on?

Red Hat 7.9 Docker container running in a Kubernetes pod (as a Tekton task)

What’s the output of bazel info release?

Extracting Bazel installation… Starting local Bazel server and connecting to it… release 3.2.0

If bazel info release returns “development version” or “(@non-git)”, tell us how you built Bazel.

NA

What’s the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

Is it required?

Have you found anything relevant by searching the web

We followed this thread and included the

bazel shutdown  

command, but it didn’t stop the existing bazel processes.

Any other information, logs, or outputs that you want to share?

Will share further if required.

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 3
  • Comments: 21 (6 by maintainers)

Most upvoted comments

Just hit the same problem in our CI/CD pipeline. The problem was indeed the lack of an init process / child reaper.

What happens:

  1. bazel shutdown, or any bazel command that needs to kill or restart the bazel daemon, uses kill($serverPid) to terminate the server.
  2. In a container, whether Kubernetes or plain Docker, if PID 1 is not a process that reaps children (e.g., calls waitpid for any child that dies), the bazel daemon with $serverPid remains as a zombie once killed. From the OS point of view, the process with $serverPid keeps existing, both as a PID and as an entry in /proc/$serverPid, until a parent calls waitpid on it.
  3. As per the code in src/main/cpp/blaze_util_posix.cc, the bazel command trying to kill the bazel server keeps sending kill -TERM $serverPid or kill -9 $serverPid until the PID goes away from /proc/$serverPid or until kill($serverPid, 0) returns an error (depending on platform).
  4. Given that there is no child reaper and no init process, the zombie sticks around forever, the PID never goes away, and the command trying to kill bazel thinks the process is still running until it eventually times out with the error in this bug.
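
The sequence above can be reproduced without Bazel. The following is a minimal sketch in Python, not Bazel's actual code: the forked child stands in for the bazel server, and pid_exists and demo_zombie are hypothetical helper names. It assumes a POSIX system. It shows that a child killed with SIGKILL still answers kill(pid, 0) until someone calls waitpid on it.

```python
import os
import signal
import time
from typing import Tuple


def pid_exists(pid: int) -> bool:
    """Return True if signal 0 can be sent, i.e. the PID is still allocated."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the PID exists but belongs to another user
    return True


def demo_zombie() -> Tuple[bool, bool]:
    """Kill a child, then report whether its PID exists before and after reaping."""
    # Make sure exited children are not reaped automatically by the kernel.
    signal.signal(signal.SIGCHLD, signal.SIG_DFL)

    pid = os.fork()
    if pid == 0:
        # Child: stand-in for the bazel server; sleeps until it is killed.
        time.sleep(60)
        os._exit(0)

    os.kill(pid, signal.SIGKILL)
    time.sleep(0.5)                # give the kernel time to deliver the signal
    before_reap = pid_exists(pid)  # unreaped: the PID is still allocated
    os.waitpid(pid, 0)             # what an init process / child reaper does
    after_reap = pid_exists(pid)   # reaped: the PID is gone
    return before_reap, after_reap


if __name__ == "__main__":
    print(demo_zombie())
```

demo_zombie() returns (True, False): the killed child keeps its PID until it is reaped. In the container without an init process, the waitpid step never happens for the bazel server, so the kill loop never sees the PID disappear.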

Solution/fix: in your container, use an entrypoint that does child reaping. E.g., have PID 1 be /bin/docker-init, /sbin/init, or custom code. Alternatively, run something in the container that does child reaping via PR_SET_CHILD_SUBREAPER, like /bin/docker-init -s.
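
For the "custom code" option, a reaping entrypoint could look roughly like the sketch below. This is an illustration under assumptions, not a replacement for a real init: run_as_init is a hypothetical name, the script is assumed to run as PID 1, and signal forwarding to the main child (which a real init typically does) is omitted.

```python
import os
import sys
from typing import List


def run_as_init(argv: List[str]) -> int:
    """Run argv as the main child and reap every child until it exits."""
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    while True:
        # Blocks until any child exits. When this process is PID 1, that
        # includes orphans (such as a killed bazel server) reparented to it.
        pid, status = os.wait()
        if pid == main_pid:
            if os.WIFEXITED(status):
                return os.WEXITSTATUS(status)
            return 128 + os.WTERMSIG(status)


if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(run_as_init(sys.argv[1:]))
```

Saved under a hypothetical name such as reaper.py and used as the container entrypoint in front of the real command, the loop on os.wait() collects the killed bazel server, so its PID disappears and bazel shutdown can return. For anything beyond experimentation, an existing init such as /bin/docker-init is the safer choice.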