bazel: --action_env ignored when `cfg = "exec"` is used

Description of the bug:

We need to pass some environment variables like “$CPATH” to the compiler when building TensorFlow with Bazel. This is cumbersome in itself and has already led to hard-to-debug issues like https://github.com/bazelbuild/bazel/issues/12059 in the past.

Now we again see failures caused by --action_env values not being passed to the compiler invocation in TensorFlow 2.8.4, which I tracked down to https://github.com/tensorflow/tensorflow/commit/07cbc7bb0bf899aac2bee5e21e1ba4eb40038682, which changes cfg = "host" to cfg = "exec".

What’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

Build TensorFlow 2.8.4 with Bazel 4.2.2, passing --action_env=CPATH, and observe that it is not passed to some compiler invocations, resulting in e.g.:

In file included from bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/core/framework/dataset_options.pb.cc:4:
bazel-out/k8-opt-exec-50AE0418/bin/tensorflow/core/framework/dataset_options.pb.h:10:10: fatal error: google/protobuf/port_def.inc: No such file or directory
   10 | #include <google/protobuf/port_def.inc>

So can you provide information on how to use --action_env (or similar) in such circumstances?

An explanation of what is actually being done by the change from “host” to “exec” would also be very welcome. In our case we are not cross-compiling, so host, target and build machine are all the same.

I would clearly classify this behavior as a bug because the documentation states:

Specifies the set of environment variables available during the execution of all actions.

But obviously there are now actions where those are missing, although they should be available in “all actions”.

Which operating system are you running Bazel on?

RHEL 7

About this issue

State: closed
Created 2 years ago
Comments: 24 (17 by maintainers)

Commits related to this issue

Build tf_to_kernel in "exec" cfg. "host" cfg is deprecated, and "exec" cfg saves ~10% compile time on kernels like polygamma. PiperOrigin-RevId: 404540569 Change-Id: I2a8be13501a42dc9ad7fbee0ac20b6b... — committed to tensorflow/tensorflow by tpopp 3 years ago

Most upvoted comments

Seems my comment didn’t get saved, so here again are the most important bits, as multiple people spent days trying to figure out why TF started to fail to compile due to wrong/unset env variables (I finally found the commit that broke/changed it by bisecting over 2000 commits):

We are in what should be the simplest situation: building a piece of software (TensorFlow) with Bazel on the machine it is supposed to run on. With other build systems this is a variation of configure && make && make install, but with Bazel, which by default deletes all environment variables, we need to pass command-line options to have them set (this is about e.g. $CPATH and $LIBRARY_PATH).

Previously --action_env mostly worked as documented: “environment variables available during the execution of all actions”. Above I linked the documentation of Bazel 4.2.2, which is the version of Bazel I’ve been using. @fmeum mentioned that this is wrong. As the behavior of Bazel changes considerably between versions, it would be good to have correct documentation for each specific version. If the observed behavior is intended, and hence the new documentation (which doesn’t seem to be specific to a version?) is correct, I would count this at least as a bug report against the 4.2.2 documentation.

However, even before the change in TF it was brittle, as the environment used depended on many other factors, for example:

Use of tools vs exec_tools: https://github.com/tensorflow/tensorflow/pull/44901
Unset use_default_shell_env: https://github.com/tensorflow/tensorflow/pull/44549
Now the cfg attr

I don’t fully understand the difference between “host” and “exec” configurations; they sound very similar. The observed behavior also surprises me: per the new docs, --action_env applies to “target” configurations, yet reverting the TF commit (i.e. changing cfg = "exec" back to cfg = "host") makes the environment variables get passed, while with cfg = "exec" I need to set them via --host_action_env. Why is that? The last part makes sense per the new docs, but why --action_env applies to “host” cfgs isn’t clear to me. Has this changed? Or is this the expected effect of --distinct_host_configuration=false? In which Bazel version will this be a no-op?
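
The workaround described above (setting the variables via --host_action_env in addition to --action_env) can be sketched as follows. The env_flags helper and the target label are assumptions made for illustration, not something taken from the TensorFlow build scripts. The script only prints the command so it can be checked before it is actually run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: emit an --action_env/--host_action_env pair for
# every variable name given, so that target- and exec-configured actions
# both take the variable's value from the invoking environment.
env_flags() {
  local var
  for var in "$@"; do
    printf -- '--action_env=%s --host_action_env=%s ' "$var" "$var"
  done
}

# Assumed target label; replace it with whatever you actually build.
TARGET=//tensorflow/tools/pip_package:build_pip_package

# Print the command instead of running it; drop the "echo" to build.
# $(env_flags ...) is left unquoted on purpose so it splits into flags.
echo bazel build $(env_flags CPATH LIBRARY_PATH) "$TARGET"
```

Passing the variable name without a value keeps the actual paths out of the command line; the values are read from the shell that invokes Bazel.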

The next question is why use_default_shell_env isn’t set by default. The name implies that it would be.

And finally I’d like to ask what the “correct” way would be to build something with Bazel on the machine it is to be run on, i.e. a local, **non-**cross-compilation build.

Things that come to mind:

Keep/Restore the behavior of --action_env applying to all actions and introduce a separate --target_action_env similar to --host_action_env
Have a flag like --no-distinct-configurations so that “host”, “target” and “exec” are all the same avoiding potential rebuilds and issues like we have seen with unexpected changes to action environments

I would also greatly appreciate an easy way to find out why a specific C++ file is compiled (or library linked) with specific env variables (i.e. in a specific configuration). E.g. it would help if --subcommands showed the configuration name of an action and there were a way to query how & why a specific file is built. It was very confusing that a library was built with missing env variables, but calling bazel build on the seemingly obvious target had the env variables and hence worked. It turned out that the library was built again as part of a dependency of a dependency of a tool which had cfg = "exec" somewhere. But I haven’t found a way in the documentation to find that out from a source tree and/or with bazel. So the only way I found feasible is to find the commit which changed the behavior and check those changes.
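
A possible approach, given as a sketch and not verified against Bazel 4.2.2: bazel aquery lists actions together with their configuration and environment, and bazel cquery with somepath() shows one dependency chain with the configuration of each node. The helper name and both labels below are placeholders, and the exact output format may differ between Bazel versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an aquery expression selecting every C++
# compile action in the transitive dependencies of a target.
compile_actions_of() {
  printf 'mnemonic("CppCompile", deps(%s))' "$1"
}

# Placeholder labels; substitute the top-level target and the library
# that is unexpectedly rebuilt.
TOP=//some:top_level_target
LIB=//some/package:library

# Only query when bazel is on PATH (run this from inside the workspace).
if command -v bazel >/dev/null 2>&1; then
  # Show which configuration and environment each compile action gets.
  bazel aquery "$(compile_actions_of "$TOP")" --output=text \
    | grep -E '^[[:space:]]*(Target|Configuration|Environment):'

  # Show one dependency path from TOP to LIB, with configurations.
  bazel cquery "somepath($TOP, $LIB)"
fi
```

If the same library shows up under two different configuration names, such as a k8-opt and a k8-opt-exec-… directory, it is being built twice, and the somepath() output should point at the dependency edge that pulls in the second build.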

I hope that helps and Merry Christmas! 🎅

Flamefire on Dec 24, 2022

Doing some spelunking, https://github.com/bazelbuild/bazel/issues/4008 looks pretty relevant - an old bug that looks like it’s asking for the same sort of thing. That was fixed by https://github.com/bazelbuild/bazel/pull/12122 which led to the current --action_env / --host_action_env split.

I’m quite confused by that sequence of events. #4008 is an issue that happens because a lot of rules in the ecosystem don’t support --action_env because they’re implemented with run_shell, which by default doesn’t use the action env (use_default_shell_env = False). In my mind the fix for that would be for run_shell to use the action env by default, not to introduce a --host_action_env.

uri-canva on Jan 17, 2023

I don’t think we should make --action_env apply to the exec configuration by default. I think that, similar to this issue https://github.com/bazelbuild/bazel/issues/13839, the inability to use flags that only target the target configuration causes issues.

keith on Jan 9, 2023