bazel: data dependency should not be required for building a target
Description of the problem / feature request:
Building a cc_test target also requires its data dependencies, even though they are only needed for running/testing.
We use data dependencies for large test data, and currently these large files have to be downloaded even when we only build.
Feature requests: what underlying problem are you trying to solve with this feature?
Specify an external data dependency and have it downloaded only when it’s actually needed.
Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
$ cat WORKSPACE
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "largefile",
    url = "https://github.com/bazelbuild/bazel/archive/refs/tags/6.0.0-pre.20220223.1.zip"
)
$ cat BUILD.bazel
load("@rules_cc//cc:defs.bzl", "cc_test")
cc_test(
    name = "test",
    data = ["@largefile"],
    srcs = glob([
        "*.cpp",
        "*.hpp",
    ]),
)
$ cat x.test.cpp
int main() {
    return 0;
}
$ bazel build test
Running the bazel command downloads the zip file, even though it’s not needed for the build.
What operating system are you running Bazel on?
Ubuntu 20.04
What’s the output of bazel info release?
release 5.0.0
I also tried it with today’s master (8dcf27e590ce77241a15fd2f2f8b9889a3d7731b) and saw the same behavior.
If bazel info release returns “development version” or “(@non-git)”, tell us how you built Bazel.
–
What’s the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD?
–
Have you found anything relevant by searching the web?
According to the documentation, data does not affect how the target is built:
A build target might need some data files to run correctly. These data files aren't source code: they don't affect how the target is built.
https://docs.bazel.build/versions/main/build-ref.html#data
Any other information, logs, or outputs that you want to share?
–
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 20 (5 by maintainers)
Yes, correct!
Disclaimer: I don’t know anything of the internals of Bazel so the following might be very silly / impossible to implement 😃
My general understanding of Bazel is that it first creates an action graph of all the actions needed to build a target. Once the graph is created, it checks what actions are cached and which ones need to run. Finally, all results are put together into a final output.
My proposal would be that the “build” command ignores the “data” edges when building the action graph, while the “run” and “test” commands take them into account.
Then we have the use case of users who want to build with Bazel and run outside Bazel. Could the distinction between “deps” and “data” help here?
I fully agree that no parts of Bazel should be linked into the final binaries for the purpose of finding/fetching the data dependencies.
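The proposal to leave out “data” edges for “build” but follow them for “run”/“test” can be illustrated with a toy graph walk. This is a minimal C++17 sketch under stated assumptions: Target, Graph, Collect and Reachable are hypothetical names, and the model is a simplification, not how Bazel represents its target or action graph internally (it also ignores the fetching stage discussed in this thread). It only shows the difference between following deps edges alone and following deps plus data edges.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of a target graph; this is NOT how Bazel stores
// targets internally. Each target has build-time "deps" edges and
// runtime-only "data" edges.
struct Target {
    std::vector<std::string> deps;  // needed to build the target
    std::vector<std::string> data;  // needed only to run or test it
};

using Graph = std::map<std::string, Target>;

// Depth-first walk from `label`. "deps" edges are always followed;
// "data" edges are followed only when `include_data` is true.
void Collect(const Graph& graph, const std::string& label, bool include_data,
             std::set<std::string>& seen) {
    if (!seen.insert(label).second) return;  // already visited
    auto it = graph.find(label);
    if (it == graph.end()) return;  // no outgoing edges, e.g. an external repo
    for (const auto& dep : it->second.deps) {
        Collect(graph, dep, include_data, seen);
    }
    if (include_data) {
        for (const auto& d : it->second.data) {
            Collect(graph, d, include_data, seen);
        }
    }
}

// Labels needed for `root`: include_data == false models the proposed
// "build" behavior, include_data == true models "run"/"test".
std::set<std::string> Reachable(const Graph& graph, const std::string& root,
                                bool include_data) {
    assert(graph.count(root) == 1 && "root must be a known target");
    std::set<std::string> seen;
    Collect(graph, root, include_data, seen);
    return seen;
}
```

With //:test depending on a hypothetical //:lib and listing @largefile as data, Reachable(g, "//:test", false) returns only //:lib and //:test, while passing true also returns @largefile.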
Well, Bazel is not just a build tool. Bazel has the commands “run” and “test”, right? As a user, I expect Bazel to implement “run” and “test” differently from “build” (more appropriately, more optimized), in particular when it comes to handling runtime data dependencies.
Besides, as I said above, cc_library exposes the specific “data” argument, and the docs state that it’s specifically designed for “runtime” dependencies. So, as a user, I take this to mean that Bazel distinguishes between “build-time” and “runtime” dependencies, since it has dedicated constructs for them.
The same goes for the “aspects” documentation that I linked before. The picture literally shows that Bazel is smart enough not to include runtime dependencies in the aspect graph.
Bazel is also marketed as “One tool, multiple languages” for “build and test”. Not just “build”. “build and test”.
But yes, I understand that the dependency fetching is happening at an earlier stage, before Bazel can even decide what dependencies are used where. I just want to mention that this is very unintuitive from a user perspective. If I set up a cc_test with data, I only expect this data to be fetched if I intend to run the test. If I just want to build the test, data is not needed. Bazel should not be concerned with how users run binaries outside Bazel, IMHO.
I don’t think this can reasonably be implemented in Bazel.
bazel build builds a binary that you can run without Bazel itself. If runfiles (i.e. data dependencies) were fetched lazily, Bazel itself would somehow need to run when the built target is run. If you really want to fetch test data at runtime, you can certainly write that logic yourself.
@sgowroji My issue is exactly the same as @flode's. I think we should reopen this issue.
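The suggestion above to fetch test data at runtime yourself could look roughly like the following. This is a minimal C++17 sketch under stated assumptions: EnsureTestData, the cache directory and the fetch callback are hypothetical, and the actual download mechanism is left to the caller because the thread does not specify one. The test would download the file only the first time it needs it and reuse the cached copy afterwards; if the data attribute were dropped from the cc_test, bazel build would no longer download anything, at the cost of Bazel no longer tracking or caching that file.

```cpp
#include <cassert>
#include <filesystem>
#include <functional>
#include <optional>
#include <string>

namespace fs = std::filesystem;

// Hypothetical helper (not part of Bazel): returns the path of `name` inside
// `cache_dir`, calling `fetch` only if that path does not exist yet.
// `fetch` receives the destination path and returns true on success; how it
// downloads (e.g. by shelling out to curl) is left to the caller.
std::optional<fs::path> EnsureTestData(
        const fs::path& cache_dir, const std::string& name,
        const std::function<bool(const fs::path&)>& fetch) {
    assert(!name.empty() && "name must identify a file in the cache");
    const fs::path target = cache_dir / name;
    if (fs::exists(target)) {
        return target;  // already cached: no download
    }
    fs::create_directories(cache_dir);  // no-op if it already exists
    if (!fetch(target) || !fs::exists(target)) {
        return std::nullopt;  // download failed or produced nothing
    }
    return target;
}
```

A caller would pass a cache directory that persists across test runs plus a fetcher; the first call invokes the fetcher, and later calls return the cached path without calling it again.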