bazel: data dependency should not be required for building a target

Description of the problem / feature request:

Building a cc_test target also requires the data dependencies, even though the data dependencies are only needed for running/testing. We are using data dependencies for large test data and currently these large files need to be downloaded even when just building.

Feature requests: what underlying problem are you trying to solve with this feature?

Specify an external data dependency, and only have it downloaded when it’s actually needed.

Bugs: what’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.

$ cat WORKSPACE 
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")
http_archive(
    name = "largefile",
    url = "https://github.com/bazelbuild/bazel/archive/refs/tags/6.0.0-pre.20220223.1.zip"
)

$ cat BUILD.bazel 
load("@rules_cc//cc:defs.bzl", "cc_test")
cc_test(
    name = "test",
    data = ["@largefile"],
    srcs = glob([
        "*.cpp",
        "*.hpp",
    ]),
)

$ cat x.test.cpp 
int main() {
   return 0;
}

$ bazel build test

Running the bazel command downloads the zip file, even though it’s not needed for the build.

What operating system are you running Bazel on?

Ubuntu 20.04

What’s the output of bazel info release?

release 5.0.0

I also tried today’s master (8dcf27e590ce77241a15fd2f2f8b9889a3d7731b) and saw the same behavior.

If bazel info release returns “development version” or “(@non-git)”, tell us how you built Bazel.

What’s the output of git remote get-url origin ; git rev-parse master ; git rev-parse HEAD ?

Have you found anything relevant by searching the web?

According to the documentation, data does not affect how the target is built:

A build target might need some data files to run correctly. These data files aren't source code: they don't affect how the target is built.

https://docs.bazel.build/versions/main/build-ref.html#data

Any other information, logs, or outputs that you want to share?

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 2
  • Comments: 20 (5 by maintainers)

Most upvoted comments

you’re asking build to be “smarter”, no?

Yes, correct!

How do you propose that should be achieved?

Disclaimer: I don’t know anything of the internals of Bazel so the following might be very silly / impossible to implement 😃

My general understanding of Bazel is that it first creates an action graph of all the actions needed to build a target. Once the graph is created, it checks what actions are cached and which ones need to run. Finally, all results are put together into a final output.

My proposal would be that the “build” command does not take the “data” edges into account when building the action graph. The “run”/“test” commands would take them into account.

Then we have the use case of users who want to build with Bazel and run outside Bazel. Could the distinction between “deps” and “data” help here?

  • “deps”: dependencies will be fetched for all commands (“build”, “run”, “test”). This allows users to run outside Bazel.
  • “data”: dependencies will be fetched only for the “run” and “test” commands. This is for users who want to run inside Bazel.
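
To make the proposal concrete, here is a toy sketch of the traversal it implies. It is not how Bazel works internally: `Command`, `Target`, `Graph`, and `required_targets` are hypothetical names invented for this illustration, and the graph is a plain map of labels.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>
#include <vector>

// Toy model only: hypothetical types that do not mirror Bazel internals.
enum class Command { Build, Run, Test };

struct Target {
    std::vector<std::string> deps;  // needed to build and to run
    std::vector<std::string> data;  // needed only at run time
};

using Graph = std::map<std::string, Target>;

// Collects every label that `cmd` would need, starting from `root`.
// "deps" edges are always followed; "data" edges only for Run and Test.
std::set<std::string> required_targets(const Graph& graph,
                                       const std::string& root,
                                       Command cmd) {
    assert(!root.empty() && "root label must not be empty");
    std::set<std::string> seen;
    std::vector<std::string> stack{root};
    while (!stack.empty()) {
        std::string name = stack.back();
        stack.pop_back();
        if (!seen.insert(name).second) continue;  // already visited
        auto it = graph.find(name);
        if (it == graph.end()) continue;  // leaf, e.g. an external repository
        for (const auto& d : it->second.deps) stack.push_back(d);
        if (cmd != Command::Build) {
            for (const auto& d : it->second.data) stack.push_back(d);
        }
    }
    return seen;
}
```

With a graph where `//:test` has `data = ["@largefile"]`, `required_targets(g, "//:test", Command::Build)` would leave out `@largefile`, while `Command::Test` would include it. The sketch ignores one real complication: a binary that is executed during the build (for example as a tool of another rule) needs its data at build time too.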

I fully agree that no parts of Bazel should be linked into the final binaries for the purpose of finding/fetching the data dependencies.

Why use a part of Bazel, a build tool

Well, Bazel is not just a build tool. Bazel has commands “run” and “test”, right? As a user, I expect Bazel to have custom implementations for “run” and “test” that are different / more appropriate / more optimized than for “build”, in particular when it comes to handling runtime data dependencies.

Besides, as I said above, cc_library exposes the specific “data” argument, and the docs state that it’s specifically designed for “runtime” dependencies. So, as a user, this tells me that Bazel makes some distinction about “build-time” vs “runtime” dependencies, since it has special constructs for it.

The same goes for the “aspects” documentation that I linked before. The picture literally shows that Bazel is smart enough not to include the runtime dependencies in the aspect graph.

Bazel is also marketed as “One tool, multiple languages” for “build and test”. Not just “build”. “build and test”.

But yes, I understand that the dependency fetching is happening at an earlier stage, before Bazel can even decide what dependencies are used where. I just want to mention that this is very unintuitive from a user perspective. If I set up a cc_test with data, I only expect this data to be fetched if I intend to run the test. If I just want to build the test, data is not needed. Bazel should not be concerned with how users run binaries outside Bazel, IMHO.

I don’t think this can reasonably be implemented in Bazel. bazel build builds a binary that you can run without Bazel itself. If runfiles (i.e. data dependencies) were lazily fetched, Bazel itself would need to run somehow when the built target is run. If you really would like to fetch test data at runtime, you can certainly write that logic yourself.
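
For anyone who wants to go the "write that logic yourself" route, a minimal sketch of a lazy fetch helper for a C++ test could look like the following. It assumes C++17 for `std::filesystem`. `ensure_test_data` and `Fetcher` are hypothetical names, and the actual download (curl, an internal artifact store, ...) is left to the caller-supplied callback rather than shown here.

```cpp
#include <cassert>
#include <filesystem>
#include <fstream>
#include <functional>
#include <string>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical callback type: the project supplies the download logic.
// It must write the file to `dest` and return true on success.
using Fetcher = std::function<bool(const fs::path& dest)>;

// Returns true if `dest` exists after the call.
// `fetch` is invoked only when the file is not already present.
bool ensure_test_data(const fs::path& dest, const Fetcher& fetch) {
    assert(fetch && "a fetcher must be provided");
    if (fs::exists(dest)) return true;  // already cached, nothing to do
    if (dest.has_parent_path()) {
        std::error_code ec;
        fs::create_directories(dest.parent_path(), ec);
        if (ec) return false;  // could not create the cache directory
    }
    return fetch(dest) && fs::exists(dest);
}
```

The test's `main` would call `ensure_test_data` before opening the file, so a plain `bazel build` never touches the large data. The trade-off is that the test is no longer hermetic: it needs network access at run time, and Bazel knows nothing about the downloaded file.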

@sgowroji My issue is exactly the one @flode describes. I think we should reopen this issue.