bazel: Some `DexShardsToMerge`/`DexMerger` outputs do not get downloaded when using BwtB
Description of the bug:
When using Bazel 6.0 or later to build an `android_binary` (with D8, not defining `dex_shards`) with a remote cache and `--remote_download_toplevel` set, builds usually fail on the `MergeDexZips` action with errors like:
ERROR: ... Merging dex shards for //<foo>/app:app failed: (Exit 1): singlejar_local failed: ... No such file or directory
singlejar_local: src/tools/singlejar/input_jar.cc:23: Cannot open input jar bazel-out/darwin_arm64-fastbuild/bin/<foo>/dexfiles/app/3.shard.zip: No such file or directory
Judging by the output files and timing profiles, this seems to happen because not all files of the `dexfiles` tree artifact are created, either by being downloaded from the remote cache or by running the action locally.
This tree artifact is created by `DexShardsToMerge`, which uses `SpawnActionTemplate` to create `DexMerger` actions. We can see this using `aquery`:
SpawnActionTemplate with output TreeArtifact <foo>/app/dexfiles/app
Mnemonic: DexShardsToMerge
Target: //<foo>/app:app
Configuration: darwin_arm64-fastbuild
Execution platform: @local_config_platform//:host
Inputs: [bazel-out/darwin_arm64-fastbuild/bin/<foo>/app/dexsplits/app, bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/bazel_tools/tools/android/d8_dexmerger, bazel-out/darwin_arm64-opt-exec-2B5CBBC6/bin/external/bazel_tools/tools/android/d8_dexmerger.jar, bazel-out/darwin_arm64-opt-exec-2B5CBBC6/internal/_middlemen/external_Sbazel_Utools_Stools_Sandroid_Sd8_Udexmerger-runfiles]
Outputs: [bazel-out/darwin_arm64-fastbuild/bin/<foo>/app/dexfiles/app (TreeArtifact)]
It looks like this might happen when some files in the `dexfiles` tree artifact get a remote cache hit and others do not. The ones that do not get a hit are created locally; the ones that do get a hit never get downloaded.
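One way to check this hypothesis after a failed build is to compare the files that are actually on disk in the tree artifact directory with the shard files the merge step expects. The helper below is only a sketch: `missing_tree_files` is a hypothetical name, not part of Bazel, and the list of expected file names has to be supplied by hand (for example from the inputs named in the error message).

```python
from pathlib import Path


def missing_tree_files(tree_dir, expected_names):
    """Return the expected file names that are absent from tree_dir.

    Hypothetical diagnostic helper, not a Bazel API. tree_dir is the
    on-disk directory of the tree artifact; expected_names is the list
    of file names the consuming action is expected to read.
    """
    root = Path(tree_dir)
    # Keep the caller's order so the result lines up with the input list.
    return [name for name in expected_names if not (root / name).is_file()]


if __name__ == "__main__":
    # The path is taken from the error message above; <foo> is a
    # placeholder and must be replaced with the real package path.
    tree = "bazel-out/darwin_arm64-fastbuild/bin/<foo>/dexfiles/app"
    # Assumed shard names, following the "3.shard.zip" pattern in the error.
    expected = ["1.shard.zip", "2.shard.zip", "3.shard.zip"]
    for name in missing_tree_files(tree, expected):
        print("missing:", name)
```

If the missing files turn out to be exactly the shards whose `DexMerger` actions were remote cache hits, that would support the explanation above.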
What’s the simplest, easiest way to reproduce this bug? Please provide a minimal example if possible.
No response
Which operating system are you running Bazel on?
MacOS 13.2.1 & Ubuntu Focal Fossa
What is the output of `bazel info release`?
release 6.0.0-dev
If `bazel info release` returns `development version` or `(@non-git)`, tell us how you built Bazel.
This is our patched version of the 6.0.0 release binary. We apply a few patches to improve the performance of Android builds, fix dexing issues (https://github.com/bazelbuild/bazel/pull/16369#issuecomment-1442802265), and address some other issues. We build it with:
bazelisk build //src:bazel --compilation_mode=opt --embed_label="6.0.0-dev" --stamp
What’s the output of `git remote get-url origin; git rev-parse master; git rev-parse HEAD`?
No response
Have you found anything relevant by searching the web?
The last commit on the 6.0.0 release branch fixed a seemingly related issue: https://github.com/bazelbuild/bazel/commit/86883db929bebcd1e6d0507ed4b8799e6e7591da.
There is ongoing work on BwtB: https://github.com/bazelbuild/bazel/issues/10880.
I have tried reverting that commit and creating the binary on top of recent master (5900d6d248), but ran into the same issue.
Any other information, logs, or outputs that you want to share?
In the timing profile I see that `Remote.download` and `Remote.downloadInMemoryOutput` get logged for the files that do not get created. There is a start time for these, but no wall duration.
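For anyone who wants to look for the same pattern in their own profile, here is a rough sketch that scans a JSON trace profile for download events that have a start time but no duration. It rests on several assumptions: that the profile is in the Chrome trace event format with a top-level `traceEvents` list, that the event label appears in either the `name` or the `cat` field, and that "no wall duration" shows up as an event with a `ts` field but no `dur` field. The function names are hypothetical.

```python
import gzip
import json


def load_profile(path):
    """Load a JSON trace profile from a plain or gzip-compressed file."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as f:
        return json.load(f)


def unfinished_downloads(profile, prefix="Remote.download"):
    """Return events labelled with prefix that have a start but no duration.

    Assumes the Chrome trace event layout: a dict with a "traceEvents"
    list (or a bare list of events), where "ts" is the start time and
    "dur" the duration. The prefix also matches
    "Remote.downloadInMemoryOutput".
    """
    events = profile.get("traceEvents", []) if isinstance(profile, dict) else profile
    result = []
    for event in events:
        # The label may live in "name" or "cat", so check both.
        labels = (str(event.get("name", "")), str(event.get("cat", "")))
        if not any(label.startswith(prefix) for label in labels):
            continue
        if "ts" in event and "dur" not in event:
            result.append(event)
    return result
```

A typical use would be `unfinished_downloads(load_profile(path))`, where `path` points at a profile written with `--profile`; printing the returned events should show which downloads were started without a recorded duration.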
About this issue
- State: closed
- Created a year ago
- Comments: 17 (11 by maintainers)
This is probably related to https://github.com/bazelbuild/bazel/issues/16333. I’m working on a PR to handle partial trees correctly in the prefetcher; expect it to be submitted within a few days.
Great to hear! Yes, this fix will be in 6.2.
FYI, I suspect there’s one remaining correctness issue in the presence of “templated tree artifacts” (in an Android context, that only applies to dex merging) + BwtB, as hinted at by this comment. I don’t know how easy it is to trigger in practice, but I’m also planning to fix it for 6.2, regardless. Just wanted to mention it in case you come across some other weirdness in the future.
@lukaciko Could you try patching #17678 and let me know if it fixes this issue?