tauri: [bug] Inflated build time due to heavy dependencies

Describe the bug

At times I’ve gotten build times as long as 10 minutes. I’ve even seen a 20-minute build, but I’ve since lost that cargo output to my bash terminal’s scrollback buffer.

To investigate, I set up profile settings in my project to disable LTO and optimisation (to get the best-case scenario), and then did a clean build using cargo clean && cargo +nightly build --timings.

[screenshot: cargo build --timings report]

The report shows that the three biggest hindrances are zstd-sys, blake3, and bzip2-sys.

Using cargo tree -i, I found the following:

$ cargo tree -i zstd-sys
zstd-sys v1.6.3+zstd.1.5.2
└── zstd-safe v4.1.4+zstd.1.5.2
    └── zstd v0.10.0+zstd.1.5.2
        ├── tauri-codegen v1.0.0-rc.2
        │   └── tauri-macros v1.0.0-rc.2 (proc-macro)
        │       └── tauri v1.0.0-rc.3
        │           └── dsrbmm v0.1.0
        ├── tauri-utils v1.0.0-rc.2
        │   ├── tauri-build v1.0.0-rc.3
        │   │   [build-dependencies]
        │   │   └── dsrbmm v0.1.0
        │   ├── tauri-codegen v1.0.0-rc.2 (*)
        │   └── tauri-macros v1.0.0-rc.2 (proc-macro) (*)
        └── tauri-utils v1.0.0-rc.2
            ├── tauri v1.0.0-rc.3 (*)
            ├── tauri-runtime v0.3.2
            │   ├── tauri v1.0.0-rc.3 (*)
            │   └── tauri-runtime-wry v0.3.2
            │       └── tauri v1.0.0-rc.3 (*)
            └── tauri-runtime-wry v0.3.2 (*)

$ cargo tree -i blake3
blake3 v1.3.1
└── tauri-codegen v1.0.0-rc.2
    └── tauri-macros v1.0.0-rc.2 (proc-macro)
        └── tauri v1.0.0-rc.3
            └── dsrbmm v0.1.0

$ cargo tree -i bzip2-sys
bzip2-sys v0.1.11+1.0.8
└── bzip2 v0.4.3
    └── zip v0.5.13
        └── tauri v1.0.0-rc.3
            └── dsrbmm v0.1.0

As you can see, they all impact the compile time of Tauri, and this is the best-case scenario (no overhead of optimising or LTO to increase the compile time).

I know there’s nothing Tauri can do about the compile time of the libraries, but are there any lightweight alternatives that could replace these behemoths?

I only have an Intel Core i7-7500U, so use that to put the performance in perspective: only 4 cores, a maximum concurrency of 7, and 321 total compilation units. Even so, the fact that zstd-sys alone took half of the total build time is insane and worth at least looking into further. With those three crates hogging 3 CPU cores, I had only 1 thread remaining for everything else. And quad-core isn’t exactly a niche setup.

Reproduction

No response

Expected behavior

No response

Platform and versions

Operating System - Windows, version 10.0.19043 X64
Webview2 - 98.0.1108.56
Visual Studio Build Tools:
   - Visual Studio Build Tools 2019
WARNING: no lock files found, defaulting to npm

Node.js environment
  Node.js - 16.10.0
  @tauri-apps/cli - 1.0.0-rc.5
  @tauri-apps/api - Not installed

Global packages
  npm - 8.4.1
  pnpm - 6.29.1
  yarn - 3.1.1

Rust environment
  rustup - 1.24.3
  rustc - 1.59.0
  cargo - 1.59.0
  toolchain - stable-x86_64-pc-windows-msvc

App directory structure
/icons
/src
/target
/WixTools

App
  tauri - 1.0.0-rc.3
  tauri-build - 1.0.0-rc.3
  tao - 0.6.2
  wry - 0.13.3
  build-type - bundle
  CSP - default-src blob: data: filesystem: ws: wss: http: https: tauri: 'unsafe-eval' 'unsafe-inline' 'self' img-src: 'self'
  distDir - ../frontend/build/dist
  devPath - http://localhost:3000/
  framework - React

Stack trace

No response

Additional context

No response

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Reactions: 5
  • Comments: 18 (12 by maintainers)

Most upvoted comments

At the time of #1430 I had used the cargo timings report to test compilation times in general. When I was running it, the linking time absolutely swamped everything else. On computers that are more CPU constrained than IO constrained (like this issue’s example), these slow-to-build crates can become a larger issue.

Over time, there have also been compile-time regressions in some of these projects, specifically zstd, which seems to get slower to compile with each release while speeding up its runtime performance. Since zstd-sys builds the actual zstd C code, it continues to get slower to compile over time. I don’t recall blake3 taking up nearly as much time as in this timing output, so it may be that some CPU platforms are slower to build than others because it adds in CPU-specific code for performance reasons.

Overall, I was not really concerned with the crate dependency compile times because I was much more focused on dirty builds (where the dependencies have already been compiled), since that is the typical developer workflow loop. Additionally, I was not aware of how severely being CPU constrained (crate compilation) instead of IO constrained (linking) could affect build times.

Along with the above reasoning, I chose zstd because:

  1. good rust bindings
  2. zstd has very good compression ratio and decompression speed/memory usage
  3. compatible licensing

I ended up adding blake3 in #1430 to prevent additional work from being performed if the asset file was not changed during a dirty build. This was much more important when compression was not an optional feature, as it prevented not only IO work but also CPU work. Nowadays, if compression is disabled it only prevents IO work. This brought good wins for large asset files without adding dirty-build time. That said, I don’t recall it having such an effect on clean build time, but perhaps that is because I was not CPU constrained. I chose blake3 because:

  1. good rust bindings
    • the rust crate itself is the main project, although they also offer a c implementation
  2. compatible licensing
  3. great performance, especially great debug performance
    • they are also compiled with the same cargo profile as the main crate, causing debug builds to use debug-compiled proc-macros and their dependencies

The runtime performance was important because blake3 also runs at compile time, due to its use in tauri-codegen. It had to be faster to hash the file than to write the contents out to disk, even with a debug profile.
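To make that concrete, here’s a minimal sketch of the hash-and-skip idea, assuming the blake3 crate’s one-shot blake3::hash API; the helper name and the hash-file layout are hypothetical, not the actual tauri-codegen code:

use std::fs;
use std::path::Path;

// Hypothetical helper: only rewrite the cached output when the asset's blake3
// hash differs from the hash recorded on a previous run.
fn write_if_changed(asset: &[u8], out_path: &Path, hash_path: &Path) -> std::io::Result<bool> {
    let new_hash = blake3::hash(asset).to_hex().to_string();
    // If the recorded hash matches, skip both the (optional) compression and the disk write.
    if fs::read_to_string(hash_path).map(|old| old == new_hash).unwrap_or(false) {
        return Ok(false);
    }
    fs::write(out_path, asset)?; // in the real flow this would be the (optionally compressed) asset
    fs::write(hash_path, &new_hash)?;
    Ok(true)
}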

tl;dr - I mostly focused on dirty build compile times (the dependency is already compiled) and otherwise focused on runtime performance, which zstd and blake3 bring plenty of.

There are multiple ways to go for solutions; I’ll start with blake3.

blake3 alternatives

blake3 actually comes with a Rust reference implementation. It is a single file with <400 LoC and no generics (from a quick glance), which usually translates to very fast compile times. I tested a release build on a Windows 10 VM with 2 CPUs and it took 1s flat for a clean build with no caches. Downside: it is not published as a crate. The file changes very little (the logic changes even less), so vendoring it should be fine as long as we keep a bit of an eye on the original reference. We may also be able to convince the blake3 project to publish it as a crate. Also, it lives in a repo that is dual-licensed Apache-2.0/CC0 1.0 Universal, but the file itself does not specify a license, so we may want to make sure it is licensed the same.

I didn’t really look into anything else because this seemed good enough. I will add, though, that if the slower hashing no longer keeps the wins for asset inclusion on HDDs (thinking of a JS project with megabytes of dependencies), we may want to only enable it when compression is also enabled, since more work is done there.

The 1s compilation time for the reference can be compared to 13s for the main crate on the same machine, and 32.37s for the main crate with the rayon feature enabled. Perhaps just disabling rayon support will also bring a big-enough compilation time win with minimal performance impact.

As for runtime performance (6.7MB JavaScript file)…

method                    note                         time
b3sum (blake3 w/ rayon)                                0.017s
b3sum (blake3 w/ rayon)   --no-mmap                    0.018s
b3sum (blake3 w/ rayon)   --no-mmap --num-threads=1    0.02s
rust reference            release                      0.026s
rust reference            debug                        0.85s
rust reference            debug, opt-level = 1         0.03s
rust reference            debug, opt-level = 2         0.027s
rust reference            debug, opt-level = 3         0.026s
The Rust reference sum wrapper code:
use crate::reference::Hasher;
use std::env::args_os;
use std::fmt::Write;
use std::fs::File;
use std::io::{self, Read};

mod reference;

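// Stream the file through the hasher in 64 KiB chunks, retrying if a read is interrupted.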
fn copy_wide(mut reader: impl Read, hasher: &mut reference::Hasher) -> io::Result<u64> {
    let mut buffer = [0; 65536];
    let mut total = 0;
    loop {
        match reader.read(&mut buffer) {
            Ok(0) => return Ok(total),
            Ok(n) => {
                hasher.update(&buffer[..n]);
                total += n as u64;
            }
            Err(ref e) if e.kind() == io::ErrorKind::Interrupted => continue,
            Err(e) => return Err(e),
        }
    }
}

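// Hash the file given as the first CLI argument and print a b3sum-style output line (hex hash, then path).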
fn main() {
    let input = args_os().nth(1).expect("at least 1 argument of file path");
    let file = File::open(&input).expect("unable to open up input file path");
    let mut hasher = Hasher::new();
    let _copied = copy_wide(file, &mut hasher).expect("io error while copying file to hasher");
    let mut output = [0u8; 32];
    hasher.finalize(&mut output);
    let mut s = String::with_capacity(2 * output.len());
    for byte in output {
        write!(s, "{:02x}", byte).expect("can't write hex byte to hex buffer");
    }
    println!("{} {}", s, input.to_string_lossy());
}

As a sanity check against b3sum to make sure the output was the same, here is the output for both. Side note: the release and debug builds of the reference take a very similar amount of compile time (both ~1s) since there are no dependencies. It may be worth enabling an opt-level for it in the proc macro with a profile override (if we can do that in the proc macro?) to change the runtime from 0.8s -> 0.03s. I’m not sure if the overrides work on the proc macro or only on the root crate being built.

PS C:\Users\chip\Documents\b3sum-reference> .\target\release\b3sum-reference.exe ..\..\Downloads\vendors_init.js
668031821b2ae54e9856e2b09fbc3403d5052567904fb76f47c9e2e42370bb18 ..\..\Downloads\vendors_init.js
PS C:\Users\chip\Documents\b3sum-reference> C:\Users\chip\.cargo\bin\b3sum.exe --no-mmap --num-threads=1 ..\..\Downloads\vendors_init.js
668031821b2ae54e9856e2b09fbc3403d5052567904fb76f47c9e2e42370bb18  ../../Downloads/vendors_init.js

Summary: if disabling rayon doesn’t give us enough compile-time gains on blake3, the reference implementation is almost instant to compile and only ~50% slower at runtime (of an already very fast runtime).

zstd alternatives

I started off building zstd on that same virtual machine (2 CPUs). A clean build took only 13s, which seems really low compared to the timings in this issue; that’s half the time the blake3 build took with rayon, while this issue’s timings show it taking longer than blake3. Perhaps this is another problem that is difficult to reproduce on all computers.

I did get a warning while compiling the zstd-sys crate: warning: cl : Command line warning D9002 : ignoring unknown option '-fvisibility=hidden'. However, cargo test passes, so I don’t believe it has an effect.

brotli

I first checked out brotli because that is actually what I used when first adding compression a long time ago. A clean build of brotli took 7s on the VM when ffi-api was disabled (compared to 12s). Dropbox’s implementation of brotli (the brotli crate) includes a few things over the base brotli project, most notably multithreading, which brings it back into the performance ballpark of zstd.

This is looking promising, so I did some comparisons using the JS vendor file from https://app.element.io (6.7MB). These timings were taken on the same 2-CPU VM. Note that the brotli command used was the binary provided by the Rust crate, and that brotli’s default profile is the same as best.

algorithm   profile   time     size
none        none      0s       6.7MB
zstd        best      4s       1.42MB
brotli      best      12.1s    1.35MB
brotli      10        6.8s     1.38MB
zstd        default   0.07s    1.81MB
brotli      2         0.08s    1.83MB
zstd        14        0.8s     1.54MB
brotli      9         0.8s     1.51MB

I actually really like brotli(9) here, since it’s still sub-second compression (and brotli has good decompression) with a slightly smaller file size than the zstd profile with an equivalent compression time. I think using brotli(2) for debug builds and brotli(9) for release builds is a good balance. We could always add a hurt-me-plenty option that uses best to try and crank out the last kilobytes from the assets, at the cost of runtime (during compile time in the codegen) performance, for those that really want it.
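As a rough sketch of what that debug/release quality split could look like with the Dropbox brotli crate (the function and the way the quality is chosen are my own illustration, not existing Tauri code):

use std::io::Write;

// Compress `data` with the brotli crate; quality is 0-11, higher is smaller but slower.
// Buffer size 4096 and lgwin 22 are typical defaults.
fn compress_asset(data: &[u8], release_build: bool) -> std::io::Result<Vec<u8>> {
    let quality = if release_build { 9 } else { 2 };
    let mut out = Vec::new();
    {
        let mut writer = brotli::CompressorWriter::new(&mut out, 4096, quality, 22);
        writer.write_all(data)?;
    } // the encoder finishes the stream when the writer is dropped
    Ok(out)
}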

brotli would be my choice, hands down, as the replacement. Here are the reasons why:

  1. Rust implementation is well-maintained (by dropbox)
  2. It includes some optional stuff over base brotli including multi-threading
  3. Compatible licensing (I think… BSD 3-Clause)
  4. Faster to compile than zstd along with really good sub-second compression options.

miniz_oxide

A Rust implementation of DEFLATE. A clean build takes ~2.8s. I didn’t look into it further because the compression ratio and decompression performance (32k window size) are not ideal. This doesn’t really knock it out as a contender; I just prefer the brotli solution first.
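For comparison, a minimal sketch of its one-shot API (the level value here is arbitrary):

// miniz_oxide's one-shot DEFLATE compression; higher levels trade speed for size.
fn deflate_asset(data: &[u8]) -> Vec<u8> {
    miniz_oxide::deflate::compress_to_vec(data, 6)
}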