omicron: Slow incremental build/test workflow.

tl;dr: A single-line change in a saga incurs a 10-minute build penalty before tests can run. I'm developing on a 64-core machine with 256 GB of RAM and NVMe disks with TRIM enabled.

I run tests with cargo nextest run. I don't typically pass -p, because it's not always clear which packages my changes touch. In these cases I think the build system, not the user, should be the one figuring out what needs to be rebuilt. It's also not clear why a change in a saga, which seems to sit near the top of the build pyramid, would trigger a rebuild of unrelated test packages.
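One way to picture why a saga edit reaches test packages: cargo rebuilds at crate granularity, so a changed crate forces a rebuild of every crate that depends on it, directly or transitively. The minimal sketch below computes that reverse-dependency closure over a small dependency graph. The crate names and edges are hypothetical and chosen only for illustration; they are not Omicron's actual workspace layout, and real cargo decides freshness with per-unit fingerprints rather than this simplified model.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};

/// Returns every crate that (transitively) depends on `changed`,
/// including `changed` itself: a simplified model of what must be rebuilt.
fn rebuild_set<'a>(deps: &HashMap<&'a str, Vec<&'a str>>, changed: &'a str) -> BTreeSet<&'a str> {
    // Invert the graph: dependency -> crates that depend on it directly.
    let mut rdeps: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for (&krate, direct) in deps {
        for &d in direct {
            rdeps.entry(d).or_default().push(krate);
        }
    }

    // Breadth-first walk over reverse edges starting at the changed crate.
    let mut seen = BTreeSet::new();
    let mut queue = VecDeque::from([changed]);
    while let Some(k) = queue.pop_front() {
        if seen.insert(k) {
            if let Some(users) = rdeps.get(k) {
                queue.extend(users.iter().copied());
            }
        }
    }
    seen
}

fn main() {
    // Hypothetical graph: crate -> its direct dependencies.
    let deps: HashMap<&str, Vec<&str>> = HashMap::from([
        ("nexus", vec!["db-model", "common"]),
        ("nexus-test-utils", vec!["nexus", "common"]),
        ("integration-tests", vec!["nexus-test-utils", "nexus"]),
        ("sled-agent", vec!["common"]),
        ("db-model", vec!["common"]),
        ("common", vec![]),
    ]);

    // Editing a saga is modeled as changing the (hypothetical) "nexus" crate.
    let rebuilt = rebuild_set(&deps, "nexus");
    println!("crates to rebuild: {:?}", rebuilt);
}
```

In this toy graph, editing "nexus" pulls in both test crates but leaves "sled-agent" and "common" untouched, which is the crate-level behavior the question above is probing; finer-grained reuse inside a crate is left to rustc's incremental compilation.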

Here is what a test run looks like with no compilation, i.e. just running the tests:

real     3:32.627233210
user  3:35:25.998393895
sys   1:11:30.318353845
trap     2:22.183659850
tflt        4.635349996
dflt       47.150890796
kflt        0.837772914
lock 1751:11:53.945211231
slp  36:06:41.425071578
lat   3:53:12.177588657
stop  3:41:12.590832239

Here is the same run after a one-line change to saga code:

real    14:04.030041228
user  4:02:55.102264210
sys   1:07:44.130983296
trap     2:29.129798646
tflt        2.271531121
dflt       31.685564198
kflt        0.299503476
lock 1771:59:46.664672742
slp  38:27:07.709200291
lat   4:01:12.321380386
stop  3:54:09.085315193

Most of that time is spent building, as reported by cargo:

Finished test [unoptimized + debuginfo] target(s) in 10m 29s
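As a rough cross-check of the numbers above, subtracting the no-compilation wall-clock time from the post-change wall-clock time gives about 631 s (10m 31s), close to the 10m 29s cargo reports, which puts the build at roughly three quarters of the 14-minute run. This minimal sketch assumes the test phase takes about the same time in both runs; parse_duration is a hypothetical helper written only for this arithmetic.

```rust
/// Hypothetical helper: parses a duration printed as `[[h:]m:]s.frac`
/// (the format of the timings above) into seconds.
fn parse_duration(s: &str) -> f64 {
    s.split(':')
        .fold(0.0, |acc, part| acc * 60.0 + part.parse::<f64>().expect("numeric field"))
}

fn main() {
    // Wall-clock ("real") times copied from the two runs above.
    let test_only = parse_duration("3:32.627233210");
    let after_change = parse_duration("14:04.030041228");

    // Extra wall-clock time attributed to the one-line saga change,
    // assuming the test phase itself is roughly constant between runs.
    let penalty = after_change - test_only;

    // cargo's own report: "Finished ... in 10m 29s".
    let cargo_build = 10.0 * 60.0 + 29.0;

    println!("rebuild penalty: {:.1} s", penalty);
    println!("cargo build share of the run: {:.1}%", 100.0 * cargo_build / after_change);
}
```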

About this issue

State: open
Created 10 months ago
Comments: 21 (12 by maintainers)

Commits related to this issue

Reduce debug level for a faster incremental build (#4026) Depends on https://github.com/oxidecomputer/omicron/pull/4025 Improves incremental rebuilds of Nexus by about ~50% for debug builds on Li... — committed to oxidecomputer/omicron by smklein 10 months ago

Most upvoted comments

I'll note that cargo bloat --time now prints:

Note: prefer using cargo --timings.

So… yeah. As always, these tools are mostly heuristics: useful for tracking down leads, but sometimes misleading.

steveklabnik on Sep 5, 2023

@jclulow I did find it a bit odd, as these packages never appeared in my other analyses. Then again, I was focused on incremental compile times of omicron-nexus, so I don't think they would have shown up in those graphs.

My fear with much of the stuff that is basically code generation (serde, lalrpop, diesel, etc.) is that its impact can be extremely difficult to detect and correctly categorise along at least two dimensions:

- Time during compilation: it might not have taken long to compile the proc macro itself, but that proc macro might run for a long time, or it might run quickly but produce an absolute torrent of expanded types that we then need to compile and subsequently discard.
- Space in the resultant binary: much of the generated code is presumably not named (or otherwise marked) in a way that makes it easy to see that the proc macro created a vast and sprawling estate of program text.

I also wonder, with respect to our linker challenges, how many sections and symbols are produced by expensive code generation only to be discarded during the expensive link step, getting us coming and going, as they say.

jclulow on Sep 5, 2023