jj: Slow operations on very large repos
Description
Right up front I want to acknowledge: (a) this is definitely an unusual situation, and (b) I totally get that it is likely to take a bit to sort through. But: I tried out Jujutsu on a very large repo from work a few minutes ago and found it’s distinctly not yet ready to use there:
| Command | Time |
|---|---|
| `jj init --git-repo=.` | 4m 59s |
| `jj status` | 25s |
(I’ll add more operations to this list once I’m actually back at work in August!)
For scale: this repo has on the order of 3M LOC checked in—primarily JavaScript, TypeScript, and Handlebars, but with a mix of Java and Gradle as well, with a massive node_modules directory and a not-small bucket of things related to Gradle (both gitignore’d buuuut still massive) and it has hundreds of thousands of commits in its history, hundreds of active branches… and, annoyingly, also hundreds of thousands of tags (one for each commit; better not to ask).
For comparison, git status takes a second or two (again, I will time them when I’m back at work). I’m not using a sparse checkout here (other folks sometimes do, but for various reasons it’s a non-starter for me 😩).
Comparable open source repos might be something like Firefox or Chrome? I tried DefinitelyTyped, and its 3M LOC and mere 84,275 commits only took 9s to initialize, and `jj status` took around a second. Even so, the comparable scale of the codebase itself and the dramatically better performance suggest there may be something repo-specific (the tags?) causing the issue.
Steps to Reproduce the Problem
- Check out a massive repo with `git`.
- Initialize it with `jj`.
- Run operations on it.
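For anyone who wants to collect comparable numbers, here is a minimal sketch of a timing helper. The `time_cmd` function is a hypothetical name introduced for illustration; the `jj` and `git` invocations in the usage comments are the ones quoted in this issue, and the resolution is whole seconds only.

```shell
# Run a command, discard its output, and print the elapsed wall-clock seconds.
# time_cmd is a hypothetical helper, not part of jj or git.
time_cmd() {
    start=$(date +%s)
    # Capture the exit code without aborting under `set -e`.
    "$@" >/dev/null 2>&1 && rc=0 || rc=$?
    end=$(date +%s)
    echo "$*: $((end - start))s (exit $rc)"
}

# Usage, inside an existing Git checkout:
#   time_cmd jj init --git-repo=.
#   time_cmd jj status
#   time_cmd git status
```

Running each command a couple of times helps separate cold-cache from warm-cache timings.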
Expected Behavior
It completes in a reasonable amount of time.
Actual Behavior
It completes in what honestly probably is a reasonable amount of time given the sheer scale of the things, but in a way that makes it much worse than Git for the moment.
Specifications
- Platform: macOS Ventura 13.4.1
- Version: 0.7.0
About this issue
- State: open
- Created a year ago
- Comments: 37 (9 by maintainers)
Commits related to this issue
- docs: mention `git pack-refs` for co-located repos As suggested by @yuja in https://github.com/martinvonz/jj/issues/1841#issuecomment-1720451152 — committed to ilyagr/jj by ilyagr 10 months ago
- docs: mention `git pack-refs` for co-located repos As suggested by @yuja in https://github.com/martinvonz/jj/issues/1841#issuecomment-1720451152 Thanks to @lazywei for pointing out that `git pack-re... — committed to ilyagr/jj by ilyagr 10 months ago
- docs: mention `git pack-refs` for co-located repos As suggested by @yuja in https://github.com/martinvonz/jj/issues/1841#issuecomment-1720451152 Thanks to @lazywei for pointing out that `git pack-re... — committed to martinvonz/jj by ilyagr 10 months ago
- docs: mention `git pack-refs` for co-located repos As suggested by @yuja in https://github.com/martinvonz/jj/issues/1841#issuecomment-1720451152 Thanks to @lazywei for pointing out that `git pack-re... — committed to Dr-Emann/jj by ilyagr 10 months ago
If you have tons of refs under the `.git/refs` directory, try `git pack-refs`. It will reduce the overhead of automated git imports.

Excellent, I think I needed `core.fsmonitor` in the repo too. I had it in my user config but I don’t think that helped. Now it’s working!

Perhaps we could also point Watchman at the Git ref files/directories, so that we could at least skip importing refs when none of them have changed (or something more ambitious where we import refs selectively based on which files have changed).
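A minimal sketch of the `git pack-refs` suggestion, for a repo with many loose tags or branches. The helper names `count_loose_refs` and `pack_all_refs` are hypothetical and exist only for this example; the sketch assumes Git's default file-based ref storage and uses `--all` so that branches are packed along with tags.

```shell
# Count loose ref files (one file per unpacked branch or tag) under a Git directory.
count_loose_refs() {
    find "$1/refs" -type f | wc -l | tr -d ' '
}

# Pack every ref into the single packed-refs file and report the before/after counts.
pack_all_refs() {
    repo=$1
    echo "loose refs before: $(count_loose_refs "$repo/.git")"
    git -C "$repo" pack-refs --all
    echo "loose refs after:  $(count_loose_refs "$repo/.git")"
}

# Usage: pack_all_refs /path/to/repo
```

New refs created after packing are written as loose files again, so a repo that gains tags continuously may need this repeated from time to time.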
If you’re curious what’s taking time, you can try profiling using e.g. samply. Just install with `cargo install samply`, then run e.g. `samply record jj log` and open the link it prints. Feel free to share a screenshot.

With #2232 merged, you should see significantly better performance in fresh clones of large repos. For example, I timed `jj log | head -1000` in the Linux repo. That took ~13 s before and ~2.3 s after.

I posted in Discord https://discord.com/channels/968932220549103686/969291218347524238/1129516951706816532 but should post here as well:
Here’s a `tracing` profile of `jj status` in `nixpkgs` with Watchman.

Interesting segments:
- `snapshot`: 257ms
- `import_git_refs`: 53ms
- `tree_state` (reading it): 51ms
- `deleting file states` (filtering out Git submodules from a list of files): 14ms
- `make_fsmonitor_matcher`: 99ms
- `query_watchman`: 84ms (still a bit much in my opinion…)
- `finish`: 25ms
- `write_commit_summary`: 10ms
- `conflicts`: 424ms 😱
- `cmd_status` is `conflicts`, so I’m guessing the remainder is some `Drop` implementation

@martinvonz is working on tree-level conflicts which should take care of the biggest bottleneck. I think we can cut ~90ms if we stop storing file states in the tree-state proto for the Watchman case.
With some additional feature work, we could possibly reduce `import_git_refs` somewhat by querying Watchman (might have to do it in parallel with snapshotting the working copy… actually, it would probably help to do them in parallel right now). The last 50ms of remaining time need more investigation. But then I think we could get `status` down to an acceptable ~100ms.

Whether you use `--ignore-working-copy` is orthogonal to the availability of Watchman. It only means that working copy snapshots won’t be taken. If a snapshot is taken and Watchman is available, then jj will use Watchman as a faster path instead of scanning the filesystem.

Make sure that you set `core.fsmonitor` to `watchman` in your repo as well (`jj config set`). You should be able to confirm that Watchman is being used for snapshotting by invoking `jj` with the environment variable `RUST_LOG=info`. It should print a message saying that it is querying Watchman.
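The check described above could be scripted roughly as follows. This is a sketch under assumptions: `watchman_lines` and `check_watchman` are hypothetical helper names, the `--repo` flag on `jj config set` is assumed to be available in your jj version, and the exact wording of the log message is not specified here, so the filter just matches the word "watchman" case-insensitively.

```shell
# Print the log lines that mention Watchman; exits non-zero if there are none.
watchman_lines() {
    grep -i 'watchman'
}

# Enable the Watchman fsmonitor for the current repo, then look for evidence
# of it in jj's log output (logging goes to stderr, so merge it into stdout).
check_watchman() {
    jj config set --repo core.fsmonitor watchman &&
    RUST_LOG=info jj status 2>&1 | watchman_lines
}

# Usage, from inside the repo: check_watchman || echo "no Watchman log line seen"
```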