pants: Pants dependency calculation performance regression on 2.13.x

Describe the bug
On 2.13.x, dependee calculation takes 2x-3x as long as it did on 2.11.0 in my repo.

I’m not fully able to reproduce this effect outside of my specific repo conditions, which I can’t share. But I will share a debug log that shows some really odd timings.

https://pantsbuild.slack.com/archives/C046T6T9U/p1657128312628089

A command like

./pants --changed-since=HEAD --changed-dependees=transitive list

…which used to take ~30 sec now takes 1m30s or longer at times, and seems (anecdotally) to miss the cache more frequently when pantsd is cold. This can be reproduced easily and consistently by running pkill pantsd and then re-running the command.
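For example, a cold-daemon run can be timed like this (the time wrapper is just for illustration):

pkill pantsd
time ./pants --changed-since=HEAD --changed-dependees=transitive list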

Debug logs reveal a minute-long wait at one point:

14:24:36.68 [DEBUG] Launching 1 roots (poll=false).
14:24:37.83 [DEBUG] Completed: Find all targets in the project
14:24:47.25 [DEBUG] Completed: acquire_command_runner_slot
14:24:47.25 [DEBUG] Running Searching for `python3.10` on PATH=/usr/bin under semaphore with concurrency id: 5, and concurrency: 1
14:24:47.25 [DEBUG] Completed: acquire_command_runner_slot
14:24:47.25 [DEBUG] Running Searching for `python3.9` on PATH=/usr/bin under semaphore with concurrency id: 6, and concurrency: 1
14:24:47.26 [DEBUG] Completed: Extracting an archive file
14:24:47.26 [DEBUG] Completed: pants.core.util_rules.external_tool.download_external_tool
14:25:48.97 [DEBUG] Completed: setup_sandbox
14:25:48.97 [DEBUG] Completed: setup_sandbox
14:25:48.97 [DEBUG] Obtaining exclusive spawn lock for process since we materialized its executable Some("./find_binary.sh").
14:25:48.97 [DEBUG] Obtaining exclusive spawn lock for process since we materialized its executable Some("./find_binary.sh").
14:25:57.63 [DEBUG] spawned local process as Some(122486) for Process { argv: ["./find_binary.sh", "python3.10"], env: {"PATH": "/usr/bin"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Searching for `python3.10` on PATH=/usr/bin", level: Debug, append_only_caches: {}, jdk_home: None, platform_constraint: None, cache_scope: PerRestartSuccessful }
14:25:57.86 [DEBUG] spawned local process as Some(122487) for Process { argv: ["./find_binary.sh", "python3.9"], env: {"PATH": "/usr/bin"}, working_directory: None, input_digests: InputDigests { complete: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, nailgun: DirectoryDigest { digest: Digest { hash: Fingerprint<e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855>, size_bytes: 0 }, tree: "Some(..)" }, input_files: DirectoryDigest { digest: Digest { hash: Fingerprint<1824155fc3b856540105ddc768220126b3d9e72531f69c45e3976178373328f3>, size_bytes: 91 }, tree: "Some(..)" }, immutable_inputs: {}, use_nailgun: {} }, output_files: {}, output_directories: {}, timeout: None, execution_slot_variable: None, concurrency_available: 0, description: "Searching for `python3.9` on PATH=/usr/bin", level: Debug, append_only_caches: {}, jdk_home: None, platform_constraint: None, cache_scope: PerRestartSuccessful }
14:25:58.23 [DEBUG] Completed: Searching for `python3.9` on PATH=/usr/bin
14:25:58.23 [DEBUG] Completed: Searching for `python3.10` on PATH=/usr/bin
14:25:58.23 [DEBUG] Completed: Scheduling: Searching for `python3.10` on PATH=/usr/bin
14:25:58.23 [DEBUG] Completed: Scheduling: Searching for `python3.9` on PATH=/usr/bin

Full debug logs attached below

Pants version
2.13.0a1

OS
Linux: Fedora 34/35/36

Additional info
Debug log, with specific filenames redacted:

changed-dep.redacted.debug.log

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 16 (16 by maintainers)


Most upvoted comments

For the record, the improvement is dramatic: we see 200-300% worse performance with content_only (the default) than with never (which I’ve now added to our default pants.toml config).
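For reference, the pants.toml equivalent of that flag should be roughly the following; the section and option name are inferred from the CLI flag, so treat this as a sketch:

[python-infer]
# Same effect as passing --python-infer-init-files=never on the command line.
init_files = "never"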

For the sake of clearing up my priorities… we can probably get by with --python-infer-init-files=never for a while, at least until we are able to move to 1:1:1. This will technically produce some incorrect dependencies but I think we can live with it for the time being.

If I get the time, I can look into writing some test cases for non-1:1:1 scenarios, if that’s a usage pattern we want to continue to support.

Based on the logs and trace, the source of the slow find_owners calls is likely to be the new --python-infer-init-files=content_only default combined with the fact that you have many targets defined at the root of the repo.

To confirm, please try with --python-infer-init-files=never.
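i.e., something along the lines of:

./pants --python-infer-init-files=never --changed-since=HEAD --changed-dependees=transitive list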

The init files inference locates __init__.py files (in parent directories), and then finds owning targets (find_owners) for them (also in parent directories). In the common case of 1:1:1 the latter step is fairly cheap, because any particular file will have on the order of num_files_in_parent_directories possible candidates to match against. But because you have what sounds like all of the targets in your repository declared in the root directory, all targets are candidate owners of each __init__.py file. The end result is num_files_total * num_init_files_total non-batched glob matches.
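To make the cost model concrete, here is a minimal, hypothetical Python sketch (not Pants’ actual find_owners code) of what “non-batched glob matches” means when every root-declared target is a candidate owner of every __init__.py; names and helpers are illustrative only:

import fnmatch
import re
from typing import Dict, List

def compile_globs(globs: List[str]) -> List[re.Pattern]:
    # Stand-in for the glob parsing that the profile shows dominating the runtime.
    return [re.compile(fnmatch.translate(g)) for g in globs]

def find_owners_naive(init_files: List[str], targets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    # `targets` maps target name -> its source globs, all declared at the repo root.
    owners: Dict[str, List[str]] = {f: [] for f in init_files}
    for init_file in init_files:              # num_init_files_total iterations
        for name, globs in targets.items():   # every root-declared target is a candidate
            patterns = compile_globs(globs)   # re-parsed on every (file, target) pair -- the hot spot
            if any(p.match(init_file) for p in patterns):
                owners[init_file].append(name)
    return owners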

EDIT: From investigation, 87% of 67% of the time here is actually in the parsing of the globs to apply, rather than in the actual matching. So I think the answer is going to be to replace the stateless matches_filespec function with @rules that prepare PathGlobs for particular SourcesFields (such that the native matcher preparation is memoized), and then have find_owners request the set of PathGlobs to match against.
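A rough sketch of the memoization idea, using functools.lru_cache as a stand-in for memoized @rules; the real fix would prepare PathGlobs per SourcesField inside the rule graph, so all names here are illustrative, not Pants’ actual API:

import fnmatch
import functools
import re
from typing import Dict, List, Tuple

@functools.lru_cache(maxsize=None)
def compile_globs_cached(globs: Tuple[str, ...]) -> Tuple[re.Pattern, ...]:
    # Parsing now happens once per unique set of source globs, instead of once
    # per (candidate file, candidate target) pair.
    return tuple(re.compile(fnmatch.translate(g)) for g in globs)

def find_owners_memoized(init_files: List[str], targets: Dict[str, List[str]]) -> Dict[str, List[str]]:
    owners: Dict[str, List[str]] = {f: [] for f in init_files}
    for init_file in init_files:
        for name, globs in targets.items():
            patterns = compile_globs_cached(tuple(globs))  # cache hit after the first use per target
            if any(p.match(init_file) for p in patterns):
                owners[init_file].append(name)
    return owners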

I meant trace-level logging, which is what I was posting excerpts from. But I can give a profiler a try sometime later. I’m on vacation at the moment, so it may be slow going, but I’ll keep trying to run this down.

On Tue, Jul 12, 2022, 11:56 Stu Hood wrote:

Since we can’t repro outside of this specific repo, if you can give me a better idea of what is going on with these kinds of traces, I can do more digging myself.

The traces that we use via Toolchain are Zipkin/opentelemetry traces: they can’t really be reconstructed from the logs, unfortunately.

But #16141 https://github.com/pantsbuild/pants/pull/16141 was actually entirely py-spy. If you’re able to attach py-spy for the length of one of these runs, it might help point to something: py-spy record -p $pid -f speedscope -o ${output}.prof. Alternatively, if you’re familiar with any other Linux profiling tools that you can attach which would get more native stack frames (a minority of the runtime will be in pure rust code, but it’s still interesting), those would be useful.
