pants: Remove bad `name=` default for targets

Currently, a/b/c really means a/b/c:c. This was a sensible default in a target-centric world, but is not ideal in Pants 2’s file-centric world:

  • It confuses users who may not realize this nuance.
  • It prevents directories from being used for something more meaningful.

This shorthand is used for CLI specs (./pants test a/b/c) and the dependencies field.

We should retcon directories == file paths, as initially done in https://github.com/pantsbuild/pants/pull/12262.

Proposal

Deprecate using a directory w/o :tgt at the end. For now, do not retcon directories to be file paths until we are completely rid of the old semantics. This should minimize confusion.

We’ll develop a script to automate fixing this for users.

Open question: default name for targets?

Imo, we should also deprecate having a default name for targets. If we remove the shorthand, you would always have to write a/b/c:c and not a/b/c. I think a/b/c:lib is easier to understand what the :tgt part is doing.

This should not add much boilerplate because ./pants tailor autogenerates the target names.

Instead of:

python_library()
python_tests(name="tests")

It will now be:

python_library(name="lib")
python_tests(name="tests")

This also makes things more consistent, that the python_library is less magical and it’s easier to intuit what is happening. We’d be able to simplify our target docs.

Not necessary: // prefix

In https://github.com/pantsbuild/pants/pull/12262#discussion_r663224379, it was proposed we could use // to disambiguate target specs. Instead, by removing the dir shorthand, :tgt becomes the disambiguator. Everything else is free to be a file spec or whatever we want.

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Comments: 57 (56 by maintainers)

Commits related to this issue

Most upvoted comments

@Eric-Arellano asked for my

thoughts on a proposal to start always using name= for targets. It becomes much more important w/ generated target addresses.

The biggest thing in this proposal for me is teaching tailor how to edit targets in BUILD files instead of only adding them. I want to minimize manually editing BUILD files, and this helps with that. Requiring name= makes the targets much clearer, as I had to read the docs several times to understand target names and how the default to the directory name. I think this should be an improvement.

I’m also impressed with your willingness to seek the best experience even though that means break with the past and build migration tooling for the future.

TL;DR Automatically applying this shorthand to a manually declared dependency in a BUILD file caused a very odd chain of events to make two completely independent microservice Dockerfiles depend on one another in the Pants ecosystem.

I’ll preface this by saying I’m not caught up on the thread and the deeper rationale of why things are the way they are, but I did want to share an interesting problem I encountered because I didn’t understand the existing behavior. I’m a few months into using Pants and obviously still learning, but I hope this example goes to show how easy it is for this automatic behavior to cause issues when it’s more obfuscated from users.

I have a directory with multiple microservices that looks something like this (using silly names to simplify things)

> foo_svc
   > A.py
   > B.py
   > Dockerfile
   > BUILD.pants
> bar_svc
   > C.py
   > D.py
   > Dockerfile
   > BUILD.pants

In this example, C.py and A.py both depend on B.py directly. However, we don’t actually import B.py in C.py; it’s needed more dynamically so we must declare the dependency in the build file. foo_svc/BUILD.pants looked like

python_sources(name="sources")
pex_binary(name="A", entry_point="A.py")
docker_image(name="foo_svc")

while bar_svc/BUILD.pants looked like

python_sources(name="sources", overrides={"C.py": {"dependencies": [ "//foo_svc/B.py" ]}})
pex_binary(name="C", entry_point="C.py")
docker_image(name="bar_svc")

Both Dockerfiles are simple and just copy in the pex binaries like is usual for Pants systems. Here’s where it gets easy to mess up. I named the docker_image targets the way I did so that they would show up in our registries by that name. This messes with Pants dependency mapping quite a bit. When I ran ./pants package bar_svc:bar_svc, I happened to notice that it was packaging up the docker image for foo_svc even though there was no way those were linked. I know this because of the way the system is architected and also by running ./pants dependees foo_svc:foo_svc and rightfully seeing that there were none. It turns out that //foo_svc/B.py was being turned into //foo_svc/B.py:foo_svc without me realizing it. This leads to a dependency chain that looks like bar_svc (docker image) -> C (pex) -> C.py (python) -> //foo_svc/B.py:foo_svc (Pants-modified dependency) -> foo_svc (docker image) -> A (pex) -> A.py (python) -> B.py (ah, finally the thing we wanted).

The main problems this has are 1) it makes foo_svc heavier than needed because we’re packaging up more of the repo with it and 2) it likely leads to a bunch of unnecessary cache invalidation when running tests since there’s a fake dependency link.

I’ll admit that there are several ways to avoid this/fix it easily. The main thing I wanted to add by sharing this is that it’s not some contrived example; it was a real experience that I had to work through. Thankfully I know more now, it was an easy fix (change to //foo_svc/B.py:sources), and I’m able to share more best practices with our team when constructing our BUILD files and repository layout. However, it was very easy to walk into and not realize I had done anything wrong.

I love the minimalist take on boiler plate needed to use Pants; it’s why we adopted it internally. I’m very much in support of a feature like this simply to reduce some of the magic that happens that can cause real issues for build systems and deployable packages.

@Eric-Arellano Using name="lib" as a default makes sense. I think your explanation addresses all my concerns - as long as there is good tooling to move from the current world to the new one. I’m not sure if you were suggesting that the actual names would incorporate the language but I’d resist that. Typing python_lib vs lib every time to support an edge case would be suboptimal. I think it’s acceptable that the first library type target gets the lib name and all others must be given custom names.

Philosophically, I think the approach of explicit settings in generated config files vs implicit defaults helps make pants more intuitive to the end-user and less ‘magic’ (in a good way). Having said that, littering the code base with an arbitrary string like lib does feel wrong to the programmer in me - though I think I can get past that if the tooling deals with it smoothly.

Here’s another confusing case when there is ambiguity:

./pants paths --from=path/to/foo.py --path/to/bar.py where foo.py belongs to two targets, one of which has the name of the directory, i.e., the “default” name.

Because the --from spec is ambiguous, paths errors with Expected a single target, but was given multiple targets. Did you mean one of these: and it lists the two addresses. However because of the spec canonicalization, one of those two is displayed as path/to/foo.py which was what I originally used! Obviously that is a terrible error message to show people.

I think we got off topic though. I agree with this proposal of deprecating directories == targets. It’s almost always going to end in failure, and we save only a few characters.

Thanks for the detailed use case! This sort of thing is super helpful when designing the UX.

@stuhood and I were discussing how this ticket morphed into something it shouldn’t be: how to handle the default for name when defining targets, e.g. keeping the current default vs. address-target generators. That’s an important and related discussion, but not the same thing.

Instead, this ticket is about references to targets with the name left off, i.e. with CLI specs and in the dependencies field of other targets.

I feel like I have been saying this over and over and over again… 😃

Yes, I’m referring to the deprecation and requiring changes to BUILD files. I don’t want to us to get into that right now. But changing tailor and updating the docs is easy.

Not saying there isn’t other work to be done to make directory specs happen, such as an option like the one you suggest…

My 2 cents about all of this:

The part I care about most is “directory specs”: ./pants goal path/to/dir should do what the user expects it to do. I believe that the user expects this to behave like ./pants goal path/to/dir::. This belief is grounded in the fact that every single Python tool we invoke behaves this way (I checked). And this affects every single user.

The associated changes:

  1. Requiring explicit target names in dependencies-[...], and disallowing depending on a directory-looking thing
  2. Getting rid of default target names

Have their own merits, which we are discussing here, but I believe we can make the directory specs change without them. In that case the cost of not doing 1) will be a discrepancy between what path/to/dir means on the cmd line vs. what it means in dependencies. We can mitigate this discrepancy by simply not documenting that you can leave off the target name in that case… (and in any case, in a dep inference world people write few explicit dependencies), and 2) seems even less necessary for the purpose of directory specs.

These also only affect that subset of users that has to edit BUILD files, which is why I think they are less critical.

PS The difference between 2015 and now, to my mind, is that back then you had to think about targets. Now we want users to think about targets as little as possible.

From a CLI perspective:

+1 for path/to/dir to operate on all targets in dir, not just path/to/dir:dir. Especially, as you could make something silly as src/a/BUILD

python_library()
python_tests(name="tests")

src/b/BUILD

python_library(name="lib")
python_tests()

Then ./pants goal src/a src/b wouldn’t do the same for both dirs, which is weird.

Ok - I was confused - I was thinking about this from the perspective of what you write in the BUILD file when trying to reference other targets as dependencies.

With this change would path/to/dir mean “act on all the files under that directory” or “act on all targets that include the files under this directory”. If the ultimate behaviour isn’t actually identical to how the other tools you mention behave maybe forcing users into a different syntax actually avoids confusion? Maybe when you do ./pants goal path/to/lib you could just get a helpful message telling you probably want ./pants goal path/to/lib:lib?

With respect to path/to/lib: I was talking also about using that in the context of the BUILD file to specify dependencies as a shorthand for lib:lib. But now that you mention the significance of : in the CLI context I can see why supporting that wouldn’t make sense.

I do think that being able to omit name='lib' in a python_library definition and have it default to the folder name is very useful and prevents problems from the folder name and name getting out of sync.

Sorry, the sentence “It prevents directories from being used for something more meaningful” wasn’t very clear. What we’re referring to, specifically, is what should ./pants goal path/to/dir on the command line mean. Today it means "apply goal to the target path/to/dir:dir . What we think we may want it to actually mean is “apply goal to all the code under path/to/dir”.

The reasons are:

  1. We don’t want to force users to think in terms of targets. For example, ./pants lint path/to/dir will today lint only the target named dir (typically the python_library for that directory), not, for example, any python_tests targets in that directory. In a files-and-directories-centric idiom, that is probably not what users will expect.

  2. All the underlying tools (pytest, flake8, mypy, etc etc) treat path/to/dir on the command line as “act on all the files under that directory”. And it seems prudent to be consistent.