pants: Remove bad `name=` default for targets
Currently, a/b/c really means a/b/c:c. This was a sensible default in a target-centric world, but is not ideal in Pants 2’s file-centric world:
- It confuses users who may not realize this nuance.
- It prevents directories from being used for something more meaningful.
This shorthand is used for CLI specs (./pants test a/b/c) and the dependencies field.
We should retcon directories == file paths, as initially done in https://github.com/pantsbuild/pants/pull/12262.
Proposal
Deprecate using a directory w/o :tgt at the end. For now, do not retcon directories to be file paths until we are completely rid of the old semantics. This should minimize confusion.
We’ll develop a script to automate fixing this for users.
Open question: default name for targets?
Imo, we should also deprecate having a default name for targets. If we remove the shorthand, you would always have to write a/b/c:c and not a/b/c. I think a/b/c:lib is easier to understand what the :tgt part is doing.
This should not add much boilerplate because ./pants tailor autogenerates the target names.
Instead of:
python_library()
python_tests(name="tests")
It will now be:
python_library(name="lib")
python_tests(name="tests")
This also makes things more consistent, that the python_library is less magical and it’s easier to intuit what is happening. We’d be able to simplify our target docs.
Not necessary: // prefix
In https://github.com/pantsbuild/pants/pull/12262#discussion_r663224379, it was proposed we could use // to disambiguate target specs. Instead, by removing the dir shorthand, :tgt becomes the disambiguator. Everything else is free to be a file spec or whatever we want.
About this issue
- Original URL
- State: open
- Created 3 years ago
- Comments: 57 (56 by maintainers)
@Eric-Arellano asked for my
The biggest thing in this proposal for me is teaching tailor how to edit targets in BUILD files instead of only adding them. I want to minimize manually editing BUILD files, and this helps with that. Requiring
name=makes the targets much clearer, as I had to read the docs several times to understand target names and how the default to the directory name. I think this should be an improvement.I’m also impressed with your willingness to seek the best experience even though that means break with the past and build migration tooling for the future.
TL;DR Automatically applying this shorthand to a manually declared dependency in a BUILD file caused a very odd chain of events to make two completely independent microservice Dockerfiles depend on one another in the Pants ecosystem.
I’ll preface this by saying I’m not caught up on the thread and the deeper rationale of why things are the way they are, but I did want to share an interesting problem I encountered because I didn’t understand the existing behavior. I’m a few months into using Pants and obviously still learning, but I hope this example goes to show how easy it is for this automatic behavior to cause issues when it’s more obfuscated from users.
I have a directory with multiple microservices that looks something like this (using silly names to simplify things)
In this example,
C.pyandA.pyboth depend onB.pydirectly. However, we don’t actually importB.pyinC.py; it’s needed more dynamically so we must declare the dependency in the build file.foo_svc/BUILD.pantslooked likewhile
bar_svc/BUILD.pantslooked likeBoth Dockerfiles are simple and just copy in the pex binaries like is usual for Pants systems. Here’s where it gets easy to mess up. I named the
docker_imagetargets the way I did so that they would show up in our registries by that name. This messes with Pants dependency mapping quite a bit. When I ran./pants package bar_svc:bar_svc, I happened to notice that it was packaging up the docker image forfoo_svceven though there was no way those were linked. I know this because of the way the system is architected and also by running./pants dependees foo_svc:foo_svcand rightfully seeing that there were none. It turns out that//foo_svc/B.pywas being turned into//foo_svc/B.py:foo_svcwithout me realizing it. This leads to a dependency chain that looks likebar_svc(docker image) ->C(pex) ->C.py(python) ->//foo_svc/B.py:foo_svc(Pants-modified dependency) ->foo_svc(docker image) ->A(pex) ->A.py(python) ->B.py(ah, finally the thing we wanted).The main problems this has are 1) it makes
foo_svcheavier than needed because we’re packaging up more of the repo with it and 2) it likely leads to a bunch of unnecessary cache invalidation when running tests since there’s a fake dependency link.I’ll admit that there are several ways to avoid this/fix it easily. The main thing I wanted to add by sharing this is that it’s not some contrived example; it was a real experience that I had to work through. Thankfully I know more now, it was an easy fix (change to
//foo_svc/B.py:sources), and I’m able to share more best practices with our team when constructing our BUILD files and repository layout. However, it was very easy to walk into and not realize I had done anything wrong.I love the minimalist take on boiler plate needed to use Pants; it’s why we adopted it internally. I’m very much in support of a feature like this simply to reduce some of the magic that happens that can cause real issues for build systems and deployable packages.
@Eric-Arellano Using
name="lib"as a default makes sense. I think your explanation addresses all my concerns - as long as there is good tooling to move from the current world to the new one. I’m not sure if you were suggesting that the actual names would incorporate the language but I’d resist that. Typingpython_libvslibevery time to support an edge case would be suboptimal. I think it’s acceptable that the first library type target gets thelibname and all others must be given custom names.Philosophically, I think the approach of explicit settings in generated config files vs implicit defaults helps make pants more intuitive to the end-user and less ‘magic’ (in a good way). Having said that, littering the code base with an arbitrary string like
libdoes feel wrong to the programmer in me - though I think I can get past that if the tooling deals with it smoothly.Here’s another confusing case when there is ambiguity:
./pants paths --from=path/to/foo.py --path/to/bar.pywhere foo.py belongs to two targets, one of which has the name of the directory, i.e., the “default” name.Because the
--fromspec is ambiguous,pathserrors withExpected a single target, but was given multiple targets. Did you mean one of these:and it lists the two addresses. However because of the spec canonicalization, one of those two is displayed aspath/to/foo.pywhich was what I originally used! Obviously that is a terrible error message to show people.I think we got off topic though. I agree with this proposal of deprecating directories == targets. It’s almost always going to end in failure, and we save only a few characters.
Thanks for the detailed use case! This sort of thing is super helpful when designing the UX.
I feel like I have been saying this over and over and over again… 😃
Yes, I’m referring to the deprecation and requiring changes to BUILD files. I don’t want to us to get into that right now. But changing tailor and updating the docs is easy.
Not saying there isn’t other work to be done to make directory specs happen, such as an option like the one you suggest…
My 2 cents about all of this:
The part I care about most is “directory specs”:
./pants goal path/to/dirshould do what the user expects it to do. I believe that the user expects this to behave like./pants goal path/to/dir::. This belief is grounded in the fact that every single Python tool we invoke behaves this way (I checked). And this affects every single user.The associated changes:
dependencies-[...], and disallowing depending on a directory-looking thingHave their own merits, which we are discussing here, but I believe we can make the directory specs change without them. In that case the cost of not doing 1) will be a discrepancy between what path/to/dir means on the cmd line vs. what it means in
dependencies. We can mitigate this discrepancy by simply not documenting that you can leave off the target name in that case… (and in any case, in a dep inference world people write few explicit dependencies), and 2) seems even less necessary for the purpose of directory specs.These also only affect that subset of users that has to edit BUILD files, which is why I think they are less critical.
PS The difference between 2015 and now, to my mind, is that back then you had to think about targets. Now we want users to think about targets as little as possible.
From a CLI perspective:
+1 for
path/to/dirto operate on all targets in dir, not justpath/to/dir:dir. Especially, as you could make something silly assrc/a/BUILDsrc/b/BUILDThen
./pants goal src/a src/bwouldn’t do the same for both dirs, which is weird.Ok - I was confused - I was thinking about this from the perspective of what you write in the BUILD file when trying to reference other targets as dependencies.
With this change would
path/to/dirmean “act on all the files under that directory” or “act on all targets that include the files under this directory”. If the ultimate behaviour isn’t actually identical to how the other tools you mention behave maybe forcing users into a different syntax actually avoids confusion? Maybe when you do./pants goal path/to/libyou could just get a helpful message telling you probably want./pants goal path/to/lib:lib?With respect to
path/to/lib:I was talking also about using that in the context of the BUILD file to specify dependencies as a shorthand forlib:lib. But now that you mention the significance of:in the CLI context I can see why supporting that wouldn’t make sense.I do think that being able to omit
name='lib'in apython_librarydefinition and have it default to the folder name is very useful and prevents problems from the folder name andnamegetting out of sync.Sorry, the sentence “It prevents directories from being used for something more meaningful” wasn’t very clear. What we’re referring to, specifically, is what should
./pants goal path/to/diron the command line mean. Today it means "applygoalto the targetpath/to/dir:dir. What we think we may want it to actually mean is “applygoalto all the code underpath/to/dir”.The reasons are:
We don’t want to force users to think in terms of targets. For example,
./pants lint path/to/dirwill today lint only the target nameddir(typically thepython_libraryfor that directory), not, for example, anypython_teststargets in that directory. In a files-and-directories-centric idiom, that is probably not what users will expect.All the underlying tools (pytest, flake8, mypy, etc etc) treat
path/to/diron the command line as “act on all the files under that directory”. And it seems prudent to be consistent.