poetry: Poetry doesn't try the public PyPI when a private PyPI repository is included

  • [ X ] I am on the latest Poetry version.
  • [ X ] I have searched the issues of this repo and believe that this is not a duplicate.
  • [ X ] If an exception occurs when executing a command, I executed it again in debug mode (-vvv option).
  • macOS: 11.1
  • Poetry version: 1.1.5

Issue

When using a private PyPI repo, the public PyPI is no longer checked. I can get everything working again by adding:

[[tool.poetry.source]]
name = "pypi-public"
url = "https://pypi.org/simple/"

But that wasn’t needed in the past.

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 17
  • Comments: 35 (17 by maintainers)

Most upvoted comments

I’m also finding this is an issue…

This issue isn’t reproducible on poetry@master. Closing.

Note that setting a repository as default will disable the default PyPI source.

https://python-poetry.org/docs/master/repositories/#project-configuration

When you combine secondary = true and default = false, the poetry.lock is generated correctly.

e.g.:

[[tool.poetry.source]]
name = "internal"
url = "https://pypi.internal.com/simple/"
secondary = true
default = false

There’s no need to redefine the public PyPI in this case. The poetry.lock correctly adds the private repo only to the libraries I marked with source = "internal".

Our private registry is configured to redirect to pypi.org for missing packages. I think your test is flawed without a real repository, since both of your repositories (a) contain all packages and (b) are very fast. In the real world, you would be hitting your custom server first, which is probably slower than pypi.org.

I just tested locking a huge project. Locking with our registry first took 4m40s. Configuring it as non-default and secondary, and targeting only the few relevant packages, brought that down to 4m14s (that’s poetry lock --no-cache). In that test, there are 4 packages to fetch from our server and 164 from PyPI. There are also 2 git sources, which slow down the whole process quite a bit. So overall, the difference exists, but it is clearly in the nice-to-have range.

The huge difference, though, is that we were able to downsize our private registry server with this one easy trick 😅. When too many clients run poetry install simultaneously (think a bunch of Docker builds and automated tests kicking off in parallel), the server occasionally returns a 5xx during poetry install. We were able to fix this problem by using the default = false, secondary = true trick.

I improved (I think) the documentation in #5605, but some effort should be made to support this use case better. The options are simply misleading for anyone who didn’t take the time to carefully read that section of the documentation.

This is the dark side of the rules:

  • adding a source means it’s checked first (it’s opinionated, not misleading; however, you have just moved the whole project from PyPI to the private registry, probably unintentionally)
  • combining it with source= at the dependency level has no effect (misleading)
  • setting secondary=true has no effect (misleading)
  • setting default=false has no effect (misleading)
  • setting default=true has no effect (misleading)
  • setting default=false and secondary=true finally makes source= work as intended! (profit)

We have to think about the developer’s thought process here. When you add source= to a dependency for the first time, there’s very little chance that you’ll know that default=false and secondary=true must be added. If the repo information is added without a good understanding of its documentation, you’re not just adding a dependency. You’re actually setting the private registry as the default for all locking and install needs. GitHub hides the poetry.lock diff most of the time because it’s too large. As a result, a lot of devs won’t notice hundreds of package.source entries being added to their lock file and will just go with it. It took us a couple of 5xx errors to understand what was going on…

I would say that if the user provided source= information, the intent is to use the private registry as little as possible. When the user sets source=, Poetry should use the registry only when requested. If no package has a source=, then the repository is most likely meant to be used as much as possible.

However, this also has other issues. As discussed in the Discord thread, e.g.: how do we handle transitive dependencies then?

Simply don’t. If the user wants a transitive dependency to come from the private registry, they can add it to the dependencies with source= specified 🤷🏻‍♂️. That’s how someone could use, e.g., a forked version of urllib3 even though only requests was needed: urllib3 would be pushed to the private registry, and urllib3 with source= would be added to the pyproject file. It feels wrong that the custom urllib3 might end up being used by everyone in the company who didn’t think about this and set the private registry as default by mistake.

I think that making a registry the first one to be checked should require explicitly setting an option. Using the registry only when source= targets it really should be the default scenario.

It adds extra requests during locking, yes; however, “major slow-down” is probably an overstatement.

Has the situation significantly improved since the issue was filed? My recollection is that it made a difference of one to two orders of magnitude (measured in seconds) for my project.

Ah, you might be right that it’s caused by the recursive dependencies. It’s been a while since I tested this, so I don’t remember exactly, but I did see lookups in the private repo when setting everything to source pypi.

I’m affected by this. I tried 1.2.0a2 for good measure… At first glance it looks fixed: poetry.lock no longer gets updated with the incorrect default repository.

However, it’s apparent that Poetry still checks the non-default repository for every package, even when a valid package has already been found in the default one. This might be intended, but it also slows operations down significantly. A common scenario: an app depends on a handful of packages in a private repository and a bunch of packages on PyPI. A quick poetry update -vvv reveals what’s going on. In my case I’m using a private GitLab PyPI repo and:

[Screenshot: poetry update -vvv output]

This makes poetry update a 35-second operation on a warm cache for my project. If I remove the custom repository, the same project completes in under 1 second.

We were able to fix this problem by using the default false, secondary true trick.

default defaults to false for all sources, so there is no need to set it explicitly. I am unclear on why you say that setting secondary=true had no effect. If that is indeed the case, then there is a bug, unless default = true was explicitly configured.

You’re right, only secondary=true is needed. I think that was maybe an old bug, or just a mistake on my part when I played around with this months ago.

As an example, if A depends on B and B depends on C, and we add C to A as you suggest, then if B later drops its dependency on C, you are left with an unused dependency in your project and/or your lockfile.

The same problem occurs when you need to pin a specific version of a transitive dependency for whatever reason; there is no need to involve private registries to fall into this trap. I can’t vouch for everyone’s best practices, but if you need to add such an edge case to your pyproject, you should comment it as such so that everyone knows what it’s about.

In fact, the same problem occurs if someone adds dependency A for new python code, then someone alters the code later and removes the usage. Unless you actively search the code base for more usages of some random import you just removed, you’re going to be left with one unused library. My opinion is that it’s a non-issue / user-error, this isn’t something poetry should be concerned about.

All that said, I’d suggest that we move this discussion off this issue. It might be more constructive to discuss changing the “default” behaviour of adding a package source. Alternatively, we could discuss adding an option that disables package searches in a source unless it is explicitly used.

👍🏻

Poetry 1.1.12 on Windows, Python 3.9. I just tested it again:

Given this:

[tool.poetry.dependencies]
python = ">=3.8"

requests = "*"


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
secondary = true
default = false

The lock doesn’t contain any repository information. But once I do this:

[tool.poetry.dependencies]
python = ">=3.8"

requests = { version = "*", source = "internal" }


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
secondary = true
default = false

Then the lock contains the repository information for the requests package exclusively.

Once I change to this:

[tool.poetry.dependencies]
python = ">=3.8"

requests = { version = "*" }


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
default = false

Then suddenly every single package in the lock contains my repository information (this seems to be a bug!).

This seems to work though:

[tool.poetry.dependencies]
python = ">=3.8"

requests = { version = "*" }


[[tool.poetry.source]]
name = "internal"
url = "https://pypi.my-company.com/simple/"
secondary = true

With the above, I don’t see any repo information. When I add source = "internal" then I get the same result as the first example: the repo information is added to the requests package and the other packages don’t have any repo information.

Will a fix or some workaround be available soon? I am pretty stuck trying to lock my dependencies with both a private repo and PyPI.