poetry: Selecting dependency source doesn't work for transitive dependencies

  • I am on the latest Poetry version.
  • I have searched the issues of this repo and believe that this is not a duplicate.
  • If an exception occurs when executing a command, I executed it again in debug mode (-vvv option). N/A
  • OS version and name: Ubuntu 18.04
  • Poetry version: 1.0.9

Issue

TLDR: Specifying an individual dependency's source using the source field (see #908) is ignored for transitive dependencies. This greatly limits its usefulness and exposes users to supply chain attacks.

I have three private repos: numpy, lib and app (I know it would be unwise to call my package numpy; this is to illustrate the problem). lib depends on numpy and app depends on lib. They all live in Acme Corp's private PyPI repository at http://my-pypi/simple/.

I have set up my-pypi as the primary source in pyproject.toml for lib and app. To make sure that my numpy is used by lib, its pyproject.toml includes this line:

numpy = { version = ">0", source = "my-pypi" }

This works fine. When I install lib using poetry install my private numpy package is installed as a dependency.

app depends on lib, app’s pyproject.toml includes this line:

lib = { version = ">0", source = "my-pypi" }

Now, when I run “poetry install” for app, the public PyPI version of numpy is installed! Is this expected and/or intended? As a user, I would expect app to use my private numpy in this case. If that isn’t the case, it should be clearly specified in the documentation. The current behavior exposes users to supply chain attacks which they might expect to avoid using the source field.

In fact, the source field isn’t documented anywhere, although it has been in poetry since v1.0.0. I found it described here: #908.


I believe this may be a similar issue to #1356, but it concerns pure poetry workflows so I don’t think it is a duplicate.


lib’s pyproject.toml

[tool.poetry]
name = "lib"
version = "0.1.0"
description = "A library"
authors = ["me"]

[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"

[tool.poetry.dependencies]
python = "^3.6"
numpy = {version = ">0", source = "my-pypi"}

app’s pyproject.toml

[tool.poetry]
name = "app"
version = "0.1.0"
description = "An application"
authors = ["me"]

[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"

[tool.poetry.dependencies]
python = "^3.6"
lib = {version = ">0", source = "my-pypi"}
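
For illustration only, one way to avoid relying on transitive behaviour is to declare numpy explicitly in app as well, so that its source is stated at the top level. This is a sketch that assumes Poetry applies a source constraint declared in the root project to that package during resolution; it has not been verified against 1.0.9.

```toml
# app's pyproject.toml (dependencies section only)
[tool.poetry.dependencies]
python = "^3.6"
lib = {version = ">0", source = "my-pypi"}
# Assumption: naming numpy here pins its source for the whole resolution,
# including the requirement that arrives through lib.
numpy = {version = ">0", source = "my-pypi"}
```

The cost is that app must list a package it does not import directly, and the entry has to be kept in step with lib's own requirement.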

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

I’d be happy enough to close this as wontfix

  • wanting exclusively to use a private repository is a common use case and is supported (that seems to be what today’s conversation is about)
  • wanting to get only some specific packages from a secondary repository is a common use case and more-or-less supported, albeit with improvements possible per #5984
  • wanting to name packages so that they collide with public packages, and not be explicit about what source they come from, but also relying on that working out in some particular way… yuk.
    • original report wanted their implicit numpy requirement to resolve to their private package, but if poetry did behave that way then some other user could just as legitimately raise a bug report saying that they wanted public numpy installed the whole time…

Of course the goal is not “the same as pip”, but as a rough level-set for what’s typical… pip supports

  • --index-url, approximately equivalent to a private repository with default = true
  • --extra-index-url, approximately equivalent to a secondary repository
  • (so far as I know) nothing remotely equivalent to what is being asked for here
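
As a rough sketch of that mapping in pyproject.toml terms, using the default and secondary keys named in this thread (the second source's name and URL are hypothetical, and other Poetry versions may spell these options differently, so check the documentation for the version in use):

```toml
# Approximately pip's --index-url: this index takes the place of the
# default package source.
[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"
default = true

# Approximately pip's --extra-index-url: an additional index alongside
# the default one.
[[tool.poetry.source]]
name = "extra-index"                 # hypothetical name
url = "http://extra-index/simple/"   # hypothetical URL
secondary = true
```

Neither form says anything about where the dependencies of a given package should come from, which is the behaviour being requested in this issue.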

I think I’m starting to understand what’s going on here – you don’t want to put a publishing target into pyproject.toml – it’s meant to be configured in poetry.toml or config.toml depending on if you use poetry config with the --local flag.

See https://python-poetry.org/docs/repositories/#publishable-repositories.

I do believe your existing use case is well-handled by a single source with default = true, as you will fetch all of your deps from it, and your Local repository will take priority over the Remote PyPI repository with Artifactory.

FWIW, including less redacted URLs would have made it easier to figure out what was going on, but I think we got there. If there is any spot in the documentation that you think misled you to believe publishing targets were specified like package indexes, please let us know or submit a PR clarifying.

Remote+Local Artifactory is certainly a use case, but not the norm – most users are using small indexes with a subset of packages available to supplement PyPI. I don’t think we can force everyone over to your proposed new behavior for that reason. Likewise, adding a knob and supporting both seems a bit fraught.

I suppose that if we tried to implement such a feature, we might introduce a new source = {name = "name", recursive = true} optional syntax. I am quite nervous of the users who then ask us to allow opting out of this recursion on a per-package basis 😆

I do wonder, if you’re using a Virtual repository, why you can’t just set default = true and not use source = at all/be done? It seems like that would be by far the easiest solution.
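
A minimal sketch of that suggestion, reusing the app example from the original report and assuming my-pypi is a virtual repository that also serves (or proxies) every public package the project needs:

```toml
# app's pyproject.toml
[[tool.poetry.source]]
name = "my-pypi"
url = "http://my-pypi/simple/"
default = true   # all packages, direct and transitive, are looked up here

[tool.poetry.dependencies]
python = "^3.6"
lib = ">0"       # no per-dependency source field needed
```

With only one index in play, which numpy gets installed is decided by what that index serves rather than by anything in pyproject.toml.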

If I have a project that depends on an internally published package that has numpy as a dep, the numpy transitive dep would have already been grabbed and cached. By using the same top-level virtual source in the pyproject.toml for the parent dep, it isn’t (or shouldn’t be) re-downloading a whole new numpy per se, right?

Besides my concerns about the Artifactory workflow being hardly universal, I don’t see how this is the case. I think most users would find it more unexpected if source = was contagious – e.g. users of ML packages like torch that are distributed using an alternate index would find this to be the exact opposite of what is expected.