pip: Using multiple PIP indexes on the same hostname with different credentials does not work

Description

I need to access two PIP indexes on the same hostname (pkgs.dev.azure.com) at the same time, using different credentials for each.

When configuring it like this:

PIP_INDEX_URL=https://build:password1@pkgs.dev.azure.com/feed1
PIP_EXTRA_INDEX_URL=https://build:password2@pkgs.dev.azure.com/feed2

pip seems to try the feed2 credentials for both feed1 and feed2, which fails my builds.

I’ve worked around this for now by setting the same credentials for both feeds.

Expected behavior

feed1 credentials are used with feed1 and feed2 credentials are used with feed2

pip version

21.1.3

Python version

3.9

OS

linux

How to Reproduce

  1. Create two Azure feeds in different organizations, for example pkgs.dev.azure.com/org1/_packaging/org-feed/pypi/simple and pkgs.dev.azure.com/org2/_packaging/org-feed/pypi/simple
  2. Upload package1 to feed1, package2 to feed2
  3. Generate different personal access tokens PAT1 and PAT2 for the two feeds
  4. Set environment variables PIP_INDEX_URL=https://build:PAT1@pkgs.dev.azure.com/org1/_packaging/org-feed/pypi/simple and PIP_EXTRA_INDEX_URL=https://build:PAT2@pkgs.dev.azure.com/org2/_packaging/org-feed/pypi/simple
  5. Run pip install package1 package2

Output

pip interactively prompts for a username, breaking the build, instead of installing the two packages.

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 29 (17 by maintainers)

Most upvoted comments

Please do not close this issue.

It also affects GitLab users, where different groups have their own package repositories, each with its own credentials.

It used to work, but no longer does (don’t know since what version of pip). It now forces us to use separate requirements files for a single project, since each file basically supports one set of credentials.

I’ve only skimmed the discussion, but probably it should be caching anything feed related against the index URL as provided by the caller rather than just the netloc.

I noted this when implementing the keyring interface, and there’s logic somewhere that keeps track of the original index URL leading to a request for this purpose, but I didn’t have a reason to change the netloc-based caching at the time.
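To make the caching point concrete, here is a minimal sketch of the difference between keying stored credentials by netloc and keying them by the index URL. It is a toy model under stated assumptions, not pip’s actual code: the helper names (`split_credentials`, `build_netloc_cache`, `build_index_cache`, `lookup_by_index`) are hypothetical, and the URLs are the placeholder feeds from the description.

```python
from urllib.parse import urlsplit


def split_credentials(url):
    """Return (url_without_credentials, (username, password))."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    clean = parts._replace(netloc=host).geturl()
    return clean, (parts.username, parts.password)


def build_netloc_cache(index_urls):
    """Toy model of netloc-keyed caching: one entry per host."""
    cache = {}
    for url in index_urls:
        clean, creds = split_credentials(url)
        # Both feeds share a host, so the later entry overwrites the earlier one.
        cache[urlsplit(clean).netloc] = creds
    return cache


def build_index_cache(index_urls):
    """Toy model of index-URL-keyed caching: one entry per configured index."""
    cache = {}
    for url in index_urls:
        clean, creds = split_credentials(url)
        cache[clean] = creds
    return cache


def lookup_by_index(cache, request_url):
    """Return credentials of the longest index URL that prefixes request_url."""
    best = None
    for index_url, creds in cache.items():
        prefix = index_url.rstrip("/") + "/"
        if request_url == index_url or request_url.startswith(prefix):
            if best is None or len(prefix) > len(best[0]):
                best = (prefix, creds)
    return best[1] if best else None


INDEX_URLS = [
    "https://build:password1@pkgs.dev.azure.com/feed1",
    "https://build:password2@pkgs.dev.azure.com/feed2",
]
```

With the netloc-keyed cache, the single `pkgs.dev.azure.com` entry ends up holding the feed2 credentials, which matches the behaviour reported in the description. The index-keyed cache keeps both sets apart, so a request under feed1 resolves to the feed1 credentials.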

I probably made this possible with https://github.com/pypa/pip/pull/11698/files#diff-a88f002c8cad3308467fd2d7f55ae33f8e0538dfa275d1052797fbd93e0e3099R120, which is available since 23.1.

Works for me with pip 23.3.1. Thank you.

Where previously we had to do something like this (dependencies installed separately first):

pip install --extra-index-url "https://user1:token1@gitlab.com/api/v4/projects/12345/packages/pypi/simple" package1
pip install --extra-index-url "https://user2:token2@gitlab.com/api/v4/projects/23456/packages/pypi/simple" package2
pip install --extra-index-url "https://user3:token3@gitlab.com/api/v4/projects/34567/packages/pypi/simple" package3

We can now do this (install just the package we need with dependencies being pulled in by pip):

pip install \
--extra-index-url https://user1:token1@gitlab.com/api/v4/projects/12345/packages/pypi/simple \
--extra-index-url https://user2:token2@gitlab.com/api/v4/projects/23456/packages/pypi/simple \
--extra-index-url https://user3:token3@gitlab.com/api/v4/projects/34567/packages/pypi/simple \
package3

While the two options look very similar, the first one rarely worked properly and usually required specifying versions for the dependencies that matched exactly what package3 needed.

I probably made this possible with https://github.com/pypa/pip/pull/11698/files#diff-a88f002c8cad3308467fd2d7f55ae33f8e0538dfa275d1052797fbd93e0e3099R120, which is available since 23.1.

I tested it across two ADO organizations with pip download. I am able to download packages from both feeds/indexes, even though each package exists on only one of the indexes.

This is an active problem affecting users in Azure DevOps.

I have faced this exact same issue this week, trying to connect to two feeds from the same Azure domain but different organizations. More specifically:

  • we connect to the 1st feed within the same organization, in which case the PipAuthenticate build task will generate a PAT at runtime
  • we connect to the 2nd feed with a service connection to an external organization using a user generated PAT

Workarounds for those coming from Azure DevOps:

  1. As OP mentioned, set up the same credential for both feeds. In my case I would have to connect to the 1st feed using a service connection with the same user-generated PAT as the 2nd feed; whether this is doable depends on whether you can generate PATs that are valid cross-organization.
  2. Set up the 2nd feed as an upstream source of the 1st feed; whether this is doable probably depends on permission levels.
  3. Call two separate pip install commands, each one connecting to one feed only; whether this is doable depends on whether you can afford to install packages in a certain order, i.e. install the dependencies first from one feed, then install the packages that need those dependencies.

It used to work, but no longer does (don’t know since what version of pip). It now forces us to use separate requirements files for a single project, since each file basically supports one set of credentials.

@pjstevns

The answer is here. Long story short, we (yes, I work at GitLab) deployed a security fix where we now enforce the credentials in the file download endpoints.

Due to the situation described in this issue (multiple indexes with different credentials), pip is sending the wrong credentials and the GitLab PyPI registry will not be happy about that (💥).

I described a possible workaround that might work depending on your situation: using a group deploy token where the target group contains all projects targeted by the index urls.

I’ve only skimmed the discussion, but probably it should be caching anything feed related against the index URL as provided by the caller rather than just the netloc.

Yep. That’s what my proposed PR #10904 does. It also follows RFC 3986 with regard to prefix-matching, though that part might not really be needed (I can hardly imagine two pip repositories where one would be a subpath of the other).

Note there is a bug here I believe, this does not compute the length of the matching prefix but the number of matching characters at the same position anywhere within the two strings:

Yeah. I fixed it in the latest fixup and added a test there. This is now quite a bit simpler but also strictly follows the RFC, i.e. only a full match of the URL with the authentication is used. Previously partial matches were also possible and feed2 could reuse authentication from feed1, which was not really correct as those are two different authentication scopes. The fix is simpler and more readable.
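The distinction raised in the review note above (length of the matching prefix versus number of characters that agree at the same position) can be illustrated with a short sketch. The code under review is not shown in this thread, so this is not a reconstruction of it; both function names are hypothetical and `host` is a placeholder.

```python
def positional_matches(a, b):
    """Count positions where the characters agree, even after a mismatch."""
    return sum(1 for x, y in zip(a, b) if x == y)


def common_prefix_length(a, b):
    """Count leading characters that agree, stopping at the first mismatch."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


# Two hypothetical feed URLs that differ only in the organization segment.
FEED_A = "https://host/org1/feed/simple"
FEED_B = "https://host/org2/feed/simple"

if __name__ == "__main__":
    print("positional matches:", positional_matches(FEED_A, FEED_B))
    print("common prefix length:", common_prefix_length(FEED_A, FEED_B))
```

For these two URLs the positional count is much larger than the true common prefix, because everything after the differing organization segment lines up again. A score based on positional matches would therefore rate the two feeds as a near-perfect match even though they are different authentication scopes.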

I’ve also proposed a documentation update and NEWS update to explain the change in behaviour

As I read it, it is quite clear: this is basic authentication, and the path is part of the authentication scope, so if a user matches “URL/path”, it does not match “URL/other_path”.

That’s how I interpret this as well but I don’t think it will solve the azure issue as mentioned above as it might transform URL/path to URL/better_path_for_azure (still not sure if it’s done by pip through redirects or HATEOAS).

Anyway, will try your PR tomorrow morning (CET) and report back.

Submitted #10904, which should be a pretty complete implementation of multi-domain matching (@roman-kouzmenko - you might want to install via pip install git+https://github.com/potiuk/pip.git@fix-behaviour-of-auth-information and check if my change works for you).

I believe it’s something that should be fixed on the pip side, as Azure’s scheme is fully compliant with RFC 7617, Section 2.2.

Reusing Credentials

Given the absolute URI ([RFC3986], Section 4.3) of an authenticated request, the authentication scope of that request is obtained by removing all characters after the last slash (“/”) character of the path component (“hier_part”; see [RFC3986], Section 3). A client SHOULD assume that resources identified by URIs with a prefix-match of the authentication scope are also within the protection space specified by the realm value of that authenticated request.
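The scope rule quoted above can be sketched in a few lines of Python. This is only an illustration of the RFC wording (drop everything after the last slash of the path, then prefix-match), not what pip or Azure implements; the function names `authentication_scope` and `in_scope` are hypothetical, and the feed URLs are the placeholders from the reproduction steps with a trailing slash added.

```python
from urllib.parse import urlsplit, urlunsplit


def _normalize(url, keep_full_path):
    """Strip credentials, query and fragment; optionally trim the path to its scope."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    if not keep_full_path:
        # Remove all characters after the last "/" of the path component.
        path = path[: path.rfind("/") + 1]
    return urlunsplit((parts.scheme, host, path, "", ""))


def authentication_scope(url):
    """Authentication scope of an authenticated request, as worded in the quote."""
    return _normalize(url, keep_full_path=False)


def in_scope(scope, url):
    """True if url is a prefix-match of the given authentication scope."""
    return _normalize(url, keep_full_path=True).startswith(scope)


FEED1 = "https://build:PAT1@pkgs.dev.azure.com/org1/_packaging/org-feed/pypi/simple/"
FEED2 = "https://build:PAT2@pkgs.dev.azure.com/org2/_packaging/org-feed/pypi/simple/"

if __name__ == "__main__":
    scope1 = authentication_scope(FEED1)
    print(scope1)
    print(in_scope(scope1, FEED1 + "package1/"))
    print(in_scope(scope1, FEED2 + "package2/"))
```

Under this reading, a request below the org1 feed falls inside the org1 scope while a request below the org2 feed does not, even though both share the same hostname. Note that without the trailing slash the final path segment (`simple`) is dropped from the scope, since the rule cuts at the last slash.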