pip: pip parses `dist-info-metadata` key from JSON indexes incorrectly
Description
When using a JSON-based package index, pip expects the dist-info-metadata
key to always have a string value and does not parse the values described by the spec.
dist-info-metadata
: An optional key that indicates that metadata for this file is available, via the same location as specified in PEP 658 ({file_url}.metadata
). Where this is present, it MUST be either a boolean to indicate if the file has an associated metadata file, or a dictionary mapping hash names to a hex encoded digest of the metadata’s hash.
pip tries to apply the same string parsing for this value as it does for the data-dist-info-metadata
value in PEP 658 and does not handle booleans or dicts.
This does not work as the entries according to PEP 691 look like:
{
"dist-info-metadata": {
"sha256": "8737b5dbb73f6b05e596c7c659f694a031de98b926e634716d9855fd365a5f2f"
},
"filename": "pre_commit-3.3.2-py2.py3-none-any.whl",
[...]
},
Expected behavior
pip should handle values with boolean or dict type as described in PEP 691.
pip version
Tested with 23.1.2 and current main
Python version
3.11
OS
Linux (debian bullseye)
How to Reproduce
To reproduce, we need access to a registry which supports PEP 691 JSON indexes and also PEP 658 metadata.
pypi.org is supposed to support both of these for new file uploads, but there is currently a bug where it provides the metadata under the wrong JSON key (https://github.com/pypi/warehouse/issues/13705), so it is hard to test at the moment. For now, the easiest way to reproduce is to temporarily patch your pip code to use the (incorrect) key to trigger the bug.
-
Temporarily patch your
pip
code so that it uses the key name provided by pypi.org:diff --git a/src/pip/_internal/models/link.py b/src/pip/_internal/models/link.py index e741c3283..5d1e0be4e 100644 --- a/src/pip/_internal/models/link.py +++ b/src/pip/_internal/models/link.py @@ -262,7 +262,7 @@ class Link(KeyBasedCompareMixin): url = _ensure_quoted_url(urllib.parse.urljoin(page_url, file_url)) pyrequire = file_data.get("requires-python") yanked_reason = file_data.get("yanked") - dist_info_metadata = file_data.get("dist-info-metadata") + dist_info_metadata = file_data.get("data-dist-info-metadata") hashes = file_data.get("hashes", {}) # The Link.yanked_reason expects an empty string instead of a boolean.
-
Try to install a package.
pip install --no-cache --only-binary :all: pre-commit==3.3.2
to reproduce pip trying to parse a metadata dict as a stringpip install --no-cache --only-binary :all: pre-commit==3.0.0
to reproduce pip trying to parse a booleanfalse
as a string
Output
$ pip install --no-cache --only-binary :all: pre-commit==3.3.2
Collecting pre-commit==3.3.2
ERROR: Exception:
Traceback (most recent call last):
File "/home/ckuehl/proj/pip/src/pip/_internal/cli/base_command.py", line 169, in exc_logging_wrapper
status = run_func(*args)
^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/cli/req_command.py", line 248, in wrapper
return func(self, options, args)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/commands/install.py", line 377, in run
requirement_set = resolver.resolve(
^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/resolver.py", line 92, in resolve
result = self._result = resolver.resolve(
^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 546, in resolve
state = resolution.resolve(requirements, max_rounds=max_rounds)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 397, in resolve
self._add_to_criteria(self.state.criteria, r, parent=None)
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/resolvers.py", line 173, in _add_to_criteria
if not criterion.candidates:
File "/home/ckuehl/proj/pip/src/pip/_vendor/resolvelib/structs.py", line 156, in __bool__
return bool(self._sequence)
^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 155, in __bool__
return any(self)
^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 143, in <genexpr>
return (c for c in iterator if id(c) not in self._incompatible_ids)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/found_candidates.py", line 47, in _iter_built
candidate = func()
^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/factory.py", line 206, in _make_candidate_from_link
self._link_candidate_cache[link] = LinkCandidate(
^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 293, in __init__
super().__init__(
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 156, in __init__
self.dist = self._prepare()
^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 225, in _prepare
dist = self._prepare_distribution()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/resolution/resolvelib/candidates.py", line 304, in _prepare_distribution
return preparer.prepare_linked_requirement(self._ireq, parallel_builds=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 510, in prepare_linked_requirement
metadata_dist = self._fetch_metadata_only(req)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 374, in _fetch_metadata_only
return self._fetch_metadata_using_link_data_attr(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/operations/prepare.py", line 384, in _fetch_metadata_using_link_data_attr
metadata_link = req.link.metadata_link()
^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/models/link.py", line 417, in metadata_link
metadata_link_hash = LinkHash.parse_pep658_hash(self.dist_info_metadata)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ckuehl/proj/pip/src/pip/_internal/models/link.py", line 77, in parse_pep658_hash
name, sep, value = dist_info_metadata.partition("=")
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'dict' object has no attribute 'partition'
Same traceback for pre-commit==3.0.0
except the last line is instead
AttributeError: 'bool' object has no attribute 'partition'
Code of Conduct
- I agree to follow the PSF Code of Conduct.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 36 (35 by maintainers)
Commits related to this issue
- chore(deps): bump pip from 22.3.1 to 23.2 in /.github/workflows (#73) Bumps [pip](https://github.com/pypa/pip) from 22.3.1 to 23.2. <details> <summary>Changelog</summary> <p><em>Sourced from <a h... — committed to klarna-incubator/mleko by ErikBavenstrand a year ago
- chore(deps): Update pip requirement from ~=23.1 to ~=23.2 in /packages/@jsii/python-runtime (#4182) Updates the requirements on [pip](https://github.com/pypa/pip) to permit the latest version. <detai... — committed to aws/jsii by dependabot[bot] a year ago
- chore(deps-dev): Bump pip from 23.1.2 to 23.2 in /packages/jsii-pacmak/test/generated-code (#4183) Bumps [pip](https://github.com/pypa/pip) from 23.1.2 to 23.2. <details> <summary>Changelog</summary>... — committed to aws/jsii by dependabot[bot] a year ago
- Bump pip from 23.1.2 to 23.2 (PR #6275) Bumps [pip](https://github.com/pypa/pip) from 23.1.2 to 23.2. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pypa/pip/b... — committed to inmanta/inmanta-core by dependabot[bot] a year ago
- Bump pip from 23.1.2 to 23.2 (PR #6279) Bumps [pip](https://github.com/pypa/pip) from 23.1.2 to 23.2. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://github.com/pypa/pip/b... — committed to inmanta/inmanta-core by dependabot[bot] a year ago
- Bump pip from 23.1.2 to 23.2 in /.github/workflows (#22) Bumps [pip](https://github.com/pypa/pip) from 23.1.2 to 23.2. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://... — committed to boopmail/python-jmap by openfinch a year ago
- chore(deps): Bump pip from 23.1.2 to 23.2.1 (#1426) Bumps [pip](https://github.com/pypa/pip) from 23.1.2 to 23.2.1. <details> <summary>Changelog</summary> <p><em>Sourced from <a href="https://git... — committed to kai687/sphinxawesome-theme by dependabot[bot] a year ago
Well many people do use the bundled version of pip, I think all of the python docker container versions do, Homebrew, etc.
That being said, I think this bug has existed dormant in pip since at least 22.3, so we’re already well past the point where we can avoid having it “out in the wild”, so I don’t think today’s CPython releases matter much at all for our strategy here, other than it would have been nice to get a fixed version into those releases.
Agreed, I think (3) is the best option. I’m inclined to think we should produce a PEP, as there’s the possibility that other index providers have started to implement the existing spec - although it’s unlikely any have deployed it, as if they had, we’d have got bug reports for pip before now…
Just to be clear too, the above is true because of the specifics of this bug, but if pip preferred the HTML version, the same general problem could happen in reverse. There’s nothing specific to JSON here other than the bug happens to be in the JSON API on PyPI.
This sort of bug is honestly a risk anytime there’s some code path that isn’t being used, that suddenly starts being used.
I think there just isn’t a path forward for people who don’t upgrade pip, even without the backfill sooner or later most projects are going to end up with releases with
dist-info-metadata
.The good news is we should be able to turn on PyPI’s implementation of PEP 714 before pip lands their implementation, since the whole purpose of the strategy in the PEP is that it enables PyPI to land it without breaking the existing logic in pip. Not that much better tests would be a bad thing of course 😃 but I know how gnarly pip’s code base can be to test too, so there’s not a huge rush to land it in pip prior to PyPI anymore.
PEP 714: Rename dist-info-metadata in the Simple API
I think since (3) is probably our best option, I’m going to raise a topic on discuss.p.o.