warehouse: /simple serves HTML that can't be parsed by Python's xml.etree if package has yanked releases
Describe the bug
Parsing the HTML served by the /simple endpoint with xml.etree results in xml.etree.ElementTree.ParseError.
Expected behavior
No parse error. This was the behavior before any releases were yanked, and it is still the behavior for packages that don’t have any yanked releases (yet).
To Reproduce
- Python script test.py that contains:
```python
import requests
from xml.etree import ElementTree
simple_pip = requests.get('https://pypi.python.org/simple/pip')
ElementTree.fromstring(simple_pip.text)
```
- Run it with python test.py, for example (on macOS):
```
$ python3 test.py
Traceback (most recent call last):
  File "test.py", line 4, in <module>
    ElementTree.fromstring(simple_pip.text)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
    parser.feed(text)
xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 143, column 306
```
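To see which token the parser rejects without re-downloading the page, a small helper can read the position attribute of ElementTree.ParseError, a (line, column) tuple like the "line 143, column 306" in the traceback, and print the offending line. This is a sketch: the helper name show_parse_error and the short sample document are hypothetical, and it runs offline.

```python
from xml.etree import ElementTree


def show_parse_error(text):
    """Try to parse text as XML; on failure, return (line, column, offending line)."""
    try:
        ElementTree.fromstring(text)
    except ElementTree.ParseError as err:
        line, column = err.position  # line numbers start at 1
        offending = text.splitlines()[line - 1]
        return line, column, offending
    return None  # the text is well-formed XML


# Hypothetical sample mimicking the structure of a /simple page.
sample = (
    '<html><body>\n'
    '<a href="pip-20.0.tar.gz" data-yanked>pip-20.0.tar.gz</a><br/>\n'
    '</body></html>'
)
result = show_parse_error(sample)
if result:
    line, column, offending = result
    print(f"line {line}, column {column}: {offending}")
```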
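The problem is the valueless data-yanked attribute in lines like the one below. A bare attribute is valid HTML, but XML requires every attribute to have a quoted value, so an XML parser such as xml.etree rejects it: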
<a href="https://files.pythonhosted.org/packages/8c/5c/c18d58ab5c1a702bf670e0bd6a77cd4645e4aeca021c6118ef850895cc96/pip-20.0.tar.gz#sha256=5128e9a9401f1d16c1d15b2ed766a79d7813db1538428d0b0ce74838249e3a41" data-requires-python=">=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*" data-yanked>pip-20.0.tar.gz</a><br/>
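Because the bare attribute is valid HTML, one client-side option is to parse the page with an HTML parser instead of an XML parser. The sketch below uses the standard library's html.parser.HTMLParser, which reports a valueless attribute as (name, None). The class name SimpleIndexParser, the (filename, yanked) output shape, and the trimmed sample anchors are hypothetical.

```python
from html.parser import HTMLParser


class SimpleIndexParser(HTMLParser):
    """Collect (filename, yanked) pairs from the anchors on a /simple page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._attrs = None  # attributes of the <a> tag currently open, if any
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._attrs = dict(attrs)
            self._text = []

    def handle_data(self, data):
        if self._attrs is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._attrs is not None:
            # A bare data-yanked shows up with value None, so test for the
            # key's presence rather than the truthiness of its value.
            yanked = "data-yanked" in self._attrs
            self.links.append(("".join(self._text), yanked))
            self._attrs = None


html = (
    '<a href="pip-19.3.1.tar.gz">pip-19.3.1.tar.gz</a><br/>\n'
    '<a href="pip-20.0.tar.gz" data-yanked>pip-20.0.tar.gz</a><br/>\n'
)
parser = SimpleIndexParser()
parser.feed(html)
parser.close()
print(parser.links)
# [('pip-19.3.1.tar.gz', False), ('pip-20.0.tar.gz', True)]
```

Unlike an XML parser, this approach also keeps the yanked status, so a client can skip yanked files if it wants to.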
My Platform
- macOS 10.15.4 with Python 2.7.16 or 3.7.7 (the same issue occurs on other platforms too)
Additional context
- EasyBuild bug report: https://github.com/easybuilders/easybuild/issues/619
- We’ve worked around this in the upcoming EasyBuild version by stripping out the data-yanked part (see https://github.com/easybuilders/easybuild-framework/pull/3303), but the issue still occurs in existing EasyBuild releases, which worked perfectly before package releases started being yanked
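The stripping workaround mentioned above can be sketched as a preprocessing step that removes data-yanked attributes before the page reaches ElementTree. This is an illustrative sketch, not the code from the linked EasyBuild pull request; the regular expression, the function name strip_data_yanked, and the sample page are assumptions.

```python
import re
from xml.etree import ElementTree

# Assumption: matches data-yanked either bare or with a double-quoted value,
# followed by whitespace, "/" or ">" so that longer attribute names are left alone.
DATA_YANKED = re.compile(r'\s+data-yanked(?:="[^"]*")?(?=[\s/>])')


def strip_data_yanked(text):
    """Remove data-yanked attributes so the page can be parsed as XML."""
    return DATA_YANKED.sub("", text)


# Hypothetical sample in the style of a /simple page.
page = (
    '<html><body>\n'
    '<a href="pip-20.0.tar.gz" data-requires-python="&gt;=2.7" data-yanked>'
    'pip-20.0.tar.gz</a><br/>\n'
    '</body></html>'
)
root = ElementTree.fromstring(strip_data_yanked(page))
print([a.text for a in root.iter("a")])
```

The trade-off is that stripping discards the yanked status, so a client using this approach cannot tell yanked files apart from regular ones.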
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (9 by maintainers)
This is now deployed; you can see it on https://pypi.org/simple/pip/, for example, for which I have manually purged the cache.