warehouse: /simple serves HTML that can't be parsed by Python's xml.etree if package has yanked releases

Describe the bug

Parsing the HTML served by the /simple endpoint with xml.etree raises xml.etree.ElementTree.ParseError.

Expected behavior

No parse error. This was the case before any releases were yanked, and it is still the case for packages that have no yanked releases (yet).

To Reproduce

  • Python script test.py that contains:

    import requests
    from xml.etree import ElementTree
    simple_pip = requests.get('https://pypi.python.org/simple/pip')
    ElementTree.fromstring(simple_pip.text)
    
  • Run it with python test.py. For example, on macOS:

    $ python3 test.py
    Traceback (most recent call last):
      File "test.py", line 4, in <module>
        ElementTree.fromstring(simple_pip.text)
      File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/xml/etree/ElementTree.py", line 1315, in XML
        parser.feed(text)
    xml.etree.ElementTree.ParseError: not well-formed (invalid token): line 143, column 306
    

The problem is the data-yanked attribute, which is written without a value (XML requires every attribute to have one), in lines like:

<a href="https://files.pythonhosted.org/packages/8c/5c/c18d58ab5c1a702bf670e0bd6a77cd4645e4aeca021c6118ef850895cc96/pip-20.0.tar.gz#sha256=5128e9a9401f1d16c1d15b2ed766a79d7813db1538428d0b0ce74838249e3a41" data-requires-python="&gt;=2.7,!=3.0.*,!=3.1.*,!=3.2.*,!=3.3.*,!=3.4.*" data-yanked>pip-20.0.tar.gz</a><br/>
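To illustrate the point, here is a minimal sketch that needs no network access. It uses a shortened, hypothetical anchor modeled on the line above (the example.invalid URL and the surrounding body wrapper are assumptions, not what PyPI serves). It shows that the XML parser rejects the valueless attribute but accepts data-yanked="", and that the standard library's html.parser handles the original form. The helper names xml_parse_fails and collect_anchors are made up for this example.

```python
from html.parser import HTMLParser
from xml.etree import ElementTree

# Hypothetical, shortened stand-in for one line of the /simple page.
# Wrapped in <body> so that the only XML problem is the valueless attribute.
YANKED = (
    '<body><a href="https://example.invalid/pip-20.0.tar.gz" data-yanked>'
    'pip-20.0.tar.gz</a><br/></body>'
)

# Same markup, but with an explicit empty value, which is well-formed XML.
YANKED_WITH_VALUE = YANKED.replace(' data-yanked>', ' data-yanked="">')


def xml_parse_fails(markup):
    """Return True if xml.etree raises ParseError for the given markup."""
    try:
        ElementTree.fromstring(markup)
    except ElementTree.ParseError:
        return True
    return False


class AnchorCollector(HTMLParser):
    """Collect the attributes of every <a> tag as a dict."""

    def __init__(self):
        super().__init__()
        self.anchors = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # Valueless attributes are reported with a value of None.
            self.anchors.append(dict(attrs))


def collect_anchors(markup):
    """Parse markup as HTML and return the attribute dicts of its anchors."""
    collector = AnchorCollector()
    collector.feed(markup)
    collector.close()
    return collector.anchors


if __name__ == "__main__":
    print("XML parse fails (valueless attribute):", xml_parse_fails(YANKED))
    print("XML parse fails (empty value):", xml_parse_fails(YANKED_WITH_VALUE))
    print("HTML parser anchors:", collect_anchors(YANKED))
```

With an HTML parser, a yanked file can be detected by checking whether "data-yanked" is a key in the anchor's attribute dict, whether or not the attribute carries a value.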

My Platform

  • macOS 10.15.4 with Python 2.7.16 or 3.7.7 (the same issue occurs on other platforms too)

Additional context

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 19 (9 by maintainers)

Most upvoted comments

This is now deployed. You can see it on https://pypi.org/simple/pip/, for example, for which I have manually purged the cache.