markdown-it-py: Space in link destination generates IndexError

Describe the bug

A space character immediately after the scheme in a link destination causes an IndexError, for example:

[Contact](http:// mail.com)
[Contact](mailto: mail@mail.com)

Reproduce the bug

from markdown_it import MarkdownIt
MarkdownIt().parse("[Contact](mailto: mail@mail.com)")

Error

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/main.py", line 260, in parse
    self.core.process(state)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/parser_core.py", line 33, in process
    rule(state)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/rules_core/inline.py", line 10, in inline
    state.md.inline.parse(token.content, state.md, state.env, token.children)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/parser_inline.py", line 120, in parse
    self.tokenize(state)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/parser_inline.py", line 102, in tokenize
    ok = rule(state, False)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/rules_inline/link.py", line 54, in link
    href = state.md.normalizeLink(res.str)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/main.py", line 331, in normalizeLink
    return normalize_url.normalizeLink(url)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/common/normalize_url.py", line 21, in normalizeLink
    parsed = mdurl.parse(url, slashes_denote_host=True)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/mdurl/_parse.py", line 300, in url_parse
    u.parse(url, slashes_denote_host)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/mdurl/_parse.py", line 204, in parse
    if rest[host_end - 1] == ":":
IndexError: string index out of range

List your environment

  • Python 3.9
  • markdown-it-py 2.0.1

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 6
  • Comments: 15 (11 by maintainers)

Most upvoted comments

Further investigation shows that markdown-it passes “http://” to mdurl. The JavaScript version handles the parsing of “http://” differently: because a negative index in JavaScript evaluates to undefined rather than raising an error, the parse result is:

url {protocol: 'http:', slashes: true, auth: null, port: null, hostname: '', pathname: null, search: null, hash: null}

During further processing, this result is not accepted as a valid link, because the hostname must not be empty.
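
To illustrate the difference, here is a minimal sketch in plain Python. It is not the actual mdurl patch. The names `rest` and `host_end` are taken from the traceback; `char_at` is a hypothetical helper, and the empty value of `rest` is an assumption about what remains once “http://” has been consumed.

```python
from typing import Optional


def char_at(s: str, index: int) -> Optional[str]:
    """Hypothetical helper mimicking JavaScript's s[index]:
    return None (like undefined) instead of raising when out of range."""
    if 0 <= index < len(s):
        return s[index]
    return None


# Assumption: nothing is left of the URL after the scheme and slashes.
rest = ""
host_end = len(rest)  # 0, so host_end - 1 is -1

# Plain indexing, as in the traceback: -1 on an empty string raises.
try:
    rest[host_end - 1]
    error = None
except IndexError as exc:
    error = exc

# Guarded lookup: the comparison is simply False, as in JavaScript.
ends_with_colon = char_at(rest, host_end - 1) == ":"
```

With a guard of this kind the parser could carry on and return an empty hostname, leaving the rejection to the later validation step described above.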

I can provide a PR for mdurl that mimics the JavaScript behavior …