markdown-it-py: Space in link destination generates IndexError
Describe the bug
A space character right after a link destination scheme causes an IndexError.
[Contact](http:// mail.com)
[Contact](mailto: mail@mail.com)
Reproduce the bug
from markdown_it import MarkdownIt
MarkdownIt().parse("[Contact](mailto: mail@mail.com)")
Error
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/main.py", line 260, in parse
    self.core.process(state)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/parser_core.py", line 33, in process
    rule(state)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/rules_core/inline.py", line 10, in inline
    state.md.inline.parse(token.content, state.md, state.env, token.children)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/parser_inline.py", line 120, in parse
    self.tokenize(state)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/parser_inline.py", line 102, in tokenize
    ok = rule(state, False)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/rules_inline/link.py", line 54, in link
    href = state.md.normalizeLink(res.str)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/main.py", line 331, in normalizeLink
    return normalize_url.normalizeLink(url)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/markdown_it/common/normalize_url.py", line 21, in normalizeLink
    parsed = mdurl.parse(url, slashes_denote_host=True)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/mdurl/_parse.py", line 300, in url_parse
    u.parse(url, slashes_denote_host)
  File "/Library/Caches/pypoetry/virtualenvs/webapp-gmQ5g8dx-py3.9/lib/python3.9/site-packages/mdurl/_parse.py", line 204, in parse
    if rest[host_end - 1] == ":":
IndexError: string index out of range
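The comparison in the last frame can be reproduced in isolation. The sketch below assumes that, for an input such as "http://", nothing is left after the scheme and slashes, so the remaining string is empty and the computed host end is 0. The names `rest` and `host_end` mirror the traceback, but the values used here are assumptions and were not taken from mdurl's source; both helper functions are hypothetical.

```python
def last_host_char_is_colon(rest: str, host_end: int) -> bool:
    """Mirror the unguarded comparison shown in the traceback."""
    return rest[host_end - 1] == ":"


def reproduces_index_error(rest: str, host_end: int) -> bool:
    """Return True if the unguarded comparison raises IndexError."""
    try:
        last_host_char_is_colon(rest, host_end)
    except IndexError:
        return True
    return False


# Assumed values for an empty host: nothing remains after "http://".
# Index -1 on an empty string is out of range in Python.
print(reproduces_index_error("", 0))

# With a non-empty host, the same comparison runs without raising.
print(reproduces_index_error("example.com:", 12))
```

Under these assumptions, only the empty-host case raises, which would be consistent with the error appearing when a space directly follows the scheme.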
List your environment
- Python 3.9
- markdown-it-py 2.0.1
About this issue
- State: closed
- Created 2 years ago
- Reactions: 6
- Comments: 15 (11 by maintainers)
Commits related to this issue
- executablebooks/markdown-it-py#205 add test case — committed to mib112/executablebooks-mdurl by mib112 2 years ago
- executablebooks/markdown-it-py#205 fix test case — committed to mib112/executablebooks-mdurl by mib112 2 years ago
- executablebooks/markdown-it-py#205 fix — committed to mib112/executablebooks-mdurl by mib112 2 years ago
Further investigation shows that markdown-it passes “http://” to mdurl. The JavaScript version handles the parsing of “http://” differently: because a negative index in JavaScript does not raise an error, the parsing result is:
url {protocol: 'http:', slashes: true, auth: null, port: null, hostname: '', pathname: null, search: null, hash: null}
In further processing, this result is rejected as a valid link (the hostname must not be empty).
I can provide a PR for mdurl that mimics the JavaScript behavior …
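One possible way to mimic the JavaScript behavior is to guard the index lookup so that a negative or out-of-range index yields no character instead of raising. This is a minimal sketch under that assumption, not the actual mdurl patch; `char_at` and `host_ends_with_colon` are hypothetical helper names, while `rest` and `host_end` follow the names in the traceback.

```python
from typing import Optional


def char_at(s: str, index: int) -> Optional[str]:
    """Return s[index], or None when index is negative or past the end.

    This imitates JavaScript string indexing, where an out-of-range
    (including negative) index evaluates to undefined rather than raising.
    """
    if 0 <= index < len(s):
        return s[index]
    return None


def host_ends_with_colon(rest: str, host_end: int) -> bool:
    """Guarded version of the comparison `rest[host_end - 1] == ":"`."""
    return char_at(rest, host_end - 1) == ":"


# An empty remainder no longer raises; the comparison is simply False.
print(host_ends_with_colon("", 0))
```

With a guard like this, parsing could continue and produce an empty hostname, leaving the rejection of the link to the later validation step, as in the JavaScript version.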