markdown: Bold/Italic bug
I think I’m running up against another bold/italics bug. I did some quick searches and it looks like the other issues were considered resolved. Sorry if I’m re-reporting something that has already been fixed but hasn’t made it upstream yet.
I installed the current Python-Markdown release, version 3.0.1, from pip.
The raw markdown line that breaks is:
This is text **bold *italic bold*** with more text
The output I’m getting is as follows:
<p>This is text <strong>bold *italic bold</strong>* with more text</p>
However, the following format does seem to work correctly.
This is text ***bold italic** italic* more text
The output is
<p>This is text <em><strong>bold italic</strong> italic</em> more text</p>
About this issue
- State: closed
- Created 5 years ago
- Comments: 27 (9 by maintainers)
Commits related to this issue
- Refactor em strong to consolidate code and fix issue #792 — committed to Python-Markdown/markdown by facelessuser 5 years ago
- Fix unit test regressions for Markdown rendering. Some of the recent releases of Python-Markdown, along with some of the recent fixes merged in from release-3.0.x, resulted in some regressions in the... — committed to reviewboard/reviewboard by chipx86 3 years ago
I think something like this will work for Python Markdown:
Here we are basically requiring that the content of each doesn’t start with the token, so if we had something like ***text*text*** or **text**text***, we’d skip targeting them and let the other patterns deal with it. So we really only handle actual cases of **text*text***.
With underscore, we have smart enabled by default, so we have the additional requirement that the nested _ is not preceded by a word character to continue with that “smart” logic. In the legacy_em extension, we’d replace this with the “dumb” logic.
It seems to work with basic testing. I’ll upload a pull request once I’ve tested it more.
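For readers following along, here is a minimal sketch of the idea described above, using only the standard `re` module. The pattern names, the `render` helper, and the patterns themselves are hypothetical and written for illustration; they are not copied from Python-Markdown and may differ from what ended up in the pull request. The sketch assumes the nested text contains no further markers.

```python
import re

# Hypothetical patterns for illustration only; not Python-Markdown's own.

# Asterisk form: **strong *em***
STRONG_EM_STAR = re.compile(
    r'(?<!\*)\*\*(?!\*)'  # opening ** that is not part of a longer run of *
    r'([^*]+?)'           # text that is only strong
    r'\*(?!\*)'           # nested opening *; its content must not start with *
    r'([^*]+?)'           # text that is strong and emphasized
    r'\*\*\*'             # closing *** ends both elements
)

# "Smart" underscore form: the same shape, plus word-boundary checks so that
# underscores inside words are left alone.
STRONG_EM_SMART = re.compile(
    r'(?<!\w)__(?!_)'     # opening __ not preceded by a word character
    r'([^_]+?)'
    r'(?<!\w)_(?!_)'      # nested _ must not be preceded by a word character
    r'([^_]+?)'
    r'___(?!\w)'          # closing ___ not followed by a word character
)

TEMPLATE = r'<strong>\1<em>\2</em></strong>'


def render(text):
    """Apply both hypothetical patterns and return the resulting HTML text."""
    text = STRONG_EM_STAR.sub(TEMPLATE, text)
    return STRONG_EM_SMART.sub(TEMPLATE, text)


print(render('This is text **bold *italic bold*** with more text'))
print(render('***text*text***'))  # left untouched for other patterns
```

The lookarounds are what implement the "content doesn't start with the token" rule: inputs that open with a triple marker are skipped, so a different pattern can claim them.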
We could take it to the next step. I was being more conservative, but if we want to go all in and combine them, that could be done quite easily. Initially, we’d just use the new format, loop through our regular expression patterns, and output the appropriate element based on which pattern matches. If we wanted to in the future, we could even rewrite it to parse the patterns functionally (if there was some advantage), but I see no need to completely rewrite everything. I think the current patterns are probably fine for now; we can just group them into one pattern step.
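A rough sketch of what "one pattern step" could look like, assuming a simple table of (pattern, template) pairs. The `RULES` table, the patterns, and `render` are all hypothetical names made up for this example. It emits HTML strings for brevity and does not reproduce how Python-Markdown's inline processors actually build their output.

```python
import re

# Hypothetical rule table: each entry pairs a pattern with the output it
# produces. Order matters only when two rules match at the same position.
RULES = [
    # ***strong em** em*
    (re.compile(r'\*{3}([^*]+?)\*{2}([^*]+?)\*'),
     r'<em><strong>\1</strong>\2</em>'),
    # ***em strong* strong**
    (re.compile(r'\*{3}([^*]+?)\*([^*]+?)\*{2}'),
     r'<strong><em>\1</em>\2</strong>'),
    # **strong *strong em***
    (re.compile(r'(?<!\*)\*{2}(?!\*)([^*]+?)\*(?!\*)([^*]+?)\*{3}'),
     r'<strong>\1<em>\2</em></strong>'),
    # **strong**
    (re.compile(r'\*{2}([^*]+?)\*{2}'), r'<strong>\1</strong>'),
    # *em*
    (re.compile(r'\*([^*]+?)\*'), r'<em>\1</em>'),
]


def render(text):
    """Scan left to right, applying whichever rule matches earliest."""
    out, pos = [], 0
    while pos < len(text):
        best = None
        for pattern, template in RULES:
            m = pattern.search(text, pos)
            # Strict '<' keeps the earlier rule when start positions tie.
            if m and (best is None or m.start() < best[0].start()):
                best = (m, template)
        if best is None:
            break
        m, template = best
        out.append(text[pos:m.start()])   # literal text before the match
        out.append(m.expand(template))    # element chosen by the matched rule
        pos = m.end()
    out.append(text[pos:])                # trailing literal text
    return ''.join(out)


print(render('This is text ***bold italic** italic* more text'))
print(render('This is text **bold *italic bold*** with more text'))
```

Keeping all the rules in one table means the choice of output element follows directly from which pattern matched, which is the grouping suggested above; the individual patterns could stay as they are.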
Oops, apparently I copied the wrong example when I tested that. And Babelmark clearly indicates we are in the minority here. This is a bug.