markdown: Bold/Italic bug

I think I’m running into another bold/italic bug. I did some quick searches, and it looks like the other issues were considered resolved. Sorry if I’m re-reporting something that has already been fixed but hasn’t made it upstream yet.

I installed the current Python-Markdown version, 3.0.1, from pip.

The raw markdown line that breaks is:

```markdown
This is text **bold *italic bold*** with more text
```

The output I’m getting is as follows:

```html
<p>This is text <strong>bold *italic bold</strong>* with more text</p>
```

However, the following format does seem to work correctly.

```markdown
This is text ***bold italic** italic* more text
```

The output is:

```html
<p>This is text <em><strong>bold italic</strong> italic</em> more text</p>
```

About this issue

State: closed
Created 5 years ago
Comments: 27 (9 by maintainers)

Commits related to this issue

Refactor em strong to consolidate code and fix issue #792 — committed to Python-Markdown/markdown by facelessuser 5 years ago
Fix unit test regressions for Markdown rendering. Some of the recent releases of Python-Markdown, along with some of the recent fixes merged in from release-3.0.x, resulted in some regressions in the... — committed to reviewboard/reviewboard by chipx86 3 years ago

Most upvoted comments

I think something like this will work for Python-Markdown:

```python
# **strong*em***
EM_STRONG2_RE = r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'

# __strong _em___
SMART_EM_STRONG2_RE = r'(?<!\w)(\_)\1(?!\1)(.+?)(?<!\w)\1(?!\1)(.+?)\1{3}(?!\w)'
```

Here we basically require that the content of each part doesn’t start with the token, so for input like ***text*text*** or **text**text*** we skip it and let the other patterns handle it. That way we only handle genuine cases of **text*text***.
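As a quick check of that reasoning, here is a minimal sketch that exercises `EM_STRONG2_RE` with Python’s standard `re` module alone, outside Python-Markdown’s inline processors. The `to_html` helper is hypothetical and only illustrates how the two captured groups map to nested `<strong>`/`<em>` tags. It uses `re.match`, which anchors the pattern at the opening of the emphasis run, the position where the “content must not start with the token” check applies.

```python
import re
from typing import Optional

# Pattern from the comment above: **strong*em***
EM_STRONG2_RE = re.compile(r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}')


def to_html(run: str) -> Optional[str]:
    """Hypothetical helper: render a **strong*em*** run, or None if the pattern rejects it."""
    m = EM_STRONG2_RE.match(run)  # anchored at the opening '**'
    if m is None:
        return None
    # Group 2 is the plain strong text, group 3 is the nested em text.
    return f"<strong>{m.group(2)}<em>{m.group(3)}</em></strong>"


# The run from the original report renders with the intended nesting.
print(to_html("**bold *italic bold***"))  # → <strong>bold <em>italic bold</em></strong>
# An opening '***' fails the (?!\1) lookahead at this position.
print(to_html("***text*text***"))  # → None
```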

For underscores, smart emphasis is enabled by default, so there is an additional requirement that the nested _ not be preceded by a word character, which preserves that “smart” logic. In the legacy_em extension, we’d replace this with the “dumb” logic.
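The smart-underscore rule can be checked the same way. This sketch again uses only Python’s `re` module, and the sample strings are made up for illustration. Note that `\w` also matches `_`, so the `(?<!\w)` lookbehinds reject an underscore that directly follows a letter, a digit, or another underscore.

```python
import re

# Pattern from the comment above: __strong _em___
SMART_EM_STRONG2_RE = re.compile(
    r'(?<!\w)(\_)\1(?!\1)(.+?)(?<!\w)\1(?!\1)(.+?)\1{3}(?!\w)'
)

samples = [
    "__strong _em___",       # nested '_' follows a space: matches
    "__strong snake_em___",  # nested '_' follows 'e': no valid split, no match
    "foo__bar _baz___",      # opening '__' is intraword: no match
]

for text in samples:
    m = SMART_EM_STRONG2_RE.search(text)
    # groups() is (token, strong text, em text) when the pattern matches
    print(repr(text), "->", m.groups() if m else None)
```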

It seems to work with basic testing. I’ll upload a pull request once I’ve tested it more.

facelessuser on Mar 2, 2019

That said, I understand the concern about breaking the “step by 10” model we have now. In 3.0 we significantly altered the way inline parsing can work, while in practice we made very few actual changes. If you are suggesting taking this to the next step, then that seems like a reasonable approach. I like the idea that all strong processors are combined into one, if that is reasonable to accomplish.

We could take it to the next step. I was being more conservative, but if we want to go all in and combine them, that could be done quite easily. Initially, we’d just use the new format, loop through our regular expression patterns, and output the appropriate element based on which pattern matches. In the future, we could even rewrite it to parse the patterns functionally (if there were some advantage), but I see no need to rewrite everything. I think the current patterns are probably fine for now; we can just group them into one pattern step.
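To illustrate grouping the patterns into one step, here is a hypothetical sketch in plain Python (standard `re` and `xml.etree.ElementTree` only, not Python-Markdown’s `InlineProcessor` API). The pattern list, the builder functions, and `match_emphasis` are made-up names, and the triple-star, strong, and em regexes are simplified stand-ins rather than the library’s actual patterns; only the `**strong*em***` regex is taken from the comment above. The point is the control flow: try each regex in order at a given position and emit the element for the first one that matches.

```python
import re
import xml.etree.ElementTree as etree
from typing import Callable, Optional, Tuple


def _strong_em(m: re.Match) -> etree.Element:
    # ***text*** -> <strong><em>text</em></strong>
    strong = etree.Element("strong")
    etree.SubElement(strong, "em").text = m.group(2)
    return strong


def _strong_then_em(m: re.Match) -> etree.Element:
    # **strong*em*** -> <strong>strong<em>em</em></strong>
    strong = etree.Element("strong")
    strong.text = m.group(2)
    etree.SubElement(strong, "em").text = m.group(3)
    return strong


def _single(tag: str) -> Callable[[re.Match], etree.Element]:
    # **text** -> <strong>text</strong>, *text* -> <em>text</em>
    def build(m: re.Match) -> etree.Element:
        el = etree.Element(tag)
        el.text = m.group(2)
        return el
    return build


# Hypothetical, simplified pattern list, ordered from most to least specific.
PATTERNS = [
    (re.compile(r'(\*)\1{2}(.+?)\1{3}'), _strong_em),
    (re.compile(r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'), _strong_then_em),
    (re.compile(r'(\*{2})(.+?)\1'), _single("strong")),
    (re.compile(r'(\*)(.+?)\1'), _single("em")),
]


def match_emphasis(text: str, pos: int = 0) -> Optional[Tuple[etree.Element, int]]:
    """Return (element, end index) for the first pattern matching at pos, else None."""
    for regex, build in PATTERNS:
        m = regex.match(text, pos)
        if m:
            return build(m), m.end()
    return None


for run in ["**bold *italic bold***", "***x***", "**just strong**", "*just em*"]:
    el, end = match_emphasis(run)
    print(etree.tostring(el, encoding="unicode"))
```

In this sketch the list order plays the role that separate priorities play in the “step by 10” model: the more specific three-star forms are tried before plain strong and em.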

facelessuser on Mar 4, 2019

Oops, apparently I copied the wrong example when I tested that. And Babelmark clearly indicates we are in the minority here. This is a bug.

waylan on Feb 27, 2019