markdown: Bold/Italic bug

I think I’m running up against another bold/italics bug. I did some quick searches, and it looks like the other issues were considered resolved. Sorry if I’m re-reporting something that is already fixed but hasn’t made it upstream yet.

Installed from pip; the current Python-Markdown version is 3.0.1.

The raw markdown line that breaks is:

This is text **bold *italic bold*** with more text

The output I’m getting is as follows:

<p>This is text <strong>bold *italic bold</strong>* with more text</p>

However, the following format does seem to work correctly.

This is text ***bold italic** italic* more text

The output is

<p>This is text <em><strong>bold italic</strong> italic</em> more text</p>

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 27 (9 by maintainers)

Most upvoted comments

I think something like this will work for Python Markdown:

# **strong*em***
EM_STRONG2_RE = r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'

# __strong _em___
SMART_EM_STRONG2_RE = r'(?<!\w)(\_)\1(?!\1)(.+?)(?<!\w)\1(?!\1)(.+?)\1{3}(?!\w)'

Here we are basically requiring that the content of each part (the strong text and the nested em text) doesn’t start with the token, so if we had something like ***text*text*** or **text**text***, we’d skip targeting them and let the other patterns deal with them. So we really only handle actual cases of **text*text***.
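As a quick sanity check of the asterisk pattern, here is a minimal sketch that uses only the standard `re` module on the line from the report. The replacement template and every name other than `EM_STRONG2_RE` are my own, for illustration only; a real fix would build elements inside an inline processor rather than substitute strings.

```python
import re

# Pattern proposed above for **strong *em***
EM_STRONG2_RE = r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'

line = 'This is text **bold *italic bold*** with more text'
m = re.search(EM_STRONG2_RE, line)

strong_text = m.group(2)  # text directly inside the strong span
em_text = m.group(3)      # text inside the nested em span

# Illustrative string substitution only (hypothetical template).
html = re.sub(EM_STRONG2_RE, r'<strong>\2<em>\3</em></strong>', line)
print(html)

# re.match anchors at position 0: the (?!\1) lookahead after the opening
# pair rejects a third leading asterisk at that position.
anchored = re.match(EM_STRONG2_RE, '***text*text***')
```

Here group 2 holds the text directly inside the strong span and group 3 the nested em text, which is the nesting the report expects.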

With underscore, we have smart enabled by default, so we have the additional requirement that the nested _ is not preceded by a word character to continue with that “smart” logic. In the legacy_em extension, we’d replace this with the “dumb” logic.
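To illustrate the “smart” requirement on the nested underscore, here is a small sketch with the standard `re` module. The two sample strings are made up for this example and are not taken from the test suite.

```python
import re

# Pattern proposed above for __strong _em___
SMART_EM_STRONG2_RE = r'(?<!\w)(\_)\1(?!\1)(.+?)(?<!\w)\1(?!\1)(.+?)\1{3}(?!\w)'

# Nested _ preceded by a space: accepted as strong with nested em.
spaced = re.search(SMART_EM_STRONG2_RE, 'text __bold _italic bold___ more')

# Nested _ preceded by a word character: the smart lookbehind rejects it,
# so this pattern leaves the text for other patterns to handle.
intraword = re.search(SMART_EM_STRONG2_RE, '__strong_em___')

print(spaced.group(2), '|', spaced.group(3))
print(intraword)
```

Note that `_` itself counts as a word character for `\w`, which is why the lookbehind also rules out starting the nested span on any underscore of the closing run.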

It seems to work with basic testing. I’ll upload a pull request once I’ve tested it more.

That said, I understand the concern about breaking the “step by 10” model we have now. In 3.0 we significantly altered the way inline parsing can work, while in practice we made very few actual changes. If you are suggesting taking this to the next step, then that seems like a reasonable approach. I like the idea that all strong processors are combined into one, if that is reasonable to accomplish.

We could take it to the next step. I was being more conservative, but if we want to go all in and combine them, that could be done quite easily. Initially, we’d just use the new format, loop through our regular expression patterns, and output the appropriate element based on which pattern matches. If we wanted to in the future, we could even rewrite it to parse the patterns functionally (if there was some advantage), but I see no need to completely rewrite everything. I think the current patterns are probably fine for now; we can just group them into one pattern step.
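A rough sketch of the “loop through the patterns in one step” idea, using only the standard library. This is not Python-Markdown’s actual processor: `RULES` and `render_emphasis` are hypothetical names, the second and third regexes are simplified stand-ins rather than the library’s real patterns, and it emits strings instead of elements.

```python
import re

# Ordered from most to least specific; the first rule that matches wins.
# Only the first regex comes from this thread; the others are simplified.
RULES = [
    (re.compile(r'(\*)\1(?!\1)(.+?)\1(?!\1)(.+?)\1{3}'),
     r'<strong>\2<em>\3</em></strong>'),
    (re.compile(r'(\*)\1(?!\1)(.+?)\1{2}'), r'<strong>\2</strong>'),
    (re.compile(r'(\*)(?!\1)(.+?)\1'), r'<em>\2</em>'),
]

def render_emphasis(text):
    """Walk the text once; at each '*', try every rule at that position."""
    out = []
    pos = 0
    while pos < len(text):
        if text[pos] == '*':
            for regex, template in RULES:
                m = regex.match(text, pos)  # anchored at the current position
                if m:
                    out.append(m.expand(template))
                    pos = m.end()
                    break
            else:
                # No rule matched: keep the asterisk as literal text.
                out.append(text[pos])
                pos += 1
        else:
            out.append(text[pos])
            pos += 1
    return ''.join(out)

print(render_emphasis('This is text **bold *italic bold*** with more text'))
```

The point of the single table is that ordering between the strong and em variants is decided in one place, instead of by the priority numbers of separately registered patterns.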

Oops, apparently I copied the wrong example when I tested that. And Babelmark clearly indicates we are in the minority here. This is a bug.