markdown: Placeholder in output for fenced code blocks nested in raw HTML blocks

When using markdown.extensions.extra, syntax that looks like a code fence inside an HTML block element is rendered as gibberish instead of text, unless the HTML block has markdown=1 set.

This test case demonstrates what I mean: test.md.txt

The output I get from parsing that page includes a section like this:

```html
<div>

Save this file to `test.md` and then try this:


wzxhzdk:0


</div>
```

The exact contents of the gibberish change, but they typically start with wz…, which suggests to me that something not meant to be a string is being rendered as a string.
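The wz… prefix looks like Python-Markdown's raw-HTML placeholder. Below is a minimal toy model of that mechanism. It assumes two things: placeholders have the form STX + `wzxhzdk:N` + ETX, and the fenced-code preprocessor stashes its output before the raw-HTML preprocessor stashes the enclosing `<div>`. `ToyStash`, `placeholder` and `restore_single_pass` are hypothetical names, not the library's API. Under those assumptions, a single restore pass reinserts the `<div>` but never revisits the placeholder nested inside it. That leaves a `wzxhzdk:0` like the one in the output above.

```python
# Toy model of a raw-HTML stash. Hypothetical: this is NOT Python-Markdown's code.
# Assumption: placeholders look like STX + "wzxhzdk:N" + ETX.
STX, ETX = "\x02", "\x03"


def placeholder(i):
    return "%swzxhzdk:%d%s" % (STX, i, ETX)


class ToyStash:
    """Stores raw HTML fragments and hands back placeholder strings."""

    def __init__(self):
        self.blocks = []

    def store(self, html):
        self.blocks.append(html)
        return placeholder(len(self.blocks) - 1)

    def restore_single_pass(self, text):
        # Each placeholder is replaced once, in storage order. A placeholder
        # that only exists inside a block restored *later* is never replaced.
        for i, html in enumerate(self.blocks):
            text = text.replace(placeholder(i), html)
        return text


stash = ToyStash()
# 1. Fenced code is handled first: its rendered HTML is stashed.
#    The code content is made up for illustration.
code_ph = stash.store("<pre><code>print('hello')\n</code></pre>")
# 2. The raw <div> block is stashed next, with the placeholder inside it.
doc = stash.store("<div>\n\n" + code_ph + "\n\n</div>")
output = stash.restore_single_pass(doc)
print(repr(output))  # '<div>\n\n\x02wzxhzdk:0\x03\n\n</div>'
```

In this toy model, restoring the blocks in reverse storage order would resolve the nested placeholder. That is an observation about the sketch only, not a description of how the issue was fixed upstream.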

I see the same behavior with Python-Markdown 2.6.2 and 2.6.5 on Python 3.5.1, and with Python-Markdown 2.6.5 on Python 2.7.11 (on two different Arch Linux machines).

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 17 (9 by maintainers)


Most upvoted comments

Some years ago I built a preprocessor that used a proper HTML parser under the hood. However, that parser was the one in the Python standard library, which (at the time) crashed hard on invalid HTML with no way to recover. It solved a bunch of problems with the current approach, but I abandoned it because it could not handle invalid HTML (not to mention valid Markdown like <http://example.com>).

More recently, the standard library HTML parser has been updated and handles those cases relatively gracefully. Unfortunately, my old work is lost to time. I did try to recreate it and have just enough to convey the concept here: https://gist.github.com/waylan/84eadbf6873965886a16. It is very rough and has some obvious issues, but I suspect it could be made to work with less effort than rebuilding the current preprocessor from scratch. If anyone is considering refactoring the 2.6 raw HTML handling, this may be a place to start.
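The idea of locating raw HTML blocks with the standard library parser can be sketched roughly as follows. This is not the code from the gist. `RawBlockFinder` and `BLOCK_TAGS` are hypothetical names, and the tag set is an illustrative subset (Python 3 only, since it uses `html.parser`). The sketch feeds the whole Markdown source to `html.parser.HTMLParser` and uses `getpos()` to turn parser events into character offsets. It then records the span of each top-level block-level element. Non-block tags, including what the parser makes of an autolink such as `<http://example.com>`, are ignored, as are stray end tags. An unclosed block runs to the end of the input.

```python
from html.parser import HTMLParser

# Hypothetical, illustrative subset of block-level tag names.
BLOCK_TAGS = {"div", "p", "pre", "table", "ul", "ol", "blockquote", "section"}


class RawBlockFinder(HTMLParser):
    """Find (start, end) offsets of top-level raw HTML blocks in Markdown text."""

    def __init__(self, source):
        super().__init__(convert_charrefs=False)
        self.source = source
        # Offset of the first character on each line, to convert getpos().
        self.line_starts = [0] + [i + 1 for i, ch in enumerate(source) if ch == "\n"]
        self.spans = []
        self.open_tag = None  # outermost open block tag, if any
        self.depth = 0        # nesting depth of that same tag name
        self.start = None

    def _offset(self):
        # getpos() is (1-based line, 0-based column) of the current event.
        line, col = self.getpos()
        return self.line_starts[line - 1] + col

    def handle_starttag(self, tag, attrs):
        if self.open_tag is None:
            if tag in BLOCK_TAGS:  # non-block tags are ignored
                self.open_tag, self.depth, self.start = tag, 1, self._offset()
        elif tag == self.open_tag:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag != self.open_tag:
            return  # stray or unrelated end tag: skip it instead of failing
        self.depth -= 1
        if self.depth == 0:
            # The event position is the "<" of "</tag"; the block ends at its ">".
            end = self.source.index(">", self._offset()) + 1
            self.spans.append((self.start, end))
            self.open_tag = None

    def find(self):
        self.feed(self.source)
        self.close()
        if self.open_tag is not None:  # unclosed block runs to end of input
            self.spans.append((self.start, len(self.source)))
        return self.spans


text = "See <http://example.com>.\n\n<div>\n<div>inner</div>\n```\ncode\n```\n</div>\n\nTail </p> text.\n"
for start, end in RawBlockFinder(text).find():
    print(repr(text[start:end]))
```

Obvious gaps remain. A `<div>` inside a code span or an indented code block would be misdetected, and nothing here checks that a block starts at the beginning of a line.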