markdoc: Support for inline and block HTML content
The CommonMark spec describes how HTML blocks and raw HTML should be treated:
4.6 HTML blocks
An HTML block is a group of lines that is treated as raw HTML (and will not be escaped in HTML output).
6.6 Raw HTML
Text between < and > that looks like an HTML tag is parsed as a raw HTML tag and will be rendered in HTML without escaping. Tag and attribute names are not limited to current HTML tags, so custom tags (and even, say, DocBook tags) may be used.
Example:
# Test <em>Emphasis</em>
<a href="http://github.com/markdoc/markdoc">GitHub</a>
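To make the raw HTML rule concrete, here is a minimal TypeScript sketch of the open-tag and closing-tag patterns from section 6.6, applied to the example lines. It covers only a subset of the rule: HTML comments, processing instructions, declarations, and CDATA sections are not handled. `findRawHtmlTags` is a hypothetical helper, not part of Markdoc or markdown-it.

```typescript
// A simplified subset of CommonMark 6.6 "Raw HTML": open tags and closing tags only.
// HTML comments, processing instructions, declarations, and CDATA are not handled.
const TAG_NAME = "[A-Za-z][A-Za-z0-9-]*";
const ATTR_NAME = "[A-Za-z_:][A-Za-z0-9_.:-]*";
// Unquoted values exclude whitespace, quotes, =, <, >, and backticks.
const ATTR_VALUE = "(?:[^\\s\"'=<>`]+|'[^']*'|\"[^\"]*\")";
const ATTRIBUTE = `\\s+${ATTR_NAME}(?:\\s*=\\s*${ATTR_VALUE})?`;
const OPEN_TAG = `<${TAG_NAME}(?:${ATTRIBUTE})*\\s*/?>`;
const CLOSING_TAG = `</${TAG_NAME}\\s*>`;

// Hypothetical helper: returns every raw HTML open/closing tag found in a line.
function findRawHtmlTags(line: string): string[] {
  // A fresh global regex per call, so no lastIndex state is shared.
  const rawHtml = new RegExp(`${OPEN_TAG}|${CLOSING_TAG}`, "g");
  return line.match(rawHtml) ?? [];
}

console.log(findRawHtmlTags("# Test <em>Emphasis</em>")); // [ '<em>', '</em>' ]
console.log(findRawHtmlTags('<a href="http://github.com/markdoc/markdoc">GitHub</a>'));
```

Text such as `1 < 2` is not matched, because a tag name must start immediately after `<` with an ASCII letter.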
Are there plans to support this aspect of the spec?
If so, it’s worth noting that other markdown implementations I’ve seen tend to follow the spec literally and do not parse HTML content in any way. In other words, raw HTML content is just treated as a string in the AST. The downside to this approach is that it prevents you from introspecting raw HTML content.
For example, if you wish to write a validator that ensures the integrity of links on a page, you don’t really care whether the links are authored natively in Markdown or as raw `<a>` tags. Likewise, when generating a table of contents, you want to generate IDs for, and include, all heading tags.
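Here is a minimal sketch of that link-validation case. It assumes a hypothetical AST in which raw `<a>` tags are parsed into `html-tag` nodes (with a `name` and an `attributes` map) rather than stored as opaque strings. The `DocNode` type and `collectHrefs` function are illustrative assumptions, not Markdoc's API.

```typescript
// Hypothetical, simplified AST in which raw HTML tags are parsed into nodes
// instead of being kept as opaque strings.
type DocNode =
  | { type: "document" | "paragraph"; children: DocNode[] }
  | { type: "link"; href: string; children: DocNode[] }
  | { type: "html-tag"; name: string; attributes: Record<string, string>; children: DocNode[] }
  | { type: "text"; content: string };

// Collect every link target, whether written as [text](url) or as a raw <a href>.
function collectHrefs(node: DocNode, out: string[] = []): string[] {
  if (node.type === "link") {
    out.push(node.href);
  } else if (node.type === "html-tag" && node.name === "a" && "href" in node.attributes) {
    out.push(node.attributes.href);
  }
  // Text nodes are leaves; every other node type has children to visit.
  if (node.type !== "text") {
    for (const child of node.children) collectHrefs(child, out);
  }
  return out;
}
```

A validator could then check each collected href. The same walk, matching `h1` through `h6` alongside Markdown heading nodes, could feed a table of contents.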
Cheers, and congrats on the first release!
About this issue
- State: closed
- Created 2 years ago
- Reactions: 14
- Comments: 20 (8 by maintainers)
Experimental implementation here: https://github.com/markdoc/markdoc/compare/html-support
A few more use cases where it’s super useful to render unescaped HTML:
Custom syntax highlighting
Best-in-class highlighters like Shiki Twoslash render code (i.e. Markdoc fence content) to a string of HTML with highlighter classes and/or styles already applied.
Being able to return these raw HTML strings is a big DX improvement over having to write additional code to parse, walk, and transform the HTML back into Markdoc nodes.
Live demo showing how to use Vite + Markdoc + Shiki Twoslash, returning a simple string of syntax highlighted HTML: ~https://stackblitz.com/edit/vitejs-vite-su1ym2?file=vite.config.ts~
Edit: updated that demo with a hacky workaround to support unescaped HTML for this case:
`transform`: https://stackblitz.com/edit/vite-vue-3-markdoc-shiki-twoslash?file=vite.config.ts
Vue template generation
Since Vue templates are so similar to plain HTML, tools like VitePress pass rendered Markdown (i.e. HTML strings) directly to the Vue template compiler. The rendered Markdown can include Vue components that are either written in the source `.md` file, or written with custom Markdown syntax and inserted automatically by the Markdown renderer. See this project for an example of using a markdown-it plugin to parse custom syntax and replace it with Vue components.
Live demo showing how we could write Vue components in Markdown files if Markdoc supported unescaped HTML: https://stackblitz.com/edit/vite-vue-3-markdoc-shiki-twoslash?file=src%2Fhello-world.md
After a lot of debate and discussion about this particular feature, and after considering several different approaches to implementation, we have decided not to proceed with merging the PR.
We believe that the best way for users to natively support HTML content in Markdoc is to perform a transform on the token array between tokenization and parsing. This approach works significantly better and can be used today without requiring any changes to Markdoc. See the example here.
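Here is a rough sketch of such a token transform, under stated assumptions. The `Token` interface is a simplified, hypothetical stand-in: real markdown-it tokens carry more fields, inline HTML arrives as `html_inline` children of `inline` tokens, and `html_block` tokens can hold several tags at once. The sketch handles only standalone `html_inline` tokens. It turns each open or closing tag into a hypothetical `html_tag_open`/`html_tag_close` token that a parser could map to a generic tag. In Markdoc, a step like this would sit between the tokenizer's `tokenize` call and `Markdoc.parse`.

```typescript
// Hypothetical, simplified token shape (real markdown-it tokens have more fields).
interface Token {
  type: string; // "text", "html_inline", "html_tag_open", "html_tag_close", ...
  content: string;
  name?: string; // HTML tag name, set on html_tag_* tokens
  attributes?: Record<string, string>;
}

// Open and closing tags per CommonMark 6.6 (comments, CDATA, etc. are not handled).
const OPEN_TAG =
  /^<([A-Za-z][A-Za-z0-9-]*)((?:\s+[A-Za-z_:][A-Za-z0-9_.:-]*(?:\s*=\s*(?:"[^"]*"|'[^']*'|[^\s"'=<>`]+))?)*)\s*(\/?)>$/;
const CLOSING_TAG = /^<\/([A-Za-z][A-Za-z0-9-]*)\s*>$/;

function parseAttributes(source: string): Record<string, string> {
  // A fresh global regex per call, so lastIndex never leaks between calls.
  const attr = /([A-Za-z_:][A-Za-z0-9_.:-]*)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"'=<>`]+)))?/g;
  const attributes: Record<string, string> = {};
  let m: RegExpExecArray | null;
  while ((m = attr.exec(source)) !== null) {
    // Double-quoted, single-quoted, or unquoted value; boolean attributes get "".
    attributes[m[1]] = m[2] ?? m[3] ?? m[4] ?? "";
  }
  return attributes;
}

// Runs between tokenizing and parsing: swap raw HTML tag tokens for generic tag tokens.
function transformHtmlTokens(tokens: Token[]): Token[] {
  const out: Token[] = [];
  for (const token of tokens) {
    if (token.type !== "html_inline") {
      out.push(token);
      continue;
    }
    const close = CLOSING_TAG.exec(token.content);
    const open = OPEN_TAG.exec(token.content);
    if (close) {
      out.push({ type: "html_tag_close", content: "", name: close[1] });
    } else if (open) {
      out.push({ type: "html_tag_open", content: "", name: open[1], attributes: parseAttributes(open[2]) });
      // A self-closing tag such as <br/> gets an immediate close token.
      if (open[3] === "/") out.push({ type: "html_tag_close", content: "", name: open[1] });
    } else {
      out.push(token); // comments, declarations, etc. stay as raw strings
    }
  }
  return out;
}
```

Because the output is ordinary open/close tokens, the parser can nest the Markdown content between them naturally. That nesting is what makes interleaving work without a wrapper element.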
There are several major factors that have contributed to our decision to not move forward with the PR:
- `dangerouslySetInnerHTML` has to be used on a parent element, which means there isn’t a good way to interleave arbitrary HTML string content around other React elements. The token transform approach bypasses this issue entirely and makes it possible to support the HTML content in React without having to rely on `dangerouslySetInnerHTML`.

The good news is that the token transform approach is relatively straightforward for users to implement. This approach consists of several steps:
- Enable `html` support in the Markdoc tokenizer

I have published a full working example of this here. In that example, each HTML tag in the content is translated into an `html-tag` tag in the Markdoc AST: the HTML tag name and attributes become attributes on the Markdoc tag. You can write a custom `transform` function to control how the HTML tags are rendered, including emitting them literally in the output, which is what the example does.

Given the significant advantages of this approach over the implementation we were considering in the PR, we hope that those of you who want HTML support will be satisfied with this and not be too frustrated by our decision not to move forward with the PR. We plan to publish documentation that describes all of this in more detail at some point in the future so that users who want HTML support will know how to proceed.
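To illustrate the render side, here is a sketch of a custom transform for the `html-tag` tag that emits each element literally. The `AstNode` and `RenderTag` types are local, simplified stand-ins loosely modeled on a Markdoc node and a renderable `Tag`. Storing the HTML attributes under a hypothetical `attrs` key is an assumption, not necessarily how the published example does it.

```typescript
// Local stand-ins loosely modeled on a Markdoc AST node and a renderable Tag;
// these are simplified assumptions, not Markdoc's actual classes.
type AstNode =
  | { type: "text"; content: string }
  | {
      type: "html-tag";
      // The HTML tag name and attributes, stored as attributes on the Markdoc tag.
      attributes: { name: string; attrs: Record<string, string> };
      children: AstNode[];
    };

interface RenderTag {
  name: string;
  attributes: Record<string, string>;
  children: Renderable[];
}
type Renderable = string | RenderTag;

// Custom transform for `html-tag`: emit the original element literally.
// Changing what this returns (renaming tags, filtering attributes, and so on)
// is where the rendering of raw HTML can be controlled.
function transform(node: AstNode): Renderable {
  if (node.type === "text") return node.content;
  return {
    name: node.attributes.name,
    attributes: { ...node.attributes.attrs },
    children: node.children.map(transform),
  };
}
```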
I appreciate all of the feedback and thoughts that everyone shared on this feature. I particularly want to thank @alex-sherwin, whose workaround heavily influenced the example that I shared.
@marshall007: Until @rpaul-stripe’s PR is merged, I found a workaround for React here.
I’ve worked around that issue by letting you define the `htmlWrapperTag` yourself.
Usage in a `.md` file:
@rpaul-stripe I used the code from your demo, and it works perfectly, thank you! In my case, I had YouTube embeds in the markdown that I wanted to render and, as you can see on this page, your solution allows me to render those perfectly.
I couldn’t use a custom tag for those because the embed code is generated by an external system.
Sorry for the long delay on this. I updated the branch so that it can be merged and I did some additional testing to make sure that this will work as expected. There’s now a PR pending here that I hope to merge soon: https://github.com/markdoc/markdoc/pull/344
I also documented the proposed feature and drafted a formal RFP, which is here: https://github.com/markdoc/markdoc/discussions/343
The `markdown-it` library has an option (`html: true`) that can be enabled to get it to identify HTML content in Markdown according to the rules in the CommonMark specification. When this option is passed into the Markdoc tokenizer, a document with HTML content will fail to parse because Markdoc doesn’t define a corresponding HTML node type.

It’s fairly trivial to add an HTML node type to `schema.ts` so that we can capture these HTML strings and expose them in the Markdoc AST. We can add a new renderable tree type that is specifically for pass-through HTML content, and we can have the HTML node type output that during Markdoc’s `transform` phase.

It’s straightforward to support this in the HTML renderer, because we can just append the content to the output string when we encounter it in the renderable tree. The problem, however, is figuring out a good way to support this in the React renderer.
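Here is a minimal sketch of the pass-through idea for the HTML renderer. It assumes a hypothetical `raw-html` renderable node: ordinary text is escaped, while raw HTML nodes are appended to the output verbatim. The node shapes and the `renderHtml` and `escapeHtml` names are illustrative, not Markdoc's actual renderable types.

```typescript
// Hypothetical renderable tree with a dedicated node type for pass-through HTML.
interface RawHtml {
  kind: "raw-html";
  html: string;
}
interface ElementNode {
  kind: "element";
  name: string;
  attributes: Record<string, string>;
  children: Renderable[];
}
type Renderable = string | RawHtml | ElementNode;

function escapeHtml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

function renderHtml(node: Renderable): string {
  // Ordinary text is always escaped.
  if (typeof node === "string") return escapeHtml(node);
  // Pass-through HTML is appended to the output string verbatim.
  if (node.kind === "raw-html") return node.html;
  const { name, attributes, children } = node;
  const attrs = Object.keys(attributes)
    .map((key) => ` ${key}="${escapeHtml(attributes[key])}"`)
    .join("");
  return `<${name}${attrs}>${children.map(renderHtml).join("")}</${name}>`;
}

const tree: Renderable = {
  kind: "element",
  name: "p",
  attributes: { class: "note" },
  children: ["1 < 2 ", { kind: "raw-html", html: "<em>raw</em>" }],
};
console.log(renderHtml(tree)); // <p class="note">1 &lt; 2 <em>raw</em></p>
```

Void elements such as `<br>` are not special-cased here. Because raw HTML bypasses escaping entirely, untrusted content would still need sanitizing.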
We could have it use `dangerouslySetInnerHTML`, but that requires the HTML content to be enclosed in another node of some kind. This works for standalone block or inline HTML content that is embedded in the document, but it’s going to break when the HTML content is interleaved with other Markdown content. We really need something like this feature in order to support it natively in React. There are third-party libraries like Interweave that are probably viable for this now, but I think the end user will want control over it.

What I’m leaning towards doing is making this just work in the HTML renderer and making it so that the React renderers accept an extra parameter with a callback that allows the user to control how raw HTML renderable tree nodes are handled. Then the user would have the option of using Interweave or doing whatever sanitization they want on the HTML during rendering.

I think we’d follow the same approach by default, but it’s totally possible for a user to write an AST transform that walks over each HTML node in the AST, parses the string, and converts the actual markup to other Markdoc AST nodes.
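The callback design can be sketched without React itself. In this hypothetical renderer, `createElement` builds elements for the target framework, and an optional `renderRawHtml` callback decides what each raw HTML node becomes. `RenderOptions`, `render`, and the `raw-html` node shape are all assumptions for illustration, not an actual Markdoc API.

```typescript
// Hypothetical renderable tree shape, with a pass-through node for raw HTML.
interface RawHtml {
  kind: "raw-html";
  html: string;
}
interface ElementNode {
  kind: "element";
  name: string;
  attributes: Record<string, string>;
  children: Renderable[];
}
type Renderable = string | RawHtml | ElementNode;

// Hypothetical renderer options: `createElement` builds elements for the target
// framework; `renderRawHtml` lets the user decide what raw HTML turns into.
interface RenderOptions<T> {
  createElement: (name: string, attributes: Record<string, string>, children: Array<T | string>) => T;
  renderRawHtml?: (html: string) => T | string;
}

function render<T>(node: Renderable, options: RenderOptions<T>): T | string {
  if (typeof node === "string") return node;
  if (node.kind === "raw-html") {
    // Without a callback, hand the markup back as a plain string.
    return options.renderRawHtml ? options.renderRawHtml(node.html) : node.html;
  }
  const children = node.children.map((child) => render(child, options));
  return options.createElement(node.name, node.attributes, children);
}
```

With React, `createElement` could wrap `React.createElement`, and `renderRawHtml` could pass the string to Interweave or to a sanitizer. If the callback is left out, the markup comes back as a plain string, which a React-style renderer would display as escaped text rather than as HTML.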
@jerriep really sorry for the delay on this. It’s definitely still part of the roadmap. I will take another look at it this week and see if I can provide a clearer timeline for when this will be delivered.
@mauriciabad we actually want to solve the issue you point out with this solution: https://github.com/markdoc/markdoc/issues/156
@jerriep yes, I am working on wrapping it up this week.
I am also very interested in seeing this working. @rpaul-stripe Do you have any idea whether (and when) your experimental branch will be merged?