mkdocs: Header and navigation sidebars display incorrectly arbitrary markup
Hello!
Since when does the bug occur?
I recently upgraded my mkdocs package from 1.4.2 to 1.5.2, and I noticed this bug when…
- I don’t specify a page title in
mkdocs.ymlfile or in the page metadata (in yaml). - I use an emoji (HTML in general) in the first level section title in the markdown file:
# Intégration et Déploiement Continu sur :simple-gitlab: Gitlab
For example (with mkdocs-material theme and readthedocs theme):
… while it was displayed like that before:
More generally, some parts like the search page of readthedocs theme don’t handle well the HTML in title sections.
External links:
readthedocs theme pictures and bug explanations are from @squidfunk.
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Reactions: 2
- Comments: 76 (76 by maintainers)
Commits related to this issue
- Ensure page titles from rendered Markdown do not contain any markup While the `_ExtractTitleTreeprocessor` already extracts text from the etree, postprocessors may add more markup. Therefore, markups... — committed to waylan/mkdocs by waylan 6 months ago
- Ensure page titles from rendered Markdown do not contain any markup While the `_ExtractTitleTreeprocessor` already extracts text from the etree, postprocessors may add more markup. Therefore, markups... — committed to mkdocs/mkdocs by waylan 6 months ago
Here’s a summary of what’s happening with titles.
A title of a nav item can come from one of three sources, in this order of precedence.
From the
navconfig in mkdocs.yml: The string is directly passed to themes, which normally paste it as raw HTML (tags, entities and all) This was a bad historical decision, but at least there’s no ambiguity. May or may not be worth trying to change this.From the
metatitlein the document: Exactly the same as (1)From the first heading in the document (which is surely understood to be Markdown)
A title of a ToC item comes from any heading in the document:
As you can see, MkDocs 1.4 had “consistent” nav item titles in that it consistently put them as raw HTML even though in one of the contexts the source material is clearly Markdown.
Then in MkDocs 1.5 instead we got a different form of consistency: the way to determine a nav item title from a Markdown heading became the same as determining the ToC item title from a Markdown heading (though with subtle differences and a bug).
The original bug report correctly identifies this failure to remove some of the tags from ToC titles (highlighted in bold above). But the followup reply then totally wrongly claimed that MkDocs 1.4 was perfect and was stripping tags. But as you can see it really wasn’t stripping anything whatsoever, just not bothering to render Markdown or do anything at all otherwise.
In terms of the latest replies in this thread, they went in this direction: Modify the nav item determination in the following way:
And I actually no longer think that this is a feasible way to proceed unless we also have plans to rework how ToC item headings are determined. Also it’s otherwise quite reckless. Instead I’m more inclined to just fully unify these two behaviors (remember, the ToC item determination is only subtly different from current MkDocs 1.5 behavior for nav items), meaning that mainly just the bug at the beginning of this thread is to be fixed through better removal of tags. Other changes (such as autoescape for templates) become unnecessary then, and kind of a separate topic, to be considered on its own merits.
The fact that the first two ways to specify the nav item title do still produce unguarded raw HTML is a big pain, inconsistency and even danger, but maybe we should not be blinded by this. In some ways it is the bigger problem, but in ways of practicality it doesn’t show up much and we don’t have much choice but to keep existing behavior anyway.
It also seems to me that there is a high demand for a good implementation of this functionality:
“Take an etree element in the middle of Markdown rendering, finish rendering it and convert it to a reasonable string with HTML entities but without any HTML tags.”
I only now realize that the ‘toc’ extension implements exactly this by necessity and instead I had made two separate implementations of pretty much the same thing:
We’ve been talking how to safely strip tags after unstashing HTML, and that we may need a new implementation, but Python-Markdown has that already (though not sure if it’s 100% reliable and optimal)
Now, it might just seem like I’m saying that we should just switch every implementation to the one from the ‘toc’ extension. And that’s probably true. But it does have its own deficiencies!
It runs as a treeprocessor at priority 5 and yanks the content before treeprocessors with later priorities had a chance to run. Specifically the ‘smarty’ treeprocessor (at priority 2) is what I noticed to be a problem. Currently
foo --- barwith ‘smarty’ extension remains justfoo --- barin the ToC title. My implementation for the nav titles avoided this bug just by being at a later priority - it got the correctfoo — bar.I think this is pretty much just a bug to be reported on Python-Markdown.
I was also saying that ideally
<img alt="foo description" src="foo.png">would be replaced by “foo description” rather than by nothing. And ideally I’d like to see this in both the nav and toc titles.I’m only following this discussion loosely, but any approach taken to solve this problem must please, please, please not break the entire ecosystem but try to be entirely downward compatible, so the ecosystem can gradually adapt.
So, if a plugin moves to the next version that will include this change, it still works with a theme that’s on 1.5, vice versa. We already limited the version range of Material for MkDocs to <1.6, so we can carefully migrate to the next release, but others might not have done so. I got to be honest here: I expect this to break a lot over at Material for MkDocs because our project is probably the biggest downstream project of MkDocs, with dozens of templates and 12 potentially affected plugins. I’m happy to migrate to whatever solution the maintainers of MkDocs have selected to solve the problem reported here, but we and all of our 40k users need time for that, as well as the entire ecosystem. Material for MkDocs is one of the most used projects affiliated with MkDocs out there, and we try very hard to make and keep it easy for our awesome users.
Thank you very much for your understanding and cooperation.
@waylan By the way, at the moment it seems we’re converging on putting the full HTML into
title, right?i.e.
I think yes, enabling autoescape is for sure the future that we want to be in. Applying
Markupas appropriate will finally make it explicit what is HTML and what is text, and plugins will finally be able to properly put text or HTML into attributes as they like.What helps make this change graceful is that, even if a plugin does need to make an edit in response to this, it is always possible to make such an edit without dropping compatibility with older versions of MkDocs. You basically just sprinkle
Markup()whenever you are producing a string that’s intended to be HTML, and that’s it.For themes theoretically there should be 0 changes necessary, but I’ll dig more.
@waylan I’m pretty sure the
<title>element also expects the value to be escaped. It’s not even that modern browsers unescape the title, the title is unescaped at the level of initially reading the HTML everywhere in the same way.But so under the current way MkDocs works, indeed it seems that unescaping the title string is untenable - it’s bad both for the
<title>and for normal usage within HTML. In this way, the title is only good for use as a normal string for some other processing.So, with
page.title = "*Hello — beautiful wor<dl>"and this theme HTML:right now this blows up.
I was very surprised to learn that MkDocs doesn’t enable Jinja autoescaping, which is the way most people use it these days. Currently all interpolated
{{ values }}need to be escaped explicitly, and that is why this blows up. People should’ve been escaping such values all along everywhere (by writing{{ page.title |e }}) if MkDocs doesn’t enable autoescaping.However if we were to enable autoescaping, then this would work perfectly, as the title would be escaped back.
Which sounds like a huuuge breaking change, but it might just work.
str('<hr>')gets interpolated as<hr>Markupmakes no differenceMarkup('<hr>')gets interpolated as<hr>str('<hr>')gets interpolated as<hr>Markup('<hr>')gets interpolated as<hr>So I think we have only these 2 options:
Enable autoescaping in jinja and put this into title:
*Hello — beautiful wor<dl>I implement this in commit https://github.com/oprypin/mkdocs/commit/516b072fb5bccc6618abd9ec8b6e13b7e30bf364 on top of your commit https://github.com/waylan/mkdocs/commit/305a5aa8456fa90dd5712015535584c263eba849 Notice that I now store the page content as aMarkupinstance - this is like pre-applying the jinja|safefilter - to exempt it from escaping because it’s actual HTML, unlike everything else.Keep the status quo and put this into title:
*Hello — beautiful wor<dl>Tags will still need to be stripped, so striptags will need to be reimplemented as you’re saying. (Or just callescapeon it again?) The string is now ugly and not usable in non-HTML contextsMy commit 516b072fb5bccc6618abd9ec8b6e13b7e30bf364 also implements another idea: it extracts text from the
altattribute of a tag if available. This will actually make emojis work whether it’s configured to use the png or the svg method.Thanks for the productive discussion @waylan , I got these ideas from your thoughts.
Ideally a title without Markdown formatting and HTML tags. If that’s not possible, MkDocs 1.4 behavior.
First off let’s make a reproduction case and clarify the behavior of markup in nav headers.
mkdocs.yml:
docs/index.md:
There is no change in the content itself but there is change in the nav rendering. Each theme behaves consistently with each other.
Markdown is ignored/preserved,
raw HTML is applied
Markdown is applied and stripped but
stashed Markdown is applied, raw HTML is applied
“mkdocs” theme
“mkdocs” theme
“readthedocs” theme
“readthedocs” theme
“material” theme - 9.1.20
“material” theme - 9.1.20
All in all both behaviors are wildly inconsistent. If I had to pick one, probably the one in MkDocs 1.5 makes more sense, but that doesn’t help much.
Emojis were producing poor results both before and after, just differently. Only “material” theme had somewhat proper styling for the emoji.
And there is the repro case from the issue on mkdocs-material, indeed the behavior is somehow different. But I haven’t looked why exactly the theme behaves like that - at first glance seems theme-specific.
“material” theme - 9.1.20-insiders-4.37.0
There’s also the separate matter that there is a “material/typeset” plugin (not public) that was relying on the behavior that Markdown is being ignored/preserved as it’s being moved to the nav title, and then re-applying that Markdown in its own way. So emojis were fully supported only on mkdocs-material-insiders in its own way, and this change made that not function anymore.
And it is definitely a huge issue that the HTML now reaches directly into search results
I’ve investigated and think I know what the problem is:
https://github.com/mkdocs/mkdocs/blob/f94ab3f62d0416d484d81a0c695c8ca86ab3b975/mkdocs/structure/pages.py#L439-L448
itertext()<svg>elementA brute force approach would be to strip all tags again, but maybe there’s a better solution.