slate: html-serializer doesn't work with nested blocks
Do you want to request a feature or report a bug?
A bug
What’s the current behavior?
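// Assumes the slate-html-serializer package, whose default export is the serializer class.
import HtmlSerializer from 'slate-html-serializer'

// Map HTML tag names to Slate block types.
const BLOCK_TAGS = {
  blockquote: 'quote',
  p: 'paragraph',
  // div: 'div'
}

const rules = [
  {
    // Turn every recognised tag into a block and recurse into its children.
    deserialize(el, next) {
      const type = BLOCK_TAGS[el.tagName.toLowerCase()]
      if (!type) return
      return {
        kind: 'block',
        type: type,
        nodes: next(el.childNodes)
      }
    }
  }
]

const pureHtml = '<blockquote><div>a text<blockquote>inner quote</blockquote></div></blockquote>'
const initialValue = new HtmlSerializer({ rules: rules }).deserialize(pureHtml);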
It only renders the "a text" element, and I couldn’t see "inner quote".
See https://jsfiddle.net/oj53q1n2/26/
What’s the expected behavior?
We should see both text and quote.
About this issue
- State: closed
- Created 6 years ago
- Comments: 15 (11 by maintainers)
I’d be curious to hear more about why that validation rule exists at all. While I agree that there’s a conceptual correctness to it, there’s no such restriction in HTML. In addition to the example provided by @nghuuphuoc, the following is valid HTML that is “unrepresentable” in Slate.
The implicit behavior of silently destroying content feels like it needs a strong justification and prominent documentation.
After a bit more digging, the problem appears to be this constraint in the core Slate schema (which is enforced by Value.fromJSON inside the HTML deserializer):
If the input HTML contains <div> tags and the Serializer rules convert those div to blocks rather than ignoring them, it’s easy to create a structure that will be ripped apart by the schema validation after parsing, because Slate does not allow blocks to have both block children and text / inline children and this is a very common <div> case.

My solution is here: https://gist.github.com/bengotow/f5408e9cb543f22409d033df58e34579. Before running the HTML deserializer, I traverse the DOM tree and ensure that divs, blockquotes, and other nodes converted to Slate blocks contain either text + inline children OR block children, wrapping children into blocks as necessary. Curious whether this would be welcomed as default behavior in some way (cc @ianstormtaylor).

Hey @kornil! After a bit more polish, I actually ended up switching to an approach that adds wrapping blocks, etc. to the resulting Slate graph before passing it through the normalizer, rather than changing the HTML before converting it. I think that’s preferable because it works with any HTML <> Slate mapping rather than relying on an assumed set of conversions.
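For anyone who wants the gist of that second approach without reading the linked code, here is a minimal sketch. It works on plain JSON shaped like the deserialize rule's output ({ kind, type, nodes }), not on real Slate models. The helper name wrapLooseChildren, the simplified text node shape { kind: 'text', text }, and the default wrapper type 'paragraph' are all assumptions made for illustration; the Mailspring implementation may well differ.

```javascript
// Hypothetical helper: not part of Slate or slate-html-serializer.
// Works on plain JSON nodes shaped like { kind, type, nodes }.
function wrapLooseChildren(node, wrapperType = 'paragraph') {
  if (node.kind !== 'block' || !Array.isArray(node.nodes)) return node

  // Fix the children first so the result is consistent at every depth.
  const children = node.nodes.map(child => wrapLooseChildren(child, wrapperType))

  const hasBlock = children.some(child => child.kind === 'block')
  const hasLoose = children.some(child => child.kind !== 'block')
  if (!(hasBlock && hasLoose)) return { ...node, nodes: children }

  // Mixed content: group each run of text / inline children into a wrapper block.
  const fixed = []
  let run = []
  const flush = () => {
    if (run.length > 0) fixed.push({ kind: 'block', type: wrapperType, nodes: run })
    run = []
  }
  for (const child of children) {
    if (child.kind === 'block') {
      flush()
      fixed.push(child)
    } else {
      run.push(child)
    }
  }
  flush()
  return { ...node, nodes: fixed }
}

// The structure from the original report, written as plain JSON
// (with the div mapped to a block, which is what triggers the problem).
const quote = {
  kind: 'block',
  type: 'quote',
  nodes: [
    {
      kind: 'block',
      type: 'div',
      nodes: [
        { kind: 'text', text: 'a text' },
        { kind: 'block', type: 'quote', nodes: [{ kind: 'text', text: 'inner quote' }] },
      ],
    },
  ],
}

const fixedQuote = wrapLooseChildren(quote)
console.log(JSON.stringify(fixedQuote, null, 2))
```

After this pass the div holds two block children (a paragraph wrapping "a text", then the inner quote), so there is no mixed content left for the schema to strip. The input object is not mutated.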
You can find the latest code I’m using here: https://github.com/Foundry376/Mailspring/blob/master/app/src/components/composer-editor/conversion.jsx#L172. I also wrote code to join adjacent text nodes rather than letting Slate do it during normalization, which sped things up a LOT because it’s a simple transform and Slate “assumes the worst” when it runs a normalization step (and spends time re-finding the nodes, etc.).
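The text-joining step can be sketched roughly like this. It is a simplified illustration, not the linked code: joinAdjacentText is a made-up name, and it assumes text nodes look like { kind: 'text', text } with no marks, whereas real Slate text nodes also carry marks that would need to match before merging.

```javascript
// Hypothetical helper, assuming simplified text nodes: { kind: 'text', text: '...' }.
// Real Slate text nodes also carry marks, which this sketch ignores.
function joinAdjacentText(node) {
  if (!Array.isArray(node.nodes)) return node

  const joined = []
  for (const child of node.nodes.map(n => joinAdjacentText(n))) {
    const prev = joined[joined.length - 1]
    if (prev && prev.kind === 'text' && child.kind === 'text') {
      // Replace the previous entry with a merged copy instead of mutating it.
      joined[joined.length - 1] = { ...prev, text: prev.text + child.text }
    } else {
      joined.push(child)
    }
  }
  return { ...node, nodes: joined }
}

const paragraph = {
  kind: 'block',
  type: 'paragraph',
  nodes: [
    { kind: 'text', text: 'Hello, ' },
    { kind: 'text', text: 'world' },
    { kind: 'inline', type: 'link', nodes: [{ kind: 'text', text: '!' }] },
    { kind: 'text', text: ' Bye' },
  ],
}

const merged = joinAdjacentText(paragraph)
console.log(merged.nodes.map(n => n.kind)) // [ 'text', 'inline', 'text' ]
```

Only neighbours that are both text nodes get merged, so the text after the inline stays separate. Doing this in one pass over the JSON means the normalizer has nothing left to merge.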
Hey folks, this is by design. Slate does not allow you to have mixed inline and block level content in the same node. A block can either contain all block nodes, or it can contain inline and text nodes. This is enforced in the core editor-level schema.
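To make that rule concrete, here is a rough sketch of a check you could run on deserialized JSON before handing it to Slate. findMixedBlocks is a hypothetical helper, not a Slate API, and it assumes the plain { kind, type, nodes } shape used in the deserialize rule from the original report.

```javascript
// Hypothetical check, not a Slate API.
// Returns the paths (arrays of child indexes) of blocks that mix
// block children with text / inline children.
function findMixedBlocks(node, path = []) {
  if (!Array.isArray(node.nodes)) return []

  const found = []
  if (node.kind === 'block') {
    const hasBlock = node.nodes.some(child => child.kind === 'block')
    const hasLoose = node.nodes.some(child => child.kind !== 'block')
    if (hasBlock && hasLoose) found.push(path)
  }
  node.nodes.forEach((child, index) => {
    found.push(...findMixedBlocks(child, [...path, index]))
  })
  return found
}

// The tree from the report, assuming div is mapped to a block.
const reported = {
  kind: 'block',
  type: 'quote',
  nodes: [
    {
      kind: 'block',
      type: 'div',
      nodes: [
        { kind: 'text', text: 'a text' },
        { kind: 'block', type: 'quote', nodes: [{ kind: 'text', text: 'inner quote' }] },
      ],
    },
  ],
}

const offenders = findMixedBlocks(reported)
console.log(JSON.stringify(offenders)) // [[0]]
```

The single path [0] points at the div, which holds both a text node and a block. That is the node the core schema would rewrite, and it is why content goes missing.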
The reason for this is that it makes implementing editing behaviors much simpler. It allows you to avoid a whole class of issues and questions that crop up related to intermingling. I realize there are no restrictions on HTML, but that’s also what makes the native contenteditable behaviors so hard to standardize and predict.

If someone wants to open a pull request with a specific improvement to the docs for this, I’d be happy to merge it. I’m going to close this otherwise, since it’s not something that is a bug that we can address.
Rough code here: https://gist.github.com/crisward/b61bd926d44c1e58d05f0c0c472262a4. There is a bit of sanitisation code mixed in with that method; I was using it when pulling in content from our older CMS.