react-native-render-html: Whitespace handling differs from HTML significantly (no collapsing, newlines ignored)

Is this a bug report or a feature request?

Bug report. Though fixing this might change the behavior of this lib too much for users, so the solutions are probably to write documentation and/or provide an opt-in fix.

Have you read the guidelines regarding bug report?

Yes.

Have you read the documentation in its entirety?

Yes.

Have you made sure that your issue hasn’t already been reported/solved?

Yes, to the best of my abilities. 😃

Is the bug specific to iOS or Android? Or can it be reproduced on both platforms?

Both platforms.

Is the bug reproducible in a production environment (not a debug one)?

Sorry, have not tried yet in a production build. I expect the same result though.

Have you been able to reproduce the bug in the provided example?

Have not tried, but the issue has a really simple setup so it shouldn’t differ.

Environment

Environment:

  • React: 16.0.0
  • React native: 0.51.0
  • react-native-render-html: 3.9.0

Target Platform:

  • Android (7.0)
  • iOS (11.2)

Steps to Reproduce

Render this JSX:

<HTML html={'  <div>  foo\n\nbar  baz  </div>  <div>zzz</div>  '} />

Expected Behavior

I expected react-native-render-html to handle whitespace collapsing similarly to what HTML does. Replacing each rendered space character (U+0020 SPACE) with •, that would be:

foo•bar•baz
zzz

Actual Behavior

react-native-render-html (3.9.0 and master) renders:

••foobar••baz
zzz

What seems to work:

  1. Removing spaces outside of block tags if they only contain whitespaces
  2. Removing whitespace at the end of a block tag’s content

What seems broken:

  1. Removing whitespace at the beginning of a block tag’s content
  2. Collapsing multiple spaces (and other whitespace characters) to a single rendered space, in the middle of text content
  3. Collapsing newlines to a single space character, in the middle of text content (newlines seem to be removed altogether)

I suspect this lib is limited by React Native’s Text component and errs on the side of not manipulating text too much, only removing newlines?

If that is the case, I can think of two possible improvements:

  1. Document this behavior and how HTML strings containing a lot of whitespace (which is common with some sources or JS editors) can show extra spaces before or between words.
  2. If it makes sense, maybe provide an option for performing more HTML collapsing?

I’m going to implement some fixes on our side using simple regexps on our HTML. I can post what I come up with here if that’s useful. Or maybe I should do it in alterData for more fine-grained control?
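For illustration, here is a minimal sketch of what such a regexp pre-pass might look like. The helper name collapseHtmlWhitespace and the BLOCK_TAGS list are hypothetical, not part of the library, and the approach is deliberately naive: it also collapses whitespace inside attribute values and would break content that relies on preserved whitespace (for example inside pre).

```javascript
// Hypothetical helper: a rough regex pre-pass, not the library's own behavior.
// Assumed (incomplete) list of block-level tag names.
const BLOCK_TAGS = 'div|p|blockquote|ul|ol|li|h[1-6]';

function collapseHtmlWhitespace(html) {
  // A single space right after, or right before, a block-level open/close tag.
  const afterBlockTag = new RegExp(`(</?(?:${BLOCK_TAGS})\\b[^>]*>) `, 'gi');
  const beforeBlockTag = new RegExp(` (</?(?:${BLOCK_TAGS})\\b[^>]*>)`, 'gi');
  return html
    // Collapse every run of spaces, tabs and line breaks into one space.
    .replace(/[ \t\r\n\f]+/g, ' ')
    // Drop the remaining space at the edges of block tags.
    .replace(afterBlockTag, '$1')
    .replace(beforeBlockTag, '$1')
    .trim();
}

const cleaned = collapseHtmlWhitespace('  <div>  foo\n\nbar  baz  </div>  <div>zzz</div>  ');
console.log(cleaned); // <div>foo bar baz</div><div>zzz</div>
```

The cleaned string could then be passed to the html prop. Spaces between inline tags such as a or strong are collapsed to one space but not removed by this pre-pass.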

About this issue

  • State: closed
  • Created 6 years ago
  • Reactions: 6
  • Comments: 47 (2 by maintainers)

Most upvoted comments

Based on @djpetenice’s genius idea, I created this function:

export const fixHtmlBlockSpaces = (str = '') => {
    const fixedStr = str.replace(/(<\/[^>]+>)\s+(<)/gm, (substring, group1, group2) => {
        return `${group1}<span style="color: transparent">_</span>${group2}`;
    });
    return fixedStr;
};

Now this:

<p><a href="abc">abc</a> <a href="def">def</a></p>

Is replaced by this:

<p><a href="abc">abc</a><span style="color: transparent">_</span><a href="def">def</a></p>

I’ve made great progress with the new release! Given this snippet:

<span>This is text!</span>





This is <strong>bold</strong> <em>italics</em>.

We now have:

[Screenshots: the snippet rendered with whiteSpace: pre, then with whiteSpace: normal]

You will be able to control the whitespace behavior with the special whiteSpace style property in any of the places you could previously customize styles (baseFontStyles, tagsStyles …etc).
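As a rough sketch of what that could look like in practice, assuming the tagsStyles prop accepts a map from tag name to style object (the chosen tags and the variable name are only illustrative; the pre and normal values are the ones shown above):

```javascript
// Hypothetical style map; only the whiteSpace property and its two values
// come from the comment above.
const tagsStyles = {
  // Keep spaces and line breaks exactly as written inside <pre> blocks.
  pre: { whiteSpace: 'pre' },
  // Collapse whitespace the way browsers do by default in paragraphs.
  p: { whiteSpace: 'normal' },
};

// Assumed usage inside a component:
//   <HTML html={html} tagsStyles={tagsStyles} />
```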

I am currently developing this behavior as part of a service for Expensify. The new engine following the whitespace RFC is being implemented here: https://github.com/native-html/core. An early release should be available in the upcoming week.

This pre-release will be part of the 6.x release cycle. If you are wondering why we’re jumping from 4 to 6, the reason is that 6.x will require more recent versions of React Native, and we want all users to benefit from the 5.x enhancements already available in alpha. Also, the new engine changes the structure of nodes available with onParsed, and the renderers prop will probably look different.

I’ve done some research to see where those collapsing rules are specified. It appears to be the CSS rule white-space. The complete reference algorithm is defined in the CSS Text Module Level 3, sections 3 and 4.

Full Reference

4.1.1. Phase I: Collapsing and Transformation

For each inline (including anonymous inlines; see [CSS2] section 9.2.2.1) within an inline formatting context, white space characters are processed as follows prior to line breaking and bidi reordering, ignoring bidi formatting characters (characters with the Bidi_Control property [UAX9]) as if they were not there:

  • If white-space is set to normal, nowrap, or pre-line, white space characters are considered collapsible and are processed by performing the following steps:

    1. Any sequence of collapsible spaces and tabs immediately preceding or following a segment break is removed.
    2. Collapsible segment breaks are transformed for rendering according to the segment break transformation rules.
    3. Every collapsible tab is converted to a collapsible space (U+0020).
    4. Any collapsible space immediately following another collapsible space, even one outside the boundary of the inline containing that space (provided both spaces are within the same inline formatting context), is collapsed to have zero advance width. (It is invisible, but retains its soft wrap opportunity, if any.)
  • If white-space is set to pre, pre-wrap, or break-spaces, any sequence of spaces is treated as a sequence of non-breaking spaces. However, for pre-wrap, a soft wrap opportunity exists at the end of a sequence of spaces and/or tabs, while for break-spaces, a soft wrap opportunity exists after every space and every tab.

4.1.2. Phase II: Trimming and Positioning

Then, the entire block is rendered. Inlines are laid out, taking bidi reordering into account, and wrapping as specified by the white-space property. As each line is laid out,

  1. A sequence of collapsible spaces at the beginning of a line is removed.
  2. If the tab size is zero, preserved tabs are not rendered. Otherwise, each preserved tab is rendered as a horizontal shift that lines up the start edge of the next glyph with the next tab stop. If this distance is less than 0.5ch, then the subsequent tab stop is used instead. Tab stops occur at points that are multiples of the tab size from the block’s starting content edge. The tab size is given by the tab-size property.

    Note: See [UAX9] for rules on how U+0009 tabulation interacts with bidi.

  3. A sequence of collapsible spaces at the end of a line is removed, and any trailing U+1680 OGHAM SPACE MARK is also removed if its white-space property is normal, nowrap, or pre-line.

    In the case of bidirectional text, any sequence of collapsible spaces located at the end of the line prior to bidi reordering [CSS-WRITING-MODES-3] is also removed, and bidi reordering is applied on the remaining content of the line.

  4. If there remains any sequence of white space, and/or other space separators, at the end of a line (after bidi reordering [CSS-WRITING-MODES-3]):

Glossary

  • inline: An inline is an element inside of an inline formatting context.
  • inline formatting context: https://www.w3.org/TR/CSS2/visuren.html#inline-formatting
  • segment break: https://www.w3.org/TR/2020/WD-css-text-3-20200429/#segment-break
  • inter-element whitespace: https://html.spec.whatwg.org/multipage//dom.html#inter-element-whitespace

But this reference covers several contexts that we can ignore in a minimal compatibility approach. It describes the required behavior for multiple values of the white-space CSS property (normal, pre-line, nowrap…). We can focus on the normal value, since this is the default behavior reported by @fvsch. Bidirectional layouts for RTL can also be considered later, because they add complexity and are limited by React Native’s own support for these features. Moreover, there seem to be other kinds of subtleties depending on localization. Here are the highlights of the spec I have identified:

  • The rules are context-dependent: in an inline formatting context, for example, inter-element whitespace must be collapsed, while it should be removed in block formatting contexts. This makes a simple regex-based approach insufficient.
  • A simplified run of the spec, assuming the white-space rule is white-space: normal:
    1. Any sequence of collapsible spaces and tabs immediately preceding or following a segment break (that is, any character causing a line break) is removed.
    2. Segment break transformation rules are applied: any collapsible segment break immediately following another collapsible segment break is removed. Then any remaining segment break is either transformed into a space or removed depending on the context before and after the break:
      • If the character immediately before or immediately after the segment break is the zero-width space character, then the break is removed, leaving behind the zero-width space.
      • Otherwise, if both the character before and after the segment break belong to the space discarding character set, then the segment break is removed.
      • Otherwise, the segment break is converted to a space.
    3. Every collapsible tab is converted to a space.
    4. Any collapsible space immediately following another collapsible space, even one outside the boundary of the inline containing that space (provided both spaces are within the same inline formatting context), is collapsed.
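To make the simplified run more concrete, here is a rough sketch applied to a single string of text. The function name collapseNormal is hypothetical. It ignores the space discarding character set, does not collapse across inline element boundaries, and leaves the Phase II trimming at line starts and ends to the caller, so it only approximates the steps listed above.

```javascript
// Sketch of the simplified white-space: normal run on one text string.
// collapseNormal is a hypothetical name, not part of any library.
function collapseNormal(text) {
  return text
    // Assumption: treat CRLF, CR and LF all as one kind of segment break.
    .replace(/\r\n?|\n/g, '\n')
    // Step 1: remove spaces and tabs around each segment break.
    .replace(/[ \t]*\n[ \t]*/g, '\n')
    // Step 2: a segment break following another segment break is removed.
    .replace(/\n+/g, '\n')
    // Step 2: a break next to a zero-width space is removed, keeping U+200B.
    .replace(/\u200B\n|\n\u200B/g, '\u200B')
    // Step 2: any other break becomes a space (space discarding set ignored).
    .replace(/\n/g, ' ')
    // Step 3: tabs become spaces.
    .replace(/\t/g, ' ')
    // Step 4: a space following another space is collapsed.
    .replace(/ {2,}/g, ' ');
}

console.log(JSON.stringify(collapseNormal('  foo\n\nbar  baz  '))); // " foo bar baz "
```

The single leading and trailing spaces that remain are the ones Phase II would remove at the beginning and end of a line.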

The W3C also provides a gigantic test suite, and one folder is specifically dedicated to CSS whitespace, which can be a source of inspiration. In the meantime, I have started to implement some basic whitespace tests, see 53b8679ddc74badb486348a7404fc835527cb7f4 and d76f99d9d44b3bb38fe92a7403b1165e6b10e765. A majority of them fail, of course, which is the point of this issue!

I did a hack by adding a span with a single character and styling it the same colour as my background.

This issue has been fixed in the Foundry release. Try it out now! See #430 for instructions.

One possible workaround I’ve found, although I can’t vouch for it as I haven’t tested it fully, is tricking the library into thinking that the text node is not fully whitespace. This seems to prevent the node from being collapsed. Here I add a zero-width character to whitespace data:

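  alterData: node => {
    // Only touch whitespace-only text nodes (anchored so mixed text is left alone)
    if (/^\s+$/.test(node.data)) {
      // Append U+200C (zero-width non-joiner) so the node no longer looks blank
      return `${node.data}\u200C`
    }
  }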

In the cases I’ve seen, this seems to stop the whitespace from being lost. However I’m not sure if there’s other impact.

@djpetenice same problem here.

If the content is (with a space between the two a tags):

<p><a href="abc">abc</a> <a href="def">def</a></p>

It renders (without space):

abcdef

Ok!

I’ve used the space after the tag closure because of this scenario: if I have <strong>hello</strong><em>world</em>, I want to show helloworld. But if I have <strong>hello</strong> <em>world</em>, I want to show hello world.

So I don’t want a space between the strings if the 2 tags are adjacent. The one you’ve written (html = html.replace(/<[/]strong>/g, " ");) will remove the strong tag closure and replace it with a space.

@Draccan yup, these tags use the same logic as the ones fvsch pointed out.

@fvsch your solution in your gist looks pretty clean. It feels like a solid improvement, even if it’s obviously not matching an actual browser rendering. I’ll try it out more, and figure out whether this can be added to the codebase by default. I don’t feel like documenting your gist and asking people to copy and paste a hundred lines into their own project just to add this feature, and I don’t want to keep bloating the module with additional props either.

I’ll measure the performance impact and the potential regressions (all help is welcome here!) and if everything goes smoothly, let’s add it to the project. What do you think?

@Exilz The current whitespace handling has bugs too:

return <HTML html="<p>This <span>is</span> <strong>buggy</strong></p>" />

Shows:

This•isbuggy

You probably have a rule that a whitespace-only text node between two tag siblings can be dropped, but if the tags are both inline it should be kept (and collapsed to a single space if needed).
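A minimal sketch of that rule, assuming parsed nodes expose type, name, data, prev and next fields (an assumption loosely modelled on htmlparser2-style nodes, not checked against the library source). The function names and the INLINE_TAGS list are hypothetical.

```javascript
// Hypothetical inline tag list; extend as needed.
const INLINE_TAGS = new Set(['a', 'span', 'strong', 'em', 'b', 'i', 'u', 's']);

// Text nodes and inline tags both count as inline content.
function isInline(node) {
  if (!node) return false;
  return node.type === 'text' || (node.type === 'tag' && INLINE_TAGS.has(node.name));
}

// Returns the text to render for a text node, or null to drop the node.
function resolveTextNode(node) {
  // Nodes with visible characters are left untouched by this sketch.
  if (/\S/.test(node.data)) return node.data;
  // Whitespace-only: keep one space between inline siblings, drop otherwise.
  return isInline(node.prev) && isInline(node.next) ? ' ' : null;
}
```

With the example above, the whitespace-only node between the span and the strong would resolve to a single space instead of being dropped.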

If you can point me to the right source files, I can have a look and maybe do a PR. 😃

Applying the following replace function to data before passing it to the component works for all our content so far:

.replace(/[\t\r\n ]+/g, ' ')

Are there any plans to resolve this issue any time soon? Or are workarounds the recommended solution at the moment?

Thanks

Not sure that is working for the white space issue.

I made a function to fix the white space between tags.

function fixSpaceInHTML(html) {
    // Matches two adjacent tags separated only by whitespace
    const tagsWithSpace = /(<[^>]+>)\s+(<[^>]+>)/;
    let result = html;
    // Move the whitespace after the second tag, one pair at a time,
    // leaving whitespace inside the tags (e.g. attributes) untouched
    while (tagsWithSpace.test(result)) {
        result = result.replace(tagsWithSpace, '$1$2 ');
    }
    return result;
}

That is working fine. I hope it is helpful for anyone.

Anyone having issues with spaces between adjacent links? I’ve tried all spacing methods and all are being ignored:

[image: links]

For the problem where react-native-render-html suppresses newline characters (instead of rendering them as a space), did you try replacing your newline characters first?

<HTML html={myHtmlContent.replace(/\r?\n/g, ' ')} />

life is strange … 😃

now works!

The problem was the regex.

It should be: html = html.replace(/<[/]strong>/g, " ");

and NOT: html = html.replace(/<[/]strong> /g, " ");

Without the space before /g it works fine…

I think that now the best solution is to understand what kind of HTML you receive.

We are lucky because in the app we are developing the admin panel gives the possibility to enter text with bold, italic, paragraphs, etc.

So the HTML editor is our own component and we know how it places HTML tags. If you’re lucky like us you can use something like:

function addSpaceToHtml(htmlInput) {
	if (htmlInput !== null && htmlInput !== undefined) {
		let html = htmlInput;

		// Move the space that follows each closing tag to just before it
		html = html.replace(/<[/]strong> /g, " </strong>");
		html = html.replace(/<[/]em> /g, " </em>");
		html = html.replace(/<[/]i> /g, " </i>");
		html = html.replace(/<[/]s> /g, " </s>");
		html = html.replace(/<[/]u> /g, " </u>");
		html = html.replace(/<[/]span> /g, " </span>");

		return html;
	}
}

Our editor puts spaces after tag closure and we put them before the tag closure in our app.

If you’re taking HTML pages from the web, I don’t recommend this approach, and especially avoid regex!

I have seen that the space is removed not only for strong tag but even for em, s and u. For example: <p><strong>A</strong> <em>B</em> <u>C</u> <s>D</s></p> It prints ABCD without spaces 😦

These parts in HTML.js are probably wrong:

if (type === 'text') {
    if (!strippedData || !strippedData.length) {
        // This is blank, don't render an useless additional component
        return false;
    }
    // Text without tags, these can be mapped to the Text wrapper
    return {
        wrapper: 'Text',
        data: data.replace(/(\r\n|\n|\r)/gm, ''), // remove linebreaks
        attribs: attribs || {},
        parent,
        parentTag: parent && parent.name,
        tagName: name || 'rawtext'
    };
}
  1. Whitespace-only text nodes should be dropped if between block elements, or at the start or end of a block element, but not in the middle. HTML has even more advanced handling, and some of it can be mimicked, e.g. you can check parents to see if the whitespace-only text node is the first or last child of an inline element that is itself the first or last child of a block element (which I’m doing in my solution for dropping initial or trailing whitespace).

Use case:

<!-- We should render a single space between the links -->
<p>
  <a href="…">foo</a>
  <a href="…">bar</a>
</p>
  2. Line breaks should not be removed, but replaced with spaces (collapsing line breaks, tabs and ordinary spaces to a single space).

Use case:

<!-- We should render a single space between the words -->
<p>foo
bar</p>
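
The parent check mentioned in point 1 could be sketched roughly as follows. This is not the collapseHtmlText module discussed in this thread; the function names, the INLINE set and the node shape (data, name, parent, prev) are assumptions made for illustration, and a preceding sibling that is itself whitespace-only is not special-cased.

```javascript
// Hypothetical inline tag list; anything else is treated as a block.
const INLINE = new Set(['a', 'span', 'strong', 'em', 'b', 'i', 'u', 's']);

// True when nothing comes before `node` inside its nearest block ancestor.
function isAtBlockStart(node) {
  let current = node;
  while (current) {
    if (current.prev) return false; // a sibling precedes it
    const parent = current.parent;
    // Reached the root or a block element without meeting a previous sibling.
    if (!parent || !INLINE.has(parent.name)) return true;
    current = parent; // first child of an inline element: keep climbing
  }
  return true;
}

// Candidate alterData-style callback: collapse runs of whitespace, then drop
// the leading space when the text opens a block.
function alterLeadingWhitespace(node) {
  const collapsed = node.data.replace(/[ \t\r\n]+/g, ' ');
  return isAtBlockStart(node) ? collapsed.replace(/^ /, '') : collapsed;
}
```

A matching check for trailing whitespace would walk next siblings instead of prev.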

This is the result I’m getting for an HTML string with lots of whitespace, first with no special processing and then with a custom alterData function that tries to mimic the HTML algorithm for whitespace collapsing:

[screenshot]

The result is not as complete as what HTML does, for instance in the case of <p><a>foo </a> , bar</p>, web browsers will remove the space before the comma, but I’m not sure what the algorithm is and I’m not going that far.

The render part for this test:

const testContent = `
a

e

<div>  foo\n\nbar  baz  </div>  <div>zzz</div>

<div>
  <div>
    <div>
      <blockquote>
        Salut<a href="#"> les copains </a> ,
        <span>comment ça va ?</span>
      </blockquote>
    </div>
  </div>
</div>

`

return (
  <View>
    <View style={borderStyle}>
      <HTML html={testContent} />
    </View>
    <View style={borderStyle}>
      <HTML html={testContent} alterData={collapseHtmlText} />
    </View>
  </View>
)

Where collapseHtmlText is a separate module (around 90 lines). I’m still working on it so I’ll test with more content before sharing.