gitea: Web interface does not render composed unicode characters correctly
Description
All composed UTF-8 characters, like s̄_b, ṡ_b, etc., are not rendered correctly in Gitea.
Screenshots
See how s̄_b is being rendered. It even shows that there are hidden Unicode characters in this line.
Gitea Version
1.16.8
Can you reproduce the bug on the Gitea demo site?
Yes
Operating System
macOS
Browser Version
Safari 15.4
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 17 (13 by maintainers)
Commits related to this issue
- Switch Unicode Escaping to a VSCode-like system This PR rewrites the invisible unicode detection algorithm to more closely match that of the Monaco editor on the system. It provides a technique for d... — committed to zeripath/gitea by zeripath 2 years ago
- Switch Unicode Escaping to a VSCode-like system (#19990) This PR rewrites the invisible unicode detection algorithm to more closely match that of the Monaco editor on the system. It provides a tech... — committed to go-gitea/gitea by zeripath 2 years ago
- Switch Unicode Escaping to a VSCode-like system (#19990) This PR rewrites the invisible unicode detection algorithm to more closely match that of the Monaco editor on the system. It provides a tech... — committed to IntegraSDL/gitea by zeripath 2 years ago
I think composition should not be influenced by tag boundaries and other browsers seem to agree as well.
Github is wrong to not report that there is something odd. ë and ë are not the same characters.
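To illustrate the point above, here is a minimal Go sketch. It assumes the two characters being compared are the precomposed form (U+00EB) and the decomposed form (e followed by U+0308); the helper name `codePoints` is made up for this example and is not part of Gitea.

```go
package main

import "fmt"

// codePoints lists the code points of s in U+XXXX notation.
// (Hypothetical helper for illustration only.)
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	precomposed := "\u00eb" // LATIN SMALL LETTER E WITH DIAERESIS
	decomposed := "e\u0308" // 'e' followed by COMBINING DIAERESIS

	fmt.Println(codePoints(precomposed))   // [U+00EB]
	fmt.Println(codePoints(decomposed))    // [U+0065 U+0308]
	fmt.Println(precomposed == decomposed) // false
}
```

Both strings usually render identically, yet they differ at the code point (and byte) level, which is why flagging the second form is defensible.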
The problem stems from Safari’s rendering. The code looks like:
The <span class="escaped-code-point" data-escaped="[U+0304]"><span class="char"> is all inline and zero width, so the overbar (U+0304) should still be rendered over the s. Safari is incorrectly rendering this.
Now, how could we fix this? Without having access to a Safari browser I’m not sure. Is there any way to get Safari to just do the right thing with the spacing of the combining character here?
We do this splitting because it makes writing the escaping/unescaping extremely easy for combining marks:
https://github.com/go-gitea/gitea/blob/ac88f21ecc5612befe51f7ab6ffcb76c681daba5/modules/charset/escape.go#L166-L179
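For readers who don't want to follow the link, the per-code-point splitting can be sketched roughly as below. This is not the linked Gitea code: `escapeCombining` is a hypothetical name, and the sketch assumes that only nonspacing marks (Unicode category Mn) get wrapped, using the span markup quoted above.

```go
package main

import (
	"fmt"
	"html"
	"strings"
	"unicode"
)

// escapeCombining wraps each nonspacing combining mark in its own pair of
// spans and HTML-escapes every other rune. Because each mark is handled on
// its own, no state about the preceding base character is needed.
// (Hypothetical sketch, not the actual Gitea implementation.)
func escapeCombining(s string) string {
	var b strings.Builder
	for _, r := range s {
		if unicode.Is(unicode.Mn, r) {
			fmt.Fprintf(&b,
				`<span class="escaped-code-point" data-escaped="[U+%04X]"><span class="char">%c</span></span>`,
				r, r)
			continue
		}
		b.WriteString(html.EscapeString(string(r)))
	}
	return b.String()
}

func main() {
	// 's' followed by COMBINING MACRON, then "_b".
	fmt.Println(escapeCombining("s\u0304_b"))
}
```

The side effect is that the base character and its mark end up in different elements, which is the situation Safari appears to render badly.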
To avoid the split, we’d have to coalesce bytes that can be combined together and emit a single escape block for them, e.g.
That coalescing would require us to write the escaper to understand the rules for rendering of combining characters and have the state for handling these.
For example if we had:
 ̄ (that is [U020] + [U304])
Should that be coalesced as <space>+[U304] or kept as [U304]? How about multiple combining characters, e.g. ē̂ (e + U304 + U302)? How about when/if we get round to properly dealing with ambiguous characters like с and С (which are not the same as c and C)? If these have combining characters, do we coalesce or escape the ambiguous character separately?
If it is possible to get Safari to just render the combining character in the right place, that would be deeply helpful instead.
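As a rough illustration of what the coalescing alternative would involve, here is a minimal Go sketch. The function name `coalesce` and its policy are assumptions made for this example: a base rune absorbs every nonspacing mark (category Mn) that follows it, a space counts as a base, and marks with no preceding base form a cluster of their own. It does not answer the open questions above; it just picks one option for each.

```go
package main

import (
	"fmt"
	"unicode"
)

// coalesce groups each rune with the nonspacing marks that follow it, so a
// caller could emit one escape block per cluster instead of one per mark.
// (Hypothetical sketch; the grouping policy is an assumption.)
func coalesce(s string) []string {
	var clusters []string
	var cur []rune
	for _, r := range s {
		// A mark extends the current cluster if there is one.
		if unicode.Is(unicode.Mn, r) && len(cur) > 0 {
			cur = append(cur, r)
			continue
		}
		// Anything else closes the current cluster and starts a new one.
		if len(cur) > 0 {
			clusters = append(clusters, string(cur))
		}
		cur = []rune{r}
	}
	if len(cur) > 0 {
		clusters = append(clusters, string(cur))
	}
	return clusters
}

func main() {
	// e + U+0304 + U+0302 stays together; the trailing 'x' is separate.
	fmt.Printf("%+q\n", coalesce("e\u0304\u0302x"))
	// Space + U+0304 is coalesced under this sketch's policy.
	fmt.Printf("%+q\n", coalesce(" \u0304"))
}
```

Even this small version has to carry state between runes, which is the extra complexity the per-mark approach avoids.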