libass: Investigate unexpected vsfilter behavior with \h on certain font
Sample script + font: GosmickSample.zip
Lines in question:
Dialogue: 0,0:16:58.62,0:17:04.12,Song Insert Romaji,,0,0,0,,{\fad(200,200)\k33}A{\k22}ko{\k34}ga{\k35}re {\k32}no {\k23}se{\k14}ri{\k20}fu {\k68}o {\k51}"\h\h\h" (kak{\k39}ko) {\k34}ni {\k37}i{\k17}re{\k20}te {\k17}mi{\k54}you
Dialogue: 0,0:16:58.62,0:17:04.12,Song Insert TL,,0,0,0,,{\fad(200,200)}Let's try putting aspirational words in the "\h\h\h"
Reported MPC/vsfilter output:
So, uh, it seems to be rendering \h (i.e. U+00A0/NBSP) with the glyph that sits in the font's .notdef slot? What?
The libass behavior here is sane (we render whitespace as expected), and I haven’t seen any indication that anybody has deliberately relied on this behavior, so this might not be worth changing, but I figure it’s at least worth understanding and documenting.
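For anyone reproducing this outside a renderer, here is a minimal sketch of what "\h means U+00A0" amounts to at the text level. The helper name `expand_hard_spaces` is made up for illustration, and the override-block handling is deliberately simplified (it assumes every `{` is closed); it is not how libass or VSFilter actually parse events.

```python
NBSP = "\u00a0"  # U+00A0 NO-BREAK SPACE, the code point \h stands for


def expand_hard_spaces(text: str) -> str:
    """Replace ASS \\h escapes with U+00A0 outside {...} override blocks.

    Simplification: assumes override blocks are well-formed and not nested.
    """
    out = []
    i = 0
    in_override = False
    while i < len(text):
        ch = text[i]
        if ch == "{" and not in_override:
            in_override = True
        elif ch == "}" and in_override:
            in_override = False
        elif not in_override and text.startswith("\\h", i):
            # Plain-text \h: emit one no-break space and skip both characters.
            out.append(NBSP)
            i += 2
            continue
        out.append(ch)
        i += 1
    return "".join(out)


# Text field of the "Song Insert TL" event from the sample above.
line = r'''{\fad(200,200)}Let's try putting aspirational words in the "\h\h\h"'''
expanded = expand_hard_spaces(line)
```

The resulting string is what then goes through glyph lookup, which is where the two renderers diverge.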
About this issue
- State: open
- Created 10 months ago
- Comments: 15 (14 by maintainers)
Discovery of the century incoming… You know what, I feel silly. This is so simple and stupid; how did we spend years not realizing this?
GDI doesn’t do inline font fallback. At all.
GDI uses font linking—and nothing more.
The Uniscribe-based code path does inline font fallback, and its fallback choice can differ from GDI’s font linking.
My proof:
Microsoft’s docs (which gave me the idea):
This mentions:
implying that without Uniscribe, font fallback doesn’t happen.
This mentions font linking together with GDI, whereas for font fallback, it mentions .NET and Uniscribe but not GDI.
It also mentions:
implying that without font linking, undefined glyphs will indeed be displayed as tofu, not fall back to other fonts.
(It does talk of how font linking “takes priority over font fallback”, but the way it is described in the next sentence makes me think that perhaps this is meant to say “font substitution”, which is GDI’s mechanism for defining font aliases, described further down the page, or perhaps this means something else completely.)
Targeted test (after reading those docs):
Arial has no font linking defined on my machine. Arial has a surprisingly wide glyph coverage, but it lacks a glyph for U+2025 TWO DOT LEADER in the General Punctuation block and it lacks Japanese glyphs. Arial has GPOS kerning, which makes it easy to tell when Uniscribe is activated by including the heavily-kerned string “WAT.” in the test. As this test reaffirms, General Punctuation isn’t treated as a “complex script”, but Japanese kana is.
Tahoma has a long list of linked fonts on my machine, the first of them being MS UI Gothic. Tahoma also has a surprisingly wide glyph coverage, and it does have a glyph for U+2025 TWO DOT LEADER, but it lacks a glyph for U+2196 NORTH WEST ARROW in the Arrows block. On the other hand, MS UI Gothic does have a glyph for that arrow. The Arrows block isn’t treated as a “complex script”. Tahoma doesn’t have kerning outside of Arabic.
This ASS:
(where the \fs for MS UI Gothic is calculated from Tahoma’s \fs and both fonts’ metrics in such a manner that the em size in pixels stays constant, as documented for font linking)
displays:
As we can see, kerning is applied in the bottommost line (with the Japanese) but not the one above it (without the Japanese), so one is rendered by Uniscribe and the other by GDI itself. The two glyphs absent from Arial are shown in the Uniscribe rendering but replaced by tofu in the GDI rendering. In the Tahoma lines, all glyphs are visible but the arrow uses different glyphs in Uniscribe and in GDI. GDI’s glyph exactly matches the explicitly-requested MS UI Gothic glyph, so it must be that GDI is applying font linking whereas Uniscribe is applying font fallback that isn’t based on font linking.
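The \fs adjustment mentioned in the parenthetical above can be sketched roughly as follows. This assumes \fs requests a cell height (ascent plus descent) rather than an em size, which is my understanding of how VSFilter creates GDI fonts; it also ignores GDI's integer rounding. The metric values are hypothetical placeholders, not the real Tahoma or MS UI Gothic numbers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FontMetrics:
    units_per_em: int
    ascent: int   # font units
    descent: int  # font units, stored as a positive number


def em_pixels(fs: float, m: FontMetrics) -> float:
    # Assumption: \fs is the cell height, so the em is scaled down by
    # units_per_em / (ascent + descent).
    return fs * m.units_per_em / (m.ascent + m.descent)


def matching_fs(base_fs: float, base: FontMetrics, linked: FontMetrics) -> float:
    """\\fs for the linked font that yields the same em size as the base font."""
    em = em_pixels(base_fs, base)
    return em * (linked.ascent + linked.descent) / linked.units_per_em


# Hypothetical metrics, for illustration only.
BASE = FontMetrics(units_per_em=2048, ascent=2000, descent=400)
LINKED = FontMetrics(units_per_em=256, ascent=220, descent=36)

linked_fs = matching_fs(60, BASE, LINKED)
```

With the explicitly requested linked font set to `linked_fs`, its glyphs should come out at the same em size that font linking would give them, which is what makes the glyph-for-glyph comparison in the test meaningful.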
Fallback works normally (for \h, Cyrillic, hiragana, and code points in the Latin Extended-A block that don’t have dedicated glyphs in the font) in runs rendered by Uniscribe (e.g. if hiragana is included), but absolutely everything (\h, Cyrillic and Latin alike) is displayed as .notdef in runs rendered by GDI.
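To make the hypothesis concrete, here is a toy model of the two code paths as I understand them from the Arial/Tahoma test. Everything in it is an assumption for illustration: the coverage sets only restate the observations above (they are not queried from real fonts), `HypotheticalFallback` stands in for whatever font Uniscribe actually picks, the "complex script" check is reduced to kana, and the whole string is treated as a single run.

```python
TOFU = ".notdef"

# Illustrative coverage only, restating the observations from the test above.
COVERAGE = {
    "Arial": set("WAT."),
    "Tahoma": set("WAT.") | {"\u2025"},
    "MS UI Gothic": {"\u2196"},
    "HypotheticalFallback": {"\u2025", "\u2196", "\u3042"},
}
LINKS = {"Tahoma": ["MS UI Gothic"]}        # Arial has no linked fonts
UNISCRIBE_FALLBACK = ["HypotheticalFallback"]  # not derived from LINKS


def gdi_font_for(ch: str, font: str) -> str:
    # GDI path: the requested font, then its linked fonts, then tofu.
    for candidate in [font, *LINKS.get(font, [])]:
        if ch in COVERAGE[candidate]:
            return candidate
    return TOFU


def uniscribe_font_for(ch: str, font: str) -> str:
    # Uniscribe path: the requested font, then a separate fallback list.
    for candidate in [font, *UNISCRIBE_FALLBACK]:
        if ch in COVERAGE[candidate]:
            return candidate
    return TOFU


def is_complex(ch: str) -> bool:
    # Simplification: only hiragana/katakana count as "complex" here.
    return 0x3040 <= ord(ch) <= 0x30FF


def render(text: str, font: str) -> list[str]:
    """Return the font chosen for each character of a single run."""
    pick = uniscribe_font_for if any(map(is_complex, text)) else gdi_font_for
    return [pick(ch, font) for ch in text]
```

In this model, `render("WAT.\u2025", "Arial")` ends in tofu while adding a hiragana character to the same string flips the whole run to the fallback path, mirroring the two Arial lines; the Tahoma arrow resolves to MS UI Gothic only on the GDI path.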