typst: Emoji do not render in PDFs
Hi,
I’ve just cloned the repo and compiled it. After compiling the following document, I get a blank page (in Evince) or some tofu (in Firefox):
#emoji.face.grin
The generated PDF is 10 MB, with the Noto Color Emoji font embedded. When changing the zoom level, Evince writes “some font thing failed” to stderr.
System: Ubuntu 22.04, cargo 1.68.0, typst 045a1096
About this issue
- State: closed
- Created a year ago
- Reactions: 40
- Comments: 18 (10 by maintainers)
As discussed on Discord, here is some background on color fonts and Typst’s PDF font handling and then the steps required to fix this issue.
A bit of background on color fonts
OpenType supports multiple formats for encoding emoji fonts. The data for each of these is stored in OpenType tables within the font. We can query this data with ttf-parser. The following color formats exist:
- sbix: A table that encodes emojis as raster images. Backed by Apple. (Example font: Apple Color Emoji)
- CBDT: Another table that encodes emojis as raster images, but in a slightly different way. Backed by Google. (Example font: Noto Color Emoji)
- SVG: Encodes emojis as a subset of SVG. Backed by Adobe and Mozilla. (Example font: Twitter Color Emoji)
- COLRv0: Encodes emojis with the normal font outlines + color palettes. Backed by Microsoft. (Example font: Segoe UI Emoji)
- COLRv1: Microsoft noticed that color emojis with just plain colors look a bit boring, so they added a ton of SVG-like features to the COLR table. This format is quite recent and support for it isn’t merged into ttf-parser yet. Even with the latest updates, Windows doesn’t seem to ship it, so we can skip it for now. (Example font: Recent versions of Noto Color Emoji)
The inner workings of these OpenType tables are mostly abstracted away by ttf-parser, but it’s still important to know how they work conceptually.
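To make the list above a bit more concrete, here is a minimal sketch of how one could check which of these tables a font file contains. It deliberately does not use ttf-parser; it only walks the OpenType table directory by hand with the standard library. The function name `color_tables` is made up for illustration, and the sketch assumes a single font file (not a TrueType collection).

```rust
/// Color-glyph tables from the list above, as 4-byte OpenType tags.
/// Note that the SVG tag is padded with a trailing space.
const COLOR_TAGS: [&[u8; 4]; 4] = [b"sbix", b"CBDT", b"SVG ", b"COLR"];

/// Returns the tags of all color tables listed in the font's table directory.
/// Hypothetical helper: returns an empty list if the header is truncated.
fn color_tables(font: &[u8]) -> Vec<String> {
    let mut found = Vec::new();
    // Offset 4 holds numTables (u16, big-endian); table records start at offset 12.
    let Some(n) = font.get(4..6) else { return found };
    let num_tables = u16::from_be_bytes([n[0], n[1]]) as usize;
    for i in 0..num_tables {
        let start = 12 + i * 16; // each table record is 16 bytes, tag first
        let Some(tag) = font.get(start..start + 4) else { break };
        if COLOR_TAGS.iter().any(|t| t.as_slice() == tag) {
            found.push(String::from_utf8_lossy(tag).into_owned());
        }
    }
    found
}
```

COLRv0 and COLRv1 share the same COLR table, so telling them apart would additionally require reading the version field inside that table, which this sketch does not do.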
How Typst writes text and fonts into PDFs
Within write_text in typst-pdf/src/page.rs, text is written by writing CIDs (character IDs) into the content stream with the /TJ operator. In spite of their name, CIDs are not like Rust chars. Instead, they typically map 1-1 to glyph IDs in a font because we configure an Identity CID-to-GID mapping (except in the case of CFF fonts, which work a bit differently).
The CIDs reference a font configured via the /Tf operator. While writing the text items, we collect all fonts that are referenced, which we then embed into the PDF at the end. This happens in typst-pdf/src/font.rs.
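As a rough illustration of the text-writing step described above, the following sketch encodes glyph IDs as two-byte big-endian CIDs in a PDF hex string and emits a font selection followed by a text-showing operation. This is not Typst's actual code: `encode_cids`, `show_text`, and the resource name `F0` are hypothetical, and the sketch assumes the Identity mapping (CID equals glyph ID). In the content stream itself the operators appear as the bare keywords Tf and TJ.

```rust
/// Encodes glyph IDs as a PDF hex string of 2-byte big-endian CIDs,
/// assuming an Identity CID-to-GID mapping (CID == glyph ID).
fn encode_cids(glyph_ids: &[u16]) -> String {
    let mut out = String::from("<");
    for gid in glyph_ids {
        out.push_str(&format!("{:04X}", gid));
    }
    out.push('>');
    out
}

/// Builds a minimal snippet: select a font, then show the glyphs.
/// `font_name` is a hypothetical resource name such as "F0".
/// The surrounding BT/ET text-object operators are omitted for brevity.
fn show_text(font_name: &str, size: f32, glyph_ids: &[u16]) -> String {
    format!("/{} {} Tf\n[{}] TJ", font_name, size, encode_cids(glyph_ids))
}
```

For example, `show_text("F0", 12.0, &[0x2B, 0x48])` yields a Tf line selecting `/F0` at size 12 and a TJ array containing `<002B0048>`.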
To be able to copy from the PDF, a PDF viewer must map the CIDs back to Unicode text. This is what the /ToUnicode mapping is for, which we write for each font.
So, this is how it works for normal fonts. The problem now is that PDF viewers completely ignore the color tables in emoji fonts and fall back to normal outlines (if available). To get emojis to show up, we have two different options:
- Encode them as graphics rather than text. Then, they aren’t copy-pastable. While we can, in theory, specify an /ActualText that should be copied, many PDF viewers don’t seem to support that.
- Encode them as Type 3 fonts. A Type 3 font is a special type of PDF font that doesn’t embed font data in an external format like TTF or CFF, but rather defines the font’s glyphs directly as PDF objects. This way, we can create the emojis as PDF graphics, but display them with the normal text-showing operators.
Based on the conversation above and what other tools do, Type 3 seems like the better approach. Relevant details can be found in the PDF 1.7 specification, section 9.6.5, “Type 3 Fonts”.
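Whichever font type is used, copy-paste relies on the /ToUnicode mapping mentioned above. As a small sketch of what one entry in such a mapping looks like, the function below formats a single `bfchar` line for the two-byte CID case. The name `bfchar_entry` is hypothetical and this is not taken from Typst's code; the point is only that the Unicode side is written as UTF-16BE, so most emoji end up as surrogate pairs.

```rust
/// Formats one `bfchar` entry mapping a 2-byte character code to Unicode text.
/// The destination is written as UTF-16BE hex, so characters outside the
/// Basic Multilingual Plane (most emoji) become surrogate pairs.
fn bfchar_entry(code: u16, text: &str) -> String {
    let mut dst = String::new();
    for unit in text.encode_utf16() {
        dst.push_str(&format!("{:04X}", unit));
    }
    format!("<{:04X}> <{}>", code, dst)
}
```

For a Type 3 font, which uses single-byte codes, the source code on the left would be one byte wide instead of two.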
Implementing it in Typst
Here’s a rough outline of the steps involved in implementing emoji handling for PDF in Typst:
- When writing a glyph in a text run, we need to detect whether an emoji glyph definition exists in any of the formats above. If yes, we need to terminate the text run and switch to a Type 3 font we will generate for it. This should live in typst-pdf/src/page.rs, likely using some helpers defined in typst-pdf/src/font.rs.
- We need code to convert emoji definitions in any of the formats above into PDF content streams. The PNG exporter has existing code for all the formats except COLR (since it’s a recent addition to ttf-parser) whereas the SVG exporter doesn’t handle them yet. To share as much code as possible between the exporters, it would probably make sense to convert a color glyph to a Typst Frame rather than directly producing PDF content for it and then reuse this frame across all three exporters. This code could live in typst/src/text/font/color.rs.
- An unfortunate limitation of Type 3 fonts is that they can encode at most 256 glyphs, so if more than 256 emojis from the same font are used, we need to write multiple Type 3 fonts for that one.
- We need to actually write the necessary variable number of Type 3 fonts for each font and generate /ToUnicode mappings for them. This should live in typst-pdf/src/font.rs.
Damn, all I wanted was a little 🐿️, turns out it’s Specs War Infinity Edition. Thanks for planning on supporting this 😀
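On the 256-glyph limit from the implementation outline above: one way to picture the bookkeeping is to give each distinct emoji glyph a slot consisting of a Type 3 font index and a single-byte character code, opening a new font whenever the current one is full. The sketch below is only an illustration under that assumption; `assign_slots` and `MAX_GLYPHS_PER_FONT` are hypothetical names, not part of Typst.

```rust
use std::collections::HashMap;

/// Maximum number of glyphs one Type 3 font can address with single-byte codes.
const MAX_GLYPHS_PER_FONT: usize = 256;

/// Assigns each distinct glyph ID a (font index, character code) slot,
/// starting a new Type 3 font whenever the current one is full.
/// Slots are handed out in order of first use; repeated glyphs keep their slot.
fn assign_slots(glyph_ids: &[u16]) -> HashMap<u16, (usize, u8)> {
    let mut slots = HashMap::new();
    for &gid in glyph_ids {
        let next = slots.len(); // number of distinct glyphs seen so far
        slots.entry(gid).or_insert((
            next / MAX_GLYPHS_PER_FONT,
            (next % MAX_GLYPHS_PER_FONT) as u8,
        ));
    }
    slots
}
```

With 300 distinct glyphs, the first 256 land in font 0 and the remaining 44 in font 1, so two Type 3 fonts (each with its own /ToUnicode mapping) would be written for that one source font.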
As a workaround, I wrote a package, svg-emoji, to replace emoji with SVG glyphs directly. For now, it only offers Noto support.
@elegaanz thanks a lot for this crucial feature! Typst is now 1.0 for me 🐿️
The planned fix is to export them as XObjects with /ActualText to make them copyable. If that turns out to be problematic, another alternative would be to embed them as Type 3 fonts.
Emoji fonts aren’t correctly exported at the moment. There may also be unrelated font issues at play here.
Hum, that’s quite interesting:
With the following result:
Here is the file test.pdf