runtime: Consider adding a JavaScriptEncoder implementation that doesn't encode the block list or surrogate pairs.

I would like to know if there is a way or any intention for the System.Text.Json serializer to support emojis defined with surrogate pairs?

When we serialize “📲” we get its representation as escaped Unicode values. I would like it to remain unescaped.

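using System;
using System.Text.Encodings.Web;
using System.Text.Json;
using System.Text.Unicode;

class Program
{
    static void Main()
    {
        // Allow every range the encoder settings can express.
        var encoderSettings = new TextEncoderSettings();
        encoderSettings.AllowRange(UnicodeRanges.All);
        var serializerOptions = new JsonSerializerOptions
        {
            Encoder = JavaScriptEncoder.Create(encoderSettings),
            WriteIndented = true
        };
        var json = JsonSerializer.Serialize("📲", serializerOptions);
        Console.WriteLine(json);
        // "\uD83D\uDCF2" (UnicodeRanges.All only covers the BMP, U+0000..U+FFFF)
    }
}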

Thank you in advance.

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 15 (7 by maintainers)

Most upvoted comments

I will look into a workaround. Thank you all for the responses. Will close this issue.

@terrajobst sure. Our mobile apps display the escaped text as is instead of the emojis.

That sounds like a problem with your decoding app not correctly handling valid JSON.

The JSON specification says: ‘To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as “\uD834\uDD1E”.’

Technically only the quotation mark, the reverse solidus, and the control characters (U+0000 through U+001F) require escaping (it is the only way to represent them), so maybe they could add switches to control whether other characters get encoded (values outside ASCII, values outside the BMP). For example, the 4-byte UTF-8 sequence for a supplementary character is shorter than the 12-byte escaped surrogate pair, so leaving it unescaped would save a few bytes.
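For reference, the surrogate pair in the quoted passage follows from simple arithmetic on the code point. Here is a minimal sketch that derives the escape sequence and compares its length with the UTF-8 encoding. The names `SurrogateMath`, `ToSurrogatePair`, `ToJsonEscape` and `Utf8ByteCount` are hypothetical helpers made up for illustration, not part of any library:

```csharp
using System;
using System.Text;

// Hypothetical helper class, for illustration only.
public static class SurrogateMath
{
    // Splits a supplementary code point (U+10000..U+10FFFF) into its UTF-16 surrogate pair.
    public static (char High, char Low) ToSurrogatePair(int codePoint)
    {
        if (codePoint < 0x10000 || codePoint > 0x10FFFF)
            throw new ArgumentOutOfRangeException(nameof(codePoint));

        int offset = codePoint - 0x10000;              // 20-bit value
        char high = (char)(0xD800 + (offset >> 10));   // top 10 bits
        char low = (char)(0xDC00 + (offset & 0x3FF));  // bottom 10 bits
        return (high, low);
    }

    // Builds the twelve-character JSON escape for a supplementary code point.
    public static string ToJsonEscape(int codePoint)
    {
        var (high, low) = ToSurrogatePair(codePoint);
        return $"\\u{(int)high:X4}\\u{(int)low:X4}";
    }

    // Number of bytes the same code point occupies when written as raw UTF-8.
    public static int Utf8ByteCount(int codePoint) =>
        Encoding.UTF8.GetByteCount(char.ConvertFromUtf32(codePoint));
}
```

For U+1D11E the offset is 0xD11E, which gives the high surrogate 0xD834 and the low surrogate 0xDD1E, matching the example in the specification.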

But the decoding app should still handle all valid encodings.
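To illustrate the point about decoders, here is a small sketch that parses the escaped and the unescaped form of the same string with System.Text.Json and compares the results. The class and method names are hypothetical and exist only for this example:

```csharp
using System.Text.Json;

// Hypothetical check, for illustration only.
public static class RoundTripCheck
{
    public static bool EscapedAndRawDecodeEqually()
    {
        // JSON text that uses the twelve-character escape sequence.
        string escapedJson = "\"\\uD83D\\uDCF2\"";
        // JSON text that contains the character itself (U+1F4F2).
        string rawJson = "\"\U0001F4F2\"";

        var fromEscaped = JsonSerializer.Deserialize<string>(escapedJson);
        var fromRaw = JsonSerializer.Deserialize<string>(rawJson);

        // A conforming parser should yield the same string for both inputs.
        return fromEscaped == fromRaw
            && fromEscaped == char.ConvertFromUtf32(0x1F4F2);
    }
}
```

If a client shows the backslash sequence literally, it is most likely displaying the raw JSON text without running it through a JSON parser first.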

You are asking for a workaround to compensate for a bug in your mobile app.

There is no built-in mechanism for allowing this. You'd have to subclass the JavaScriptEncoder type and set the JsonSerializerOptions.Encoder property to an instance of your custom type.
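A rough, untested sketch of what such a subclass might look like follows. `SupplementaryPassThroughEncoder` and `EncoderDemo` are hypothetical names, not types shipped with .NET. It assumes the project is compiled with `AllowUnsafeBlocks`, because the abstract members of the encoder base class take pointers. It delegates everything in the BMP to `JavaScriptEncoder.UnsafeRelaxedJsonEscaping` and leaves every well-formed surrogate pair alone. That includes unassigned and private-use code points, so treat it as a starting point rather than a vetted implementation:

```csharp
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical type, not part of System.Text.Json. Requires AllowUnsafeBlocks.
public sealed unsafe class SupplementaryPassThroughEncoder : JavaScriptEncoder
{
    // Everything in the BMP is delegated to the built-in relaxed encoder.
    private static readonly JavaScriptEncoder Inner = JavaScriptEncoder.UnsafeRelaxedJsonEscaping;

    public override int MaxOutputCharactersPerInputCharacter =>
        Inner.MaxOutputCharactersPerInputCharacter;

    // Assumption of this sketch: supplementary scalars (U+10000 and above) are never escaped.
    public override bool WillEncode(int unicodeScalar) =>
        unicodeScalar < 0x10000 && Inner.WillEncode(unicodeScalar);

    public override int FindFirstCharacterToEncode(char* text, int textLength)
    {
        for (int i = 0; i < textLength; i++)
        {
            char c = text[i];
            if (char.IsHighSurrogate(c) && i + 1 < textLength && char.IsLowSurrogate(text[i + 1]))
            {
                i++; // well-formed pair: skip both code units
                continue;
            }
            // Lone surrogates and anything the inner encoder would escape need encoding.
            if (char.IsSurrogate(c) || Inner.WillEncode(c))
                return i;
        }
        return -1; // nothing needs escaping
    }

    public override bool TryEncodeUnicodeScalar(
        int unicodeScalar, char* buffer, int bufferLength, out int numberOfCharactersWritten) =>
        Inner.TryEncodeUnicodeScalar(unicodeScalar, buffer, bufferLength, out numberOfCharactersWritten);
}

// Hypothetical usage helper.
public static class EncoderDemo
{
    public static string SerializeEmoji()
    {
        var options = new JsonSerializerOptions { Encoder = new SupplementaryPassThroughEncoder() };
        return JsonSerializer.Serialize("\U0001F4F2", options);
    }
}
```

The intent is that `EncoderDemo.SerializeEmoji()` returns the emoji itself inside the quotes instead of the `\uD83D\uDCF2` escape. I have not verified this across runtime versions, so check the output on the one you target.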

@layomia - We could consider allowing the unsafe relaxed escaper to allow known-good supplementary characters. This isn't trivial work, though.