runtime: Consider adding a JavaScriptEncoder implementation that doesn't encode the block list or surrogate pairs.
I would like to know if there is a way or any intention for the System.Text.Json serializer to support emojis defined with surrogate pairs?
When we serialize "📲" we get its representation as escaped Unicode values. I would like it to remain unescaped.
static void Main()
{
    var encoderSettings = new TextEncoderSettings();
    encoderSettings.AllowRange(UnicodeRanges.All);
    var serializerOptions = new JsonSerializerOptions
    {
        Encoder = JavaScriptEncoder.Create(encoderSettings),
        WriteIndented = true
    };
    var json = JsonSerializer.Serialize("📲", serializerOptions);
    Console.WriteLine(json);
    // "\uD83D\uDCF2"
}
Thank you in advance.
About this issue
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 15 (7 by maintainers)
Commits related to this issue
- Add UncodeJsonEncoder (fixes #42847). Also fixes #86800. Also fixes #87138 (except docs outside this repo). — committed to davidmatson/runtime by davidmatson a year ago
I will look into a workaround. Thank you all for the responses. Will close this issue.
That sounds like a problem with your decoding app not correctly handling valid JSON.
The JSON specification says: "To escape an extended character that is not in the Basic Multilingual Plane, the character is represented as a twelve-character sequence, encoding the UTF-16 surrogate pair. So, for example, a string containing only the G clef character (U+1D11E) may be represented as "\uD834\uDD1E"."
Technically, only the quotation mark, the reverse solidus, and the control characters (U+0000 through U+001F) require escaping, so maybe they could add switches to control whether other characters do or do not get encoded (values outside ASCII, values outside the BMP). For example, the 4-byte UTF-8 sequence for a supplementary character is shorter than the 12-byte escaped surrogate pair, so leaving it unescaped would save a few bytes.
But the decoding app should still handle all valid encodings.
You are asking for a work around to compensate for a bug in your mobile app.
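To illustrate the point about valid encodings, here is a minimal sketch showing that the escaped surrogate pair and the literal character decode to the same string, along with the byte counts mentioned above. The class name EscapeEquivalence is made up for this example; the rest uses only System.Text.Json and System.Text.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical demo class, not part of the issue or of .NET.
public static class EscapeEquivalence
{
    public static void Main()
    {
        // The same one-character string, written two ways as JSON text.
        string escapedJson = "\"\\uD83D\\uDCF2\""; // twelve-character escape sequence
        string literalJson = "\"\uD83D\uDCF2\"";   // the character itself (U+1F4F2)

        var fromEscaped = JsonSerializer.Deserialize<string>(escapedJson);
        var fromLiteral = JsonSerializer.Deserialize<string>(literalJson);

        // A conforming JSON parser yields the same string for both forms.
        Console.WriteLine(fromEscaped == fromLiteral);                        // True
        Console.WriteLine(char.ConvertToUtf32(fromEscaped, 0).ToString("X")); // 1F4F2

        // Size comparison: raw UTF-8 bytes versus the escaped form.
        Console.WriteLine(Encoding.UTF8.GetByteCount(fromLiteral)); // 4
        Console.WriteLine(@"\uD83D\uDCF2".Length);                  // 12
    }
}
```

Any consumer that only accepts one of the two forms is not handling valid JSON correctly.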
There is no built-in mechanism for allowing this. You'd have to subclass the JavaScriptEncoder type and set the JsonSerializerOptions.Encoder property to an instance of your custom type.
@layomia - We could consider allowing the unsafe relaxed escaper to allow known-good supplementary characters. This isn't trivial work, though.
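For anyone looking for the workaround, the subclassing approach suggested above might look roughly like the sketch below. This is only an assumption about how one could do it, not an official or reviewed implementation: MinimalJsonEncoder is a hypothetical name, the project needs AllowUnsafeBlocks enabled because the abstract members take pointers, and the behavior has not been checked against every runtime version. It escapes only the quotation mark, the reverse solidus, control characters, and unpaired surrogates, so unlike the default encoder it offers no protection when the JSON is embedded in HTML or script.

```csharp
using System;
using System.Text;
using System.Text.Encodings.Web;
using System.Text.Json;

// Hypothetical encoder (not part of .NET): escapes only what JSON itself requires.
public sealed class MinimalJsonEncoder : JavaScriptEncoder
{
    // Longest escape per UTF-16 input char is "\uXXXX" (6 chars).
    public override int MaxOutputCharactersPerInputCharacter => 6;

    public override bool WillEncode(int unicodeScalar) =>
        unicodeScalar < 0x20 || unicodeScalar == '"' || unicodeScalar == '\\';

    public override unsafe int FindFirstCharacterToEncode(char* text, int textLength)
    {
        for (int i = 0; i < textLength; i++)
        {
            char c = text[i];
            if (char.IsSurrogate(c))
            {
                // A well-formed pair passes through untouched.
                if (char.IsHighSurrogate(c) && i + 1 < textLength && char.IsLowSurrogate(text[i + 1]))
                {
                    i++;
                    continue;
                }
                return i; // unpaired surrogate: let the base class deal with it
            }
            if (WillEncode(c)) return i;
        }
        return -1; // nothing to escape
    }

    public override unsafe bool TryEncodeUnicodeScalar(
        int unicodeScalar, char* buffer, int bufferLength, out int numberOfCharactersWritten)
    {
        string escaped;
        if (unicodeScalar == '"') escaped = "\\\"";
        else if (unicodeScalar == '\\') escaped = "\\\\";
        else if (unicodeScalar <= 0xFFFF) escaped = "\\u" + unicodeScalar.ToString("X4");
        else
        {
            // Not expected given WillEncode above; kept so the output stays valid JSON.
            string pair = char.ConvertFromUtf32(unicodeScalar);
            escaped = "\\u" + ((int)pair[0]).ToString("X4") + "\\u" + ((int)pair[1]).ToString("X4");
        }

        if (escaped.Length > bufferLength)
        {
            numberOfCharactersWritten = 0;
            return false;
        }
        for (int i = 0; i < escaped.Length; i++) buffer[i] = escaped[i];
        numberOfCharactersWritten = escaped.Length;
        return true;
    }
}

public static class Program
{
    public static void Main()
    {
        Console.OutputEncoding = Encoding.UTF8; // so the emoji can be printed
        var options = new JsonSerializerOptions { Encoder = new MinimalJsonEncoder() };
        Console.WriteLine(JsonSerializer.Serialize("\uD83D\uDCF2", options));
    }
}
```

With this encoder set on JsonSerializerOptions.Encoder, the emoji from the original report should be written as the literal character rather than as an escaped surrogate pair.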