json: Buggy support for binary string data
I wanted to check whether I could use this library to store binary data without first encoding it to base64, so I wrote the following simple function:
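static void TestBinDump()
{
    // Six raw bytes: 'A', 0x13, 'B', 0x8B, 0x00, 'C'
    const char* data = "A\x13""B\x8B\0C";
    std::string s(data, data + 6);
    json js;
    js["key"] = s;
    // Serialize with an indent of 4, then print the input size and the dump
    auto dump = js.dump(4);
    std::cout << s.size() << " " << dump << std::endl;
    // Parse the dump back and read the string out again
    auto jd = json::parse(dump.begin(), dump.end());
    std::string v = jd.at("key");
    std::cout << v.size();
}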
This correctly outputs 6 { "key": "A\u0013Bï\u0000C" }. The first surprise is that the byte data has been encoded as 16-bit Unicode escapes, so I wanted to see whether “8-bit” Unicode characters would get translated back to 8-bit values. I didn’t get that far, because parsing threw an invalid_argument exception with the message "parse error - unexpected '\"'".
I’m using version 2.1.1.
About this issue
- State: closed
- Created 7 years ago
- Comments: 19 (12 by maintainers)
As you quote, a JSON string is a sequence of “Unicode code points” – but the technical term “code point” has NOTHING to do with bits and bytes. Bits and bytes happen once you start talking about encodings, like UTF-8 or UTF-16.
For example, if you have the “Unicode code point U+0058” (the letter X), then we still have no idea what that looks like in memory. It depends on what encoding you attach to your in-memory string. It could look like “58” (UTF-8), “0058” (UTF-16BE), or “5800” (UTF-16LE), etc.
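To make that concrete, here is a minimal sketch that turns one code point into bytes under each encoding. The helpers `utf8_bytes` and `utf16_bytes` are hypothetical names made up for this illustration (they are not part of the nlohmann library), and they only handle code points up to U+FFFF without checking for surrogates.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// UTF-8 bytes for a code point in the range U+0000..U+FFFF.
// Illustrative helper only; surrogate code points are not rejected.
Bytes utf8_bytes(std::uint16_t cp) {
    if (cp < 0x80) {
        return {static_cast<std::uint8_t>(cp)};
    }
    if (cp < 0x800) {
        return {static_cast<std::uint8_t>(0xC0 | (cp >> 6)),
                static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
    }
    return {static_cast<std::uint8_t>(0xE0 | (cp >> 12)),
            static_cast<std::uint8_t>(0x80 | ((cp >> 6) & 0x3F)),
            static_cast<std::uint8_t>(0x80 | (cp & 0x3F))};
}

// UTF-16 bytes for the same code point, in either byte order.
Bytes utf16_bytes(std::uint16_t cp, bool big_endian) {
    const auto hi = static_cast<std::uint8_t>(cp >> 8);
    const auto lo = static_cast<std::uint8_t>(cp & 0xFF);
    return big_endian ? Bytes{hi, lo} : Bytes{lo, hi};
}

// The letter X (U+0058) in three encodings.
void encoding_examples() {
    assert((utf8_bytes(0x0058) == Bytes{0x58}));                // UTF-8
    assert((utf16_bytes(0x0058, true) == Bytes{0x00, 0x58}));   // UTF-16BE
    assert((utf16_bytes(0x0058, false) == Bytes{0x58, 0x00}));  // UTF-16LE
}
```

The same code point yields three different byte sequences, which is why a "string of code points" says nothing about bytes until an encoding is fixed.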
JSON cannot encode arbitrary binary, just sequences of code points – which, again, has nothing to do with binary representations 😃
The nlohmann library has chosen that strings be encoded in UTF-8 (see the README). The author could have chosen any of the Unicode encoding formats (or supported them all!), but he had to pick one of them, and UTF-8 is a great one (see http://utf8everywhere.org/ for a great read on Unicode and encodings; it’s a fascinating topic).
Your input is therefore not valid UTF-8 (the byte 0x8B is a continuation byte with no lead byte in front of it), and it should have been rejected by the encoder.
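If you want to see why those six bytes fail, a rough validity check could look like the sketch below. `is_valid_utf8` is a hypothetical standalone function written for this thread, not something the library exposes, and it is only meant to show the rule, not to be a tuned implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Returns true if s is well-formed UTF-8. Rejects stray continuation
// bytes, truncated sequences, overlong forms, surrogates, and code
// points above U+10FFFF.
bool is_valid_utf8(const std::string& s) {
    static const std::uint32_t min_cp[] = {0, 0, 0x80, 0x800, 0x10000};
    std::size_t i = 0;
    while (i < s.size()) {
        const auto b = static_cast<unsigned char>(s[i]);
        std::size_t len = 0;
        std::uint32_t cp = 0;
        if (b < 0x80)                { len = 1; cp = b; }
        else if ((b & 0xE0) == 0xC0) { len = 2; cp = b & 0x1F; }
        else if ((b & 0xF0) == 0xE0) { len = 3; cp = b & 0x0F; }
        else if ((b & 0xF8) == 0xF0) { len = 4; cp = b & 0x07; }
        else return false;  // 10xxxxxx without a lead byte, or 0xF8..0xFF
        if (i + len > s.size()) return false;  // sequence cut short
        for (std::size_t k = 1; k < len; ++k) {
            const auto c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) return false;  // not a continuation byte
            cp = (cp << 6) | (c & 0x3F);
        }
        if (cp < min_cp[len]) return false;               // overlong encoding
        if (cp > 0x10FFFF) return false;                  // outside Unicode
        if (cp >= 0xD800 && cp <= 0xDFFF) return false;   // surrogate
        i += len;
    }
    return true;
}

void validation_examples() {
    // The six bytes from the issue: 0x8B appears with no lead byte.
    const std::string raw("A\x13" "B\x8B\0C", 6);
    // Same text with U+008B encoded properly as the pair C2 8B.
    const std::string fixed("A\x13" "B\xC2\x8B\0C", 7);
    assert(!is_valid_utf8(raw));
    assert(is_valid_utf8(fixed));
}
```

Note that the embedded 0x00 and 0x13 bytes are fine as far as UTF-8 goes; only the lone 0x8B makes the input ill-formed.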
And, in case you still don’t believe me, do a quick search for “JSON encoding binary” and you’ll see lots of people trying to figure out good ways to do this – which should be a hint that it’s not natively supported by JSON. (For example, this popular StackOverflow question: http://stackoverflow.com/questions/1443158/binary-data-in-json-string-something-better-than-base64)
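For completeness, here is a small sketch of the usual workaround: turn the bytes into base64 text before storing them, and decode after reading them back. `base64_encode` and `base64_decode` are hypothetical helpers written for this example (the library is not assumed to provide them), and the decoder is lenient: it stops at the first `=` and ignores leftover bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

namespace {
const std::string kAlphabet =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
}

// Encode arbitrary bytes as standard base64 with '=' padding.
std::string base64_encode(const std::string& in) {
    std::string out;
    out.reserve((in.size() + 2) / 3 * 4);
    for (std::size_t i = 0; i < in.size(); i += 3) {
        const std::size_t n = std::min<std::size_t>(3, in.size() - i);
        std::uint32_t v = 0;  // up to three input bytes packed into 24 bits
        for (std::size_t k = 0; k < 3; ++k) {
            v <<= 8;
            if (k < n) v |= static_cast<unsigned char>(in[i + k]);
        }
        out += kAlphabet[(v >> 18) & 63];
        out += kAlphabet[(v >> 12) & 63];
        out += (n > 1) ? kAlphabet[(v >> 6) & 63] : '=';
        out += (n > 2) ? kAlphabet[v & 63] : '=';
    }
    return out;
}

// Decode base64 text back to bytes; throws on characters outside the alphabet.
std::string base64_decode(const std::string& in) {
    std::string out;
    std::uint32_t buf = 0;
    int bits = 0;
    for (char ch : in) {
        if (ch == '=') break;  // padding marks the end of the data
        const std::size_t idx = kAlphabet.find(ch);
        if (idx == std::string::npos) throw std::invalid_argument("not base64");
        buf = (buf << 6) | static_cast<std::uint32_t>(idx);
        bits += 6;
        if (bits >= 8) {
            bits -= 8;
            out += static_cast<char>((buf >> bits) & 0xFF);
        }
    }
    return out;
}

void base64_examples() {
    const std::string raw("A\x13" "B\x8B\0C", 6);  // the bytes from the issue
    const std::string text = base64_encode(raw);
    assert(text == "QRNCiwBD");          // plain ASCII, safe in a JSON string
    assert(base64_decode(text) == raw);  // all six bytes survive the round trip
}
```

The encoded text is pure ASCII, so it is also valid UTF-8 and can be assigned to `js["key"]` the same way the original function assigned `s`.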