vswhere: vswhere.exe uses local code page to emit invalid JSON/XML
On the Japanese edition of Windows 10 Pro, running vswhere -products * -legacy -format json produces, among others, this line:
"description": "学生、オープン ソース、および個々の開発者のための無料で完全な機能を備えた IDE",
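The message above is correct, but it is encoded in code page 932, the default code page on Japanese Windows. (In English, the string reads "A free, fully featured IDE for students, open source, and individual developers.")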
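However, the current JSON standard, RFC 8259, section "8.1. Character Encoding", requires that JSON text exchanged between systems that are not part of a closed ecosystem MUST be encoded in UTF-8, and that implementations MUST NOT add a byte order mark. Please emit UTF-8 so that the output is valid JSON even when it contains non-ASCII strings like the one above. Otherwise, conforming JSON decoders reject the file as invalid. In particular, they may report invalid \ escapes, because in code page 932 the second byte of some characters (for example ソ, encoded as 0x83 0x5C) is the same byte as an ASCII backslash.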
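A minimal Python sketch of the failure described above, using Python's standard json module as a stand-in for a conforming decoder (other consumers may fail differently). When given bytes, json.loads only accepts UTF-8, UTF-16, or UTF-32. The one-field document is hypothetical and only reuses the quoted description. The sketch shows that the cp932 bytes contain stray backslash bytes and are rejected, while the UTF-8 bytes parse.

```python
import json

# Hypothetical one-field document modelled on the line quoted above.
description = "学生、オープン ソース、および個々の開発者のための無料で完全な機能を備えた IDE"
text = json.dumps({"description": description}, ensure_ascii=False)

cp932_bytes = text.encode("cp932")  # what a local-code-page writer emits on Japanese Windows
utf8_bytes = text.encode("utf-8")   # what RFC 8259 requires for interchange

# In cp932, katakana "ソ" is 0x83 0x5C; its trail byte is an ASCII backslash.
print("ソ".encode("cp932"))          # b'\x83\\'
print(b"\\" in cp932_bytes)          # True: backslash bytes appear inside the JSON string


def try_parse(data: bytes) -> str:
    """Parse raw bytes the way a strict consumer would; json.loads detects UTF-8/16/32 only."""
    try:
        return json.loads(data)["description"]
    except ValueError as exc:  # UnicodeDecodeError and JSONDecodeError both subclass ValueError
        return f"rejected: {type(exc).__name__}"


print(try_parse(utf8_bytes) == description)  # True
print(try_parse(cp932_bytes))                # rejected: UnicodeDecodeError
```

Here the strict UTF-8 decode already fails on the first cp932 lead byte. Decoders that read the bytes more leniently get further and then trip over the stray backslashes instead.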
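Much the same applies to vswhere -products * -legacy -format xml: vswhere.exe writes in the local code page (cp932 in my environment) without an encoding declaration at the beginning of the XML document. Without an encoding declaration or byte order mark, an XML parser must treat the document as UTF-8, so the cp932 bytes make it malformed. To keep things simple, just hard-code UTF-8.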
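A minimal Python sketch of the XML case, using the standard xml.etree.ElementTree parser (backed by expat) as an example consumer. The element names are hypothetical and are not vswhere's actual XML schema. With no BOM and no encoding declaration, the parser assumes UTF-8, so the same document parses when encoded in UTF-8 and is rejected when encoded in cp932.

```python
import xml.etree.ElementTree as ET

description = "学生、オープン ソース、および個々の開発者のための無料で完全な機能を備えた IDE"
# Hypothetical element layout; no "<?xml ... encoding=...?>" declaration, as in the report.
# (The description contains no '<' or '&', so no escaping is needed in this sketch.)
doc = f"<instances><instance><description>{description}</description></instance></instances>"


def parse_description(data: bytes) -> str:
    """Parse raw bytes; with neither a BOM nor an encoding declaration, UTF-8 is assumed."""
    try:
        return ET.fromstring(data).find("instance/description").text
    except ET.ParseError as exc:
        return f"rejected: {exc}"


print(parse_description(doc.encode("utf-8")) == description)        # True
print(parse_description(doc.encode("cp932")).startswith("rejected"))  # True
```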
On the other hand, the default text format (-format text, or no -format option at all) should probably keep using the local code page. Otherwise, cmd.exe displays unreadable strings (mojibake).
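The proposed split can be sketched as a small encoding policy. This is a minimal Python illustration, not vswhere's implementation. The function names output_encoding and emit are hypothetical, and locale.getpreferredencoding(False) stands in for "the local code page". On Japanese Windows this is typically cp932, but it returns UTF-8 when Python runs in UTF-8 mode.

```python
import locale
import sys


def output_encoding(fmt: str) -> str:
    """Hypothetical policy: machine-readable formats are always UTF-8,
    human-readable text follows the local code page."""
    if fmt in ("json", "xml"):
        return "utf-8"
    return locale.getpreferredencoding(False)


def emit(text: str, fmt: str) -> None:
    # Write encoded bytes directly so Python's own stdout encoding does not re-encode them.
    # errors="replace" keeps text mode from crashing on characters the code page lacks.
    data = text.encode(output_encoding(fmt), errors="replace")
    sys.stdout.buffer.write(data)
    sys.stdout.buffer.flush()
```

For example, emit('{"description": "ソース"}\n', "json") always writes UTF-8 bytes, while emit("ソース\n", "text") writes bytes in the console's expected code page.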
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (10 by maintainers)
Commits related to this issue
- Add -utf8 option to force UTF8 encoding Attempt to fix #146. The console host and shell's output encoding still play a major factor, however. In cmd.exe, you still need to set chcp to display strings... — committed to microsoft/vswhere by heaths 6 years ago
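A minimal Python sketch of a consumer that relies on the -utf8 option named in the commit above. It assumes vswhere.exe is on PATH; pass the full path otherwise. It decodes the captured bytes as UTF-8 and does not depend on the console code page or chcp. Whether vswhere writes a byte order mark with -utf8 is not stated here, so the sketch decodes with utf-8-sig, which strips a leading BOM if present and otherwise behaves like plain UTF-8. The helper names are hypothetical.

```python
import json
import subprocess


def parse_vswhere_json(stdout: bytes) -> list:
    """Decode vswhere's JSON output as UTF-8, tolerating a leading BOM if one is present."""
    return json.loads(stdout.decode("utf-8-sig"))


def query_instances(vswhere: str = "vswhere.exe") -> list:
    # Arguments are passed as a list, so "*" reaches vswhere literally (no shell globbing).
    result = subprocess.run(
        [vswhere, "-products", "*", "-legacy", "-format", "json", "-utf8"],
        capture_output=True,  # capture raw bytes; no text-mode decoding by Python
        check=True,
    )
    return parse_vswhere_json(result.stdout)
```

Because the bytes are captured through a pipe rather than shown in a console window, the console host's code page plays no part in how they are decoded.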
Option #3 is the safest bet, but I also think that always writing UTF-8 in JSON mode is a safe bet. I strongly doubt that many of the tools consuming the JSON data were (or are) able to handle non-UTF-8 JSON.