wasm4: `textUtf8` doesn't parse as UTF-8
Comparing
https://github.com/aduros/wasm4/blob/493eaec96b8c5e75d8031413e1a312690bff5733/runtimes/web/src/runtime.ts#L197-L200
and
https://github.com/aduros/wasm4/blob/493eaec96b8c5e75d8031413e1a312690bff5733/runtimes/web/src/runtime.ts#L278-L282
It seems the `TextDecoder` was forgotten in the implementation of `textUtf8`.
This is also true for the native runtime, and for the UTF-16 methods.
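To make the difference concrete, here is a minimal sketch in Rust (not the runtime's own TypeScript) that contrasts the two readings of the same bytes. The function names are hypothetical, and the byte-per-character reading is an assumption about how `textUtf8` behaves, based on the observation above.

```rust
/// Reads each byte as its own character code, which is how `textUtf8`
/// appears to behave (assumption based on the issue description).
fn chars_bytewise(bytes: &[u8]) -> Vec<u32> {
    bytes.iter().map(|&b| u32::from(b)).collect()
}

/// Decodes the bytes as UTF-8 first; returns None for invalid input.
fn chars_decoded(bytes: &[u8]) -> Option<Vec<u32>> {
    let s = std::str::from_utf8(bytes).ok()?;
    Some(s.chars().map(|c| c as u32).collect())
}

fn main() {
    // The UTF-8 encoding of U+00A3 (pound sign) is the two bytes C2 A3.
    let bytes = "£".as_bytes();

    // Byte-wise reading sees two separate character codes.
    println!("{:X?}", chars_bytewise(bytes));

    // Decoding first sees a single code point.
    println!("{:X?}", chars_decoded(bytes));
}
```

With the byte-wise reading, any non-ASCII character in a UTF-8 string turns into two or more unrelated glyphs, which is the mojibake described in this issue.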
EDIT:
There’s been a lot of discussion, and SLiV has done a lot of good work on this. With the Rust template fixed, the issue isn’t as big in practice anymore.
The best solution right now would probably be to clearly document, everywhere `textUtf8` and `textUtf16` are used, that they are misnomers, and live with that fact. Alternative `text8`/`text8len`/`text8slice` functions (same for 16) could be added, but we would still need to keep the old versions for compatibility anyway, so that would increase the API surface without providing new functionality.
About this issue
- State: closed
- Created 2 years ago
- Comments: 18 (18 by maintainers)
Commits related to this issue
- docs: Clarify textUtf* misnomers (solves #509) — committed to JerwuQu/wasm4 by JerwuQu 2 months ago
Given that #591 solves the main functionality problem (allowing Rust developers to draw `"\x80"` without UB), and does not suffer the loss of ergonomics that I was afraid of (because `text("Hello world!", 20, 20)` still compiles), I no longer feel as strongly about parsing UTF-8. I still think it is more correct to re-encode from a programming language's default string encoding to ASCII rather than generate mojibake, but the impact on developers is now much less.

For completeness, here is the output of my test cart for Rust after applying #591: It is the same as before (v2.5.3), except `text(b"Press \x80 to blink")` now works as a safe, well-behaved alternative to the `str::from_utf8_unchecked` hack. This does mean that `text("£", 20, 20)` does not work (that is, results in mojibake) despite £ being in the character set supported by WASM-4, which feels a bit unintuitive but not inexplicable.

The misnomers `textUtf8` and `textUtf16` do still annoy me, and in the long term I think renaming them to `text8` and `text16` (or what have you) would help a lot in clarifying that their behavior is very different from `traceUtf8` and `traceUtf16`. Depending on where the consensus lands on this issue, I could isolate the (backwards compatible) rename I did in #528 from the functional changes.

Yes! `String.UTF8.decode` + `String.UTF8.encode` adds ~1 KB, which I think is quite significant.
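The trade-off discussed in this thread can be sketched in Rust. The snippet below is only an illustration, not part of the WASM-4 API: `reencode_single_byte` is a hypothetical helper, and it assumes that the font's character codes coincide with Unicode code points up to U+00FF, which this thread does not confirm beyond the £ example.

```rust
use std::convert::TryFrom;

/// Hypothetical helper: re-encodes a UTF-8 string into one byte per
/// character, assuming character codes match code points up to U+00FF.
/// Characters outside that range are replaced with b'?'.
fn reencode_single_byte(s: &str) -> Vec<u8> {
    s.chars()
        .map(|c| u8::try_from(u32::from(c)).unwrap_or(b'?'))
        .collect()
}

fn main() {
    // A byte string literal may contain 0x80. On its own that byte is not
    // valid UTF-8, so the safe conversion to &str rejects it.
    let raw: &[u8] = b"Press \x80 to blink";
    assert!(std::str::from_utf8(raw).is_err());

    // Passing "£" through untouched hands over two bytes (mojibake if each
    // byte is drawn as a glyph), while re-encoding yields the byte 0xA3.
    assert_eq!("£".as_bytes(), b"\xC2\xA3");
    assert_eq!(reencode_single_byte("£"), vec![0xA3_u8]);

    // Code points above U+00FF have no single-byte form in this sketch.
    assert_eq!(reencode_single_byte("€"), vec![b'?']);
}
```

A re-encoding step like this is what costs extra code size in each cart, which is the concern raised in the last comment, whereas accepting byte strings directly leaves the choice of bytes to the developer.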