wasm4: `textUtf8` doesn't parse as UTF-8

Comparing https://github.com/aduros/wasm4/blob/493eaec96b8c5e75d8031413e1a312690bff5733/runtimes/web/src/runtime.ts#L197-L200 and https://github.com/aduros/wasm4/blob/493eaec96b8c5e75d8031413e1a312690bff5733/runtimes/web/src/runtime.ts#L278-L282, it seems the TextDecoder was forgotten in the implementation of textUtf8.

This is also true for the native runtime, and for the UTF-16 methods.
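To illustrate the difference being reported, here is a minimal TypeScript sketch. It is not the code from runtime.ts; the helper names `readBytewise` and `readUtf8` are hypothetical and only contrast "one byte is one character" with actual UTF-8 decoding through the standard `TextDecoder`.

```typescript
// Hypothetical helpers for illustration; not the actual runtime.ts code.

// Byte-wise reading: every byte becomes one character code, no decoding.
function readBytewise(bytes: Uint8Array): number[] {
    return Array.from(bytes);
}

// UTF-8 decoding: a multi-byte sequence collapses into one code point.
function readUtf8(bytes: Uint8Array): number[] {
    const str = new TextDecoder("utf-8").decode(bytes);
    return Array.from(str, (ch) => ch.codePointAt(0)!);
}

// "£" (U+00A3) is encoded in UTF-8 as the two bytes 0xC2 0xA3.
const poundBytes = new Uint8Array([0xc2, 0xa3]);

const bytewiseCodes = readBytewise(poundBytes); // two codes: 0xC2, 0xA3
const utf8Codes = readUtf8(poundBytes);         // one code point: 0xA3
```

A function that reads memory byte by byte would therefore draw two glyphs for "£", while one that decodes UTF-8 first would draw a single glyph.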

EDIT:

There’s been a lot of discussion, and SLiV has done a lot of good work on this. With the Rust template fixed, the issue isn’t as big in practice anymore.

The best solution right now would probably be to document clearly, everywhere textUtf8 and textUtf16 are used, that the names are misnomers, and to live with that fact. Alternatives such as text8/text8len/text8slice (and the same for 16) could be added, but the old versions would still have to be kept for compatibility, so this would increase the API surface without adding functionality.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 18 (18 by maintainers)

Most upvoted comments

Given that #591 solves the main functionality problem (allowing Rust developers to draw "\x80" without UB), and does not suffer the loss of ergonomics that I was afraid of (because text("Hello world!", 20, 20) still compiles), I no longer feel as strongly about parsing UTF-8. I still think it is more correct to reencode from a programming language's default string encoding to ASCII than to generate mojibake, but the impact on developers is now much smaller.

For completeness, here is the output of my test cart for Rust after applying #591:

[screenshot: wasm4-screenshot_2022-11-08_ascii_rs]

It is the same as before (v2.5.3), except that text(b"Press \x80 to blink") now works as a safe, well-behaved alternative to the str::from_utf8_unchecked hack. This does mean that text("£", 20, 20) does not work (that is, it results in mojibake) despite £ being in the character set supported by WASM-4, which feels a bit unintuitive but not inexplicable.

The misnomers textUtf8 and textUtf16 do still annoy me, and in the long term I think renaming them to text8 and text16 (or what have you) would help a lot in clarifying that their behavior is very different from traceUtf8 and traceUtf16. Depending on where the consensus lands on this issue, I could isolate the (backwards compatible) rename I did in #528 from the functional changes.
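The reencoding idea mentioned in this comment, and the "£" mojibake it describes, can be sketched as follows. This is an illustration only: `encodeSingleByte` and `FALLBACK` are hypothetical names, and the sketch assumes that a glyph's index in the font equals its Unicode code point for every character below 256, which may not hold for every glyph in the WASM-4 font.

```typescript
// Sketch only. ASSUMPTION: glyph index == Unicode code point for
// every character below 256. The names here are hypothetical.
const FALLBACK = 0x3f; // "?" for characters outside the assumed range

// Reencode a string to one byte per character instead of UTF-8.
function encodeSingleByte(text: string): Uint8Array {
    const codes = Array.from(text, (ch) => {
        const cp = ch.codePointAt(0)!;
        return cp < 0x100 ? cp : FALLBACK;
    });
    return Uint8Array.from(codes);
}

// Passing raw UTF-8 through unchanged: "£" occupies two bytes,
// which a byte-wise text function would draw as two glyphs.
const rawUtf8 = new TextEncoder().encode("£"); // bytes 0xC2, 0xA3

// Reencoding first: "£" becomes the single byte 0xA3, and a
// character outside the range ("€") becomes the fallback.
const reencoded = encodeSingleByte("£5 €"); // 0xA3, 0x35, 0x20, 0x3F
```

Doing this conversion inside the cart is where the size overhead discussed in this thread would come from; doing it in the runtime would instead change the behavior of existing carts.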

I have no idea if the UTF-8 encoding would add considerable overhead to the cart size.

Yes! String.UTF8.decode + String.UTF8.encode adds about 1 KB, which I think is quite significant.