terminal: RTL text in conhost is no longer rendered correctly
Windows Terminal version
Commit eb7559733d3cc9062c7de610f3b95d9143099ca1
Windows build number
10.0.19041.1415
Other Software
No response
Steps to reproduce
- Build a recent version of OpenConsole.
- Open a conhost bash shell.
- Execute the following command:
printf "\u05ea\u05d7\u05d0\n"
Expected Behavior
RTL characters should be displayed in the exact order they were output, and not reversed. This is what it looks like in my inbox conhost (10.0.19041.1415):
This also matches the behaviour of XTerm.
Actual Behavior
In the current version of OpenConsole (I think since PR #10478), RTL characters are reversed, like this:
I realise that some people might consider this a good thing, since it gives the superficial appearance that it’s rendering RTL languages correctly, but it is not compatible with the original conhost and breaks genuine RTL-aware applications (which rely on characters being displayed exactly where they’ve been positioned).
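To make the compatibility argument concrete, here is a minimal Python sketch (not conhost code) that models the two behaviours as column assignments. It assumes a grossly simplified "reordering" terminal that reverses each maximal run of strong RTL characters (bidi class `R` or `AL`, via `unicodedata.bidirectional`). That is not the full Unicode Bidirectional Algorithm, and the helper names are hypothetical. An application that has already placed its characters in visual order, as `vim -H` does, gets its layout back unchanged from a logical-order terminal, but a reordering terminal flips it a second time.

```python
import unicodedata


def is_strong_rtl(ch: str) -> bool:
    # Hebrew letters have bidi class "R"; Arabic letters have "AL".
    return unicodedata.bidirectional(ch) in ("R", "AL")


def render_logical(text: str) -> list[str]:
    # Hypothetical model of inbox conhost / XTerm: the i-th character
    # written lands in the i-th cell, regardless of its direction.
    return list(text)


def render_reordered(text: str) -> list[str]:
    # Hypothetical model of a reordering terminal: every maximal run of
    # strong RTL characters is reversed before being laid out.
    # (Simplification only; this is NOT UAX #9.)
    cells: list[str] = []
    run: list[str] = []
    for ch in text:
        if is_strong_rtl(ch):
            run.append(ch)
        else:
            cells.extend(reversed(run))
            run.clear()
            cells.append(ch)
    cells.extend(reversed(run))
    return cells


def describe(cells: list[str]) -> str:
    # Print code points rather than glyphs so the output is unambiguous
    # (and safe on consoles without a UTF-8 encoding).
    return " ".join(f"U+{ord(c):04X}" for c in cells)


if __name__ == "__main__":
    # The string from the repro steps: printf "\u05ea\u05d7\u05d0\n"
    output = "\u05ea\u05d7\u05d0"
    print("logical:  ", describe(render_logical(output)))
    print("reordered:", describe(render_reordered(output)))
```

In this model the logical-order terminal keeps the cells as U+05EA U+05D7 U+05D0, exactly as written, while the reordering terminal yields U+05D0 U+05D7 U+05EA, so an application that pre-arranged the text ends up with it reversed.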
About this issue
- State: closed
- Created 2 years ago
- Comments: 24 (15 by maintainers)
Commits related to this issue
- Force LTR / logical order for text in GdiEngine (#12722) Some applications like `vim -H` implement their own BiDi reordering. Previously we used `PolyTextOutW` which supported such arrangements, bu... — committed to microsoft/terminal by lhecker 2 years ago
@j4james Damn that was almost too easy - took like 5 minutes: https://github.com/microsoft/terminal/compare/dev/lhecker/12294-bidi-override
I think typing this message took longer than writing that code. 😄
`ScriptStringAnalyse` is a handy function that calls `ScriptItemize`, `ScriptShape`, `ScriptPlace`, and `ScriptBreak` for you. Due to the lack of batching, though, this approach is a lot slower than `ExtTextOut`. My plan is to call those 4 functions myself (well, 3, because we don't really need `ScriptBreak`) and also call `ScriptIsComplex`. If it's false (i.e. the text isn't complex), I can just call `ExtTextOut` directly to ensure the expected performance in the general case. If I can't make that work for whatever reason, I think this is what we could ship, since it works.
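The plan above (take a fast path when nothing in the line needs shaping, and only fall back to full analysis otherwise) can be sketched in Python as follows. This is only an illustration under stated assumptions: `unicodedata` is used as a crude stand-in for `ScriptIsComplex`, treating a run as complex if it contains a strong RTL character or a combining mark, and `draw_fast`/`draw_shaped` are hypothetical callbacks standing in for `ExtTextOut` and the `ScriptItemize`/`ScriptShape`/`ScriptPlace` path. The real `ScriptIsComplex` recognizes far more scripts than this.

```python
import unicodedata
from typing import Callable


def looks_complex(text: str) -> bool:
    # Crude stand-in for ScriptIsComplex (assumption): flag strong RTL
    # characters and combining marks; treat everything else as simple.
    for ch in text:
        if unicodedata.bidirectional(ch) in ("R", "AL"):
            return True
        if unicodedata.combining(ch):
            return True
    return False


def draw_line(
    text: str,
    draw_fast: Callable[[str], None],
    draw_shaped: Callable[[str], None],
) -> str:
    # Hypothetical dispatcher: the common case goes straight to the fast
    # path; only lines that look complex pay for shaping. Returns the
    # path taken so callers can observe the decision.
    if looks_complex(text):
        draw_shaped(text)
        return "shaped"
    draw_fast(text)
    return "fast"


if __name__ == "__main__":
    calls: list[str] = []

    def fast(s: str) -> None:
        calls.append("fast")

    def shaped(s: str) -> None:
        calls.append("shaped")

    draw_line("ls -la", fast, shaped)
    draw_line("\u05ea\u05d7\u05d0", fast, shaped)
    print(calls)  # ['fast', 'shaped']
```

The design point is that the expensive analysis is gated behind a cheap per-line check, so plain ASCII output, the overwhelmingly common case, never leaves the fast path.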
As far as I can discern, no one ever actually concerned themselves with Arabic or Hebrew support in the console host. The targeted languages were basically LTR European-type character sets plus the CJK trio. Beyond that… it looks like anything else that worked or didn't was a happy accident.
Furthermore, when our localization team tells us which languages we can pay for in terms of translations for developer utilities today… they limit it to: German, English, Spanish, French, Italian, Japanese, Korean, Brazilian Portuguese, Russian, Simplified Chinese, and Traditional Chinese. I'd therefore have to believe, to some degree, that research was performed to determine that focusing on those languages was the appropriate balance between resources and the developer market.
Therefore, my priority here is keeping users of those languages happy, with anything else being secondary.
Further, one of the most popular issues filed against `conhost.exe` is the lack of font fallback for Chinese, Japanese, and Korean. Switching to `ExtTextOut` (Option 1) to restore font fallback therefore dramatically reduced our inbound bug flow and solved an issue for four of the targeted languages. By contrast, in the last 7-8 years of working on this I've never seen an issue about Hebrew or Arabic, whether filed in Feedback Hub, sent directly from our OEM customers or business partners, or raised anywhere else. I know that's super scientific… relying on my past experience. But, combining those reasons, I would personally have to opt for Option 1.
I would offer to @lhecker, if he's interested, that next week is our organization's "Fix Hack Learn" week again. If he wants to spend a few days hacking on Option 3 using Uniscribe to solve this problem and learn more about language processing… he would be free to do so. I think it would be better long-term, though, to focus efforts on fully supporting those languages in the Terminal and the Atlas renderer.
The discussion can continue, I’m not shutting it down. This is just my opinion on the situation.
That’s kinda my point - do you already have dozens of reports like “things are broken, the world is falling apart, do something now”? Support for anything non-ASCII in Windows Console has always been like “it depends” - on the OS version, current font, system locale, console codepage, output method, the phase of the Moon etc. Personally I haven’t seen any applications even trying to cover non-trivial cases, but YMMV of course.
Yeah, I saw the ligatures were working. I meant the other things you were referring to when you said "ZWJ, etc. and all the other fun Unicode stuff".
And while it looks nice at first glance, it's not particularly useful as is, as far as I'm concerned. You've got no hope of editing the text; all it's really good for is displaying a single line of content at best.
Yeah, ideally we’d have a solution that was realistically usable and also looked pretty, but I don’t know how feasible that is for languages with ligatures. I thought with something like Arabic, an application might be able to output the appropriate form of each character manually, which might make up for the loss of ligature support, but I don’t know enough about the subject to know if that’s nonsense.
I wouldn’t have said “most”, but I haven’t checked recently. And for those that do draw the glyphs in RTL order, there’s not a standard of any sort that they’re following - they all do things differently. Thankfully some of them at least have a way to turn that functionality off.
You realise that just means we're saying we don't support TUI applications, full stop (at least for BiDi languages). If that's the route we want to take, that's fine; I seem to be in the minority in wanting support for RTL TUI apps. I'd just like to know for definite where we're going with this, so I can make my own plans accordingly.
Well, there's the command-line utility from fribidi, which can be used as a kind of BiDi-aware version of `cat` (amongst other things). And there's also a Hebrew mode in vim (i.e. `vim -H`). The other applications I have are unfortunately not open source.
Yeah, you're right. I've just tested and that's not working for me either. Oh well.
Yeah, that’s weird. It’s definitely not reordering RTL characters for me in the legacy console.
That would be because it's almost impossible to write an RTL application on terminals that don't work this way. Give it a try. See if you can write some basic RTL applications on one of those terminals that reorder RTL characters, like a simple RTL form entry system, or something that pops up a dialog or drop-down menu over existing RTL text. Maybe I'm just an idiot, but I can't see how you could make that work, whereas it's fairly straightforward on terminals that leave RTL characters exactly where you put them.
Yes, this as is:
תחא
Moreover:
Pulled triage. Sorry @j4james, I've been snowed in on e-mail as I had to leave to take care of some family stuff. Thanks for the first pass, Mike.