terminal: Rendering performance of chafa is very slow

sudo apt install chafa
curl https://media.giphy.com/media/12UwsVgQCYL3H2/giphy.gif --output winanim.gif
chafa winanim.gif --font-ratio 1/3

Edit: On Ubuntu 18.04, follow directions here to get chafa sources and build 'em: https://hpjansson.org/chafa/

Edit2: After looping through ./configure and sudo apt install as you discover what's missing, you’ll get all the way through the build, and then chafa will whine about not being able to find the lib. Run sudo ldconfig and it will shut up.

Edit3:

cd ~
mkdir chafa
cd chafa
curl https://hpjansson.org/chafa/releases/chafa-1.4.0.tar.xz --output chafa.tar.xz
tar xf chafa.tar.xz chafa-1.4.0/
cd chafa-1.4.0/
sudo apt install gcc pkg-config libgtk2.0-dev libmagickwand-dev
./configure
make
sudo make install
sudo ldconfig

About this issue

State: closed
Created 5 years ago
Reactions: 4
Comments: 23 (15 by maintainers)

Commits related to this issue

win, tty: improve SIGWINCH performance Continuing improvement of SIGWINCH from PR #2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of ... — committed to JaneaSystems/libuv by bzoz 5 years ago
win, tty: improve SIGWINCH performance Continuing improvement of SIGWINCH from PR #2308. Running SetWinEventHook without filtering for the specific PIDs has significant impact on the performance of ... — committed to libuv/libuv by bzoz 5 years ago
Introduce til::rle - a run length encoded vector * #8000 - Supports buffer rewrite work. A re-use of `til::rle` will be useful as a column counter as we pursue NxM storage and presentation. * #3075... — committed to microsoft/terminal by lhecker 3 years ago
Introduce til::rle - a run length encoded vector ## Summary of the Pull Request Introduces `til::rle`, a vector-like container which stores elements of type T in a run length encoded format. This al... — committed to microsoft/terminal by lhecker 3 years ago
Squashed commit of the following: commit 4b0eeef949b62d5c431eaf5d1374302b1362b6fe Author: Leonard Hecker <lhecker@microsoft.com> Date: Fri May 14 23:56:08 2021 +0200 Introduce til::rle - a run... — committed to microsoft/terminal by DHowett 3 years ago
Introduce til::rle - a run length encoded vector (#10099) ## Summary of the Pull Request Introduces `til::rle`, a vector-like container which stores elements of type T in a run length encoded for... — committed to microsoft/terminal by lhecker 3 years ago

Most upvoted comments

It turns out that the major time spent on WinEvent is only incurred if Node.js is running on your system.

If Node.js is running, it registers for the WinEvent notifications for EVENT_CONSOLE_LAYOUT to know when the window size has changed. Given WinEvents require kernel work to broadcast and tend to be registered globally, this causes a system-wide slowdown of all of your consoles when it is listening here.

https://github.com/nodejs/node/blob/0109e121d3a2f87c4bad75ac05436b56c9fd3407/deps/uv/src/win/tty.c line 2294

If you kill all node.js runtimes (including the one that Visual Studio 2017 launches), that performance drag goes away.

I need to:

Coalesce MSAA events so they don’t happen so often.
Follow up with the WinEvent team to see if they can confirm whether Node.js is listening only for EVENT_CONSOLE_LAYOUT and not for all the messages (the expensive ones aren’t the layout messages, but because the MSAA/WinEvent infrastructure is very old, registering for any one registers you for all of them).
Probably formalize this into some sort of issue/bug on the Node team to stop doing this as they’re shooting the entire system’s performance in the foot to get a resize notification.
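
The first item, coalescing events, could look roughly like the sketch below. It is only an illustration of the idea: `EventCoalescer`, `ConsoleEvent`, and the `sink` callback are hypothetical names standing in for whatever conhost actually uses, and the sink is assumed to wrap the expensive notification call.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <set>
#include <utility>

// Hypothetical event kinds; the real console uses EVENT_CONSOLE_* constants.
enum class ConsoleEvent : std::uint8_t { Layout, UpdateRegion, Caret };

// Records events as they happen and forwards each distinct kind at most
// once per flush, so the expensive sink runs per frame, not per write.
class EventCoalescer {
public:
    explicit EventCoalescer(std::function<void(ConsoleEvent)> sink)
        : sink_(std::move(sink)) {
        assert(sink_ != nullptr && "sink must be callable");
    }

    // Cheap bookkeeping only: no kernel transition happens here.
    void notify(ConsoleEvent e) { pending_.insert(e); }

    // Intended to be called once per frame (assumption: from a timer or the
    // render loop). Returns how many events were actually forwarded.
    std::size_t flush() {
        std::size_t sent = 0;
        for (ConsoleEvent e : pending_) {
            sink_(e);
            ++sent;
        }
        pending_.clear();
        return sent;
    }

private:
    std::function<void(ConsoleEvent)> sink_;
    std::set<ConsoleEvent> pending_;
};
```

With this shape, a thousand `notify` calls for the same event kind between two flushes cost one call into the sink. The trade-off is latency: listeners see the event up to one frame late.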

miniksa on Apr 19, 2019

WPR analysis:

11520ms in conhost.exe
3455ms in chafa

For conhost.exe…

5990ms on the I/O thread (the likely culprit, since a slow I/O thread stops the other side from giving us data)
5183ms on the Render thread (which might still be involved, since it can lock the buffer while looking things up, preventing I/O from writing)

On the I/O thread, hot areas include:

2007ms spent notifying accessibility eventing

Most of this (1405ms) spent inside user32.dll!NotifyWinEvent
This call causes a syscall/kernel transition which is SLOW
The best thing to do here is probably detect that no one needs the event and not transmit it OR
Transmit it less often by coalescing the accessibility events into frames much like the renderer

1196ms spent adjusting the cursor position

Most of this (823ms) goes to figuring out whether the cursor is sitting on top of a 2-column-wide character (so it can move 2 spaces right instead of 1). It looks like we’re doing this the wrong way and wasting time here: _lookupIsWide is called below for another purpose, retrieves the same information, and also takes a lot of time.
236ms also spent in kernelbase.dll!SetEvent to trigger the render thread (probably can’t avoid, need kernel object to notify a potentially sleeping thread…)

877ms spent run-length-encoding colors

634ms spent in vector reallocation for holding the run-length-encoded colors. This could maybe be re-strategized to leave a bit of excess memory usage around in exchange for not reallocing so hard.

608ms spent looking up the narrow/wideness of characters (_lookupIsWide) during insertion

This actually ends up calling gdi32full.dll!GetCharABCWidthsW against the current font to figure out how wide the character is going to be when we don’t otherwise know, because it’s an ambiguous-width character.
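
Since the answer for a given character only changes when the font does, the width lookup above is a natural candidate for memoization. A minimal sketch, assuming a hypothetical `measure` callback that stands in for the expensive font query; `WidthCache` and `onFontChanged` are illustrative names, not conhost APIs.

```cpp
#include <cassert>
#include <functional>
#include <unordered_map>
#include <utility>

// Remembers whether a code point renders wide in the current font, so the
// expensive measurement runs once per character instead of once per insert.
class WidthCache {
public:
    // `measure` is an assumption: any callable answering "is this wide?".
    explicit WidthCache(std::function<bool(char32_t)> measure)
        : measure_(std::move(measure)) {
        assert(measure_ != nullptr && "measure must be callable");
    }

    bool isWide(char32_t ch) {
        auto it = cache_.find(ch);
        if (it != cache_.end()) {
            return it->second;          // cache hit: no font query
        }
        const bool wide = measure_(ch); // cache miss: ask the font once
        cache_.emplace(ch, wide);
        return wide;
    }

    // Cached answers depend on the font, so drop them all when it changes.
    void onFontChanged() { cache_.clear(); }

private:
    std::function<bool(char32_t)> measure_;
    std::unordered_map<char32_t, bool> cache_;
};
```

The cache is unbounded here for simplicity; a real implementation would likely want a size cap or a restriction to the ambiguous-width range.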

On the render thread, hot areas include:

4386ms in gdi32full.dll!PolyTextOutW

I don’t think there’s anything to be done here. I think this is just the consequence of trying to emit a ton of text really fast

I haven’t done a wait chain analysis yet to see whether the locking/threading is slowing things down, because at this point we have a few areas with obvious routes to improvement that might alleviate the whole deal.

Therefore, my conclusion is:

Cursor movement shouldn’t be looking up the column count by the character, it should use the already known cell width value
The accessibility events need to be coalesced
We should be caching the queries to GDI for ambiguous character widths until the font changes
Investigate reducing vector reallocs in color run length encoding manipulation

And I have now filed MSFT:21167256 to do these things at some point and hopefully we’ll have fixed the performance issue.
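
For the run-length-encoding item in the conclusion, one way to trade a little excess memory for fewer reallocations is to reserve space up front and reuse the same storage across rows. This is a rough sketch under that assumption, not the approach conhost or `til::rle` actually takes; `AttrRow` and `Run` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// One run of identical color attributes.
struct Run {
    std::uint16_t attr;
    std::uint32_t length;
};

// Run-length-encoded attribute row that keeps its allocation between uses.
class AttrRow {
public:
    // Reserve room for the expected number of runs once, up front.
    explicit AttrRow(std::size_t expectedRuns) { runs_.reserve(expectedRuns); }

    // Extend the last run when the attribute matches, otherwise start a new one.
    void append(std::uint16_t attr, std::uint32_t count = 1) {
        assert(count > 0);
        if (!runs_.empty() && runs_.back().attr == attr) {
            runs_.back().length += count;
        } else {
            runs_.push_back(Run{attr, count});
        }
    }

    // clear() drops the runs but leaves the capacity in place, so the next
    // row written into this object does not need to allocate again.
    void reset() { runs_.clear(); }

    std::size_t runCount() const { return runs_.size(); }
    std::size_t capacity() const { return runs_.capacity(); }

private:
    std::vector<Run> runs_;
};
```

The excess capacity is the memory cost mentioned above: each row holds on to its high-water mark instead of shrinking back.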

miniksa on Dec 13, 2021

Also @oising let’s leave this here: https://github.com/hzeller/timg/

miniksa on Mar 29, 2021

The first quick fix for this (GDI measurement caching) just went out with insider build 18932!

DHowett-MSFT on Jul 3, 2019