bat: `man` syntax doesn't highlight bold functions correctly

Terminals tested: alacritty, mate-terminal, urxvt

bat --version: 0.12.0 (installed via cargo install bat)

$MANPAGER: bat --paging=never -pl man [1] [2]

[1]: I disabled paging to make sure it’s not a problem with less(1).
[2]: The documentation suggests setting MANPAGER to sh -c "col -b | bat -pl man"; however, I found that using col actually just garbled the output even more (see the screenshot further down).

Output with MANPAGER='bat -pl man': [screenshot]

Output with MANPAGER='' or MANPAGER='less': [screenshot]

The issue seems to be with highlighting functions / page references (foo(…)) when bold output is used.

When using col -b as suggested, it becomes even worse:

Output with MANPAGER='sh -c "col -b | bat -pl man"': [screenshot]

About this issue

  • State: open
  • Created 5 years ago
  • Reactions: 11
  • Comments: 33 (24 by maintainers)


Most upvoted comments

Well.

MANROFFOPT="-c" MANPAGER="sh -c 'col -bx | bat -plman'" man sprintf

Finally worked. No bold or underlined text, but it finally displays correctly 😄

While this presents a working solution for now, I’d suggest either keeping this issue open or opening a new one, as this is rather hacky (although it was a fun learning experience about the joys of old Unix tech!).


For me, working on Fedora 35, export MANROFFOPT="-c" helped. Thank you @xeruf @LunarLambda

Okay, phew! I dug in a little more and got a usable sed command, but unfortunately there still seems to be an issue with --language Manpage, even when using ANSI codes instead of overstrike.

Here’s the command I’m using:

sed=gsed # GNU sed; needed on macOS, it seems
# sed=sed # linux

export MANPAGER="$sed -E 's/(.)\x08\1/\x1b[1m\1\x1b[22m/g' |
	$sed -E 's/_\x08(.)/\x1b[4m\1\x1b[24m/g' |
	bat -p"
man sprintf

This displays non-colored but correctly decorated pages, as you might expect! less, cat etc. should also work here.

[Screenshot 2023-01-13 at 09.30.55]

However, when using bat --language Manpage, the syntax-highlighting colors seem to get garbled together with the bold/underline codes, similar to the OP’s report:

export MANPAGER="$sed -E 's/(.)\x08\1/\x1b[1m\1\x1b[22m/g' | 
	$sed -E 's/_\x08(.)/\x1b[4m\1\x1b[24m/g' |
	bat -plman"
man sprintf

[Screenshot 2023-01-13 at 09.29.54]

Is it expected that bat would correctly handle the syntax highlighting intermingled with the source data having control characters? If so, I’d propose that as the actionable item here, and have it be the user’s responsibility to ensure the input manpage data is “normalized” (i.e. using all ANSI or all overstrike decorations). Thoughts?

Update from my side: I’m now using nvim/emacs as my man viewer, since these can follow links as well 😉

Since piping man into bat works for me (unlike using bat as MANPAGER), fish users may try the following:

function m --wraps man
  man $argv | bat -pl man
end

In case it helps someone
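For bash or zsh, a roughly equivalent function might look like the sketch below. It simply keeps the name m from the fish example and assumes bat is on PATH; whether piping man’s output behaves the same as it does for the fish user above depends on your man implementation.

```sh
# Sketch of a bash/zsh counterpart to the fish function above.
# "m" is just the name used in that example; rename it as you like.
m() {
  # Pipe man's output into bat, using the man syntax without decorations
  man "$@" | bat -pl man
}
```

Add it to your ~/.bashrc or ~/.zshrc and call it like man, e.g. m sprintf.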

Program versions

  • Arch Linux
  • man 2.9.4
  • col from util-linux 2.37
  • bat 0.18.1

Comparison

MANPAGER='less' man printf


MANPAGER='bat -pl man' man printf


MANPAGER="sh -c 'col -bx | bat -pl man'" man printf


Neither MANROFFOPT nor adding/removing -b for col seems to change anything for me.

Conclusion

Adding colors is nice, but since bat currently does not display the essential highlighting, I am considering switching back to less or finding an interactive man viewer where I can follow links.

You should mention in the README that bold highlighting is unsupported. I was quite confused, and this issue doesn’t really go into that.

I have col from util-linux 2.33.2.

Unfortunately MANPAGER='sh -c "col -bx | bat -plman"' man sprintf yields the following


Thank you for the detailed bug report!

I’m going to assume that you are using man sprintf in your examples(?).

To figure out what’s going on in detail, we can actually use bat -A to show what exactly man outputs:

MANPAGER="bat -A" man sprintf

After finding the corresponding section, we can take a look at how man prints bold text. It is both fascinating and infuriating. Instead of using ANSI escape sequences, it prints

p␈pr␈ri␈in␈nt␈tf␈f

for a bold printf (bat -A shows ␈ instead of the \b backspace character). I believe this is how “bold” was done in the times of typewriters: you would hit backspace and then just re-type the same character to give it more weight.

On today’s terminal emulators, that doesn’t actually work. If you use MANPAGER="" or MANPAGER="cat", no bold text will be shown. To make sure, we can also call

printf "p\bpr\bri\bin\bnt\btf\bf\n"

which will just print printf on the terminal.

Interestingly, less has a special feature that shows such sequences in bold. Quoting from man less: “Also, backspaces which appear between two identical characters are treated specially: the overstruck text is printed using the terminal’s hardware boldface capability. Other backspaces are deleted, along with the preceding character”. This is why we see a boldface printf when we call

printf "p\bpr\bri\bin\bnt\btf\bf\n" | less

There is also a similar feature for underlined text:

printf "p\b_r\b_i\b_n\b_t\b_f\b_\n" | less
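In effect, less turns these overstrike forms into bold and underlined text. As a minimal POSIX sh sketch (not how less itself is implemented), the same translation into ANSI SGR codes can be done with sed, similar to the gsed pipeline elsewhere in this thread but using only BRE back-references, so it should also work with BSD sed. The function name overstrike_to_ansi is hypothetical.

```sh
# Hypothetical helper: translate overstrike sequences into ANSI SGR codes.
#   X<BS>X -> bold      (ESC[1m ... ESC[22m)
#   _<BS>X -> underline (ESC[4m ... ESC[24m)
#   X<BS>_ -> underline (the order used in the less example above)
bs=$(printf '\b')
bold_on=$(printf '\033[1m'); bold_off=$(printf '\033[22m')
ul_on=$(printf '\033[4m');   ul_off=$(printf '\033[24m')

overstrike_to_ansi() {
  # The bold rule runs first, so "_<BS>_" becomes a bold underscore.
  sed -e "s/\(.\)$bs\1/$bold_on\1$bold_off/g" \
      -e "s/_$bs\(.\)/$ul_on\1$ul_off/g" \
      -e "s/\(.\)${bs}_/$ul_on\1$ul_off/g"
}
```

For example, printf "p\bpr\bri\bin\bnt\btf\bf\n" | overstrike_to_ansi should show a bold printf in any terminal that honors SGR codes, without needing less.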

Back to bat. When I initially played with this, I noticed that these backspace characters were causing problems when intermixed with bat’s syntax highlighting. Imagine we have

int printf(const char* format, ...);

in a man page and the whole line is printed in bold (as at the beginning of man sprintf). The syntax highlighter will try to highlight certain special characters, such as the opening parenthesis (. However, that breaks the backspace-for-bold-font trick, and actual backspace characters will start appearing in your output.

For this reason, I originally used col -b (col --no-backspaces), which turns something like p\bpr\bri\bin\bnt\btf\bf into printf:

▶ printf "p\bpr\bri\bin\bnt\btf\bf\n" | bat -Ap         
p␈pr␈ri␈in␈nt␈tf␈f␊

▶ printf "p\bpr\bri\bin\bnt\btf\bf\n" | col -b | bat -Ap
printf␊

Unfortunately, I missed that col -b “also replaces any whitespace characters with tabs where possible”. This is what breaks the table layout in the above example. Fortunately, we can switch this off via col’s -x/--spaces option.
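Because col -b does more than remove backspaces, another option is to strip only the overstrike sequences with sed, which leaves spaces and tabs exactly as man produced them. A minimal POSIX sh sketch under that assumption (strip_overstrike is a hypothetical name, and only the X␈X, _␈X and X␈_ forms are handled):

```sh
# Hypothetical helper: drop overstrike formatting, keeping the visible
# characters and leaving all whitespace untouched (unlike plain col -b).
bs=$(printf '\b')

strip_overstrike() {
  # 1) bold:      X<BS>X -> X
  # 2) underline: _<BS>X -> X
  # 3) underline: X<BS>_ -> X
  sed -e "s/\(.\)$bs\1/\1/g" \
      -e "s/_$bs\(.\)/\1/g" \
      -e "s/\(.\)${bs}_/\1/g"
}
```

For example, printf "p\bpr\bri\bin\bnt\btf\bf\n" | strip_overstrike should print a plain printf, like the col -b example above, while any tabs or runs of spaces in the input pass through unchanged.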

The following works for me:

MANPAGER="sh -c 'col -bx | bat -p -lman'" man sprintf


I think we should update the instructions in the README to suggest col -bx.

Unfortunately, it looks like your col command does things a little differently. I couldn’t reproduce your screenshots above exactly. My version is:

▶ col --version 
col from util-linux 2.34

On macOS, this happens if you use the man binary provided by brew’s man-db package. I don’t remember why I installed it, so running brew uninstall man-db brought me back to the system man implementation, which is better behaved about escape sequences.

Not sure if that’s viable for anybody else, but removing it was a huge QoL improvement for me (back to bat’s highlighting, and no more broken escapes written in my manpages), so I figured I’d mention it here in case someone else in the same situation hits it.

Example of what the brokenness looked like, since it doesn't quite seem the same as the others, although it's basically the same problem.

(Before)

LOCATE(1)                                BSD General Commands Manual                                LOCATE(1)

1mNAME0m
     1mlocate 22m— find filenames quickly

1mSYNOPSIS0m
     1mlocate 22m[1m-0Scims22m] [1m-l 4m22mlimit24m] [1m-d 4m22mdatabase24m] 4mpattern24m 4m...0m

1mDESCRIPTION0m
     The 1mlocate 22mprogram searches a database for all pathnames which match the specified 4mpattern24m.  The data‐

(After)

LOCATE(1)                                   General Commands Manual                                  LOCATE(1)

NAME
     locate – find filenames quickly

SYNOPSIS
     locate [-0Scims] [-l limit] [-d database] pattern ...

DESCRIPTION
     The locate program searches a database for all pathnames which match the specified pattern.  The database
     is recomputed periodically (usually weekly or daily), and contains the pathnames of all files which are
     publicly accessible.

Both had some amount of bat highlighting, but with the extra text it was just unreadable before.