ksh: Autocomplete should not fill partial multibyte characters

If I have files 'XXXá' and 'XXXë' ($'XXX\xc3\xa1' and $'XXX\xc3\xab') and have typed XX as a command argument, the first Tab should append X to show XXX. Instead, autocomplete tries to complete to $'XXX\xc3', because á and ë share the same first byte. It displays XXX^ and leaves the editor in a bad state in which subsequent keypresses can move the cursor to before the start of the line.
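
To illustrate the difference between the two behaviours, here is a minimal sketch that computes the common prefix of two names first byte by byte and then character by character. It assumes the names are UTF-8 encoded and is not taken from ksh; the function names `common_prefix_bytewise` and `common_prefix_chars` are hypothetical.

```c
#include <stddef.h>

/* Byte length of the UTF-8 sequence that starts with lead byte c.
 * Invalid lead bytes are treated as single-byte characters. */
static size_t utf8_len(unsigned char c)
{
    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Naive comparison: counts matching bytes, so it can stop in the
 * middle of a multibyte character. */
size_t common_prefix_bytewise(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] && a[i] == b[i])
        i++;
    return i;
}

/* Character-aware comparison: only advances past a character when
 * every byte of it matches, so the result always ends on a
 * character boundary. */
size_t common_prefix_chars(const char *a, const char *b)
{
    size_t i = 0;
    while (a[i] && b[i]) {
        size_t n = utf8_len((unsigned char)a[i]);
        size_t k;
        for (k = 0; k < n; k++) {
            if (a[i + k] != b[i + k])
                return i;   /* mismatch inside this character */
            if (a[i + k] == '\0')
                return i;   /* truncated sequence in both strings */
        }
        i += n;
    }
    return i;
}
```

For the two file names above, the bytewise version reports 4 matching bytes (XXX plus the shared 0xc3), while the character-aware version reports 3, which is the prefix the completion should insert.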

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 21

Most upvoted comments

Answering the question asked here: I think if you do that, you violate the principle of least surprise. The expectation is that what I typed is used as the match.

Probably not one person in 100 ever thinks about the underlying filesystem being case-insensitive, ESPECIALLY if it is case-preserving, as HFS+ and APFS are… or whack ideas like NTFS where the underlying filesystem is case-sensitive but the API calls are not.

macOS Catalina:

$ cat reproducer.sh
touch A
touch AB
touch AbC

echo A*
echo AB*
$ ksh reproducer.sh 
A AB AbC
AB

Mint Linux:

mint $ cat reproducer.sh 
touch A
touch AB
touch AbC

echo A*
echo AB*
mint $ ksh reproducer.sh 
A AB AbC
AB

I would not expect the second echo to produce AB AbC on macOS and not on Mint, and I know perfectly well that one of those is case-insensitive and the other is not.

Worse would be if the behavior was different.

It’s an issue of “correct” vs “right”.

Is this a crazy idea?

It’s probably the only portable-ish way of doing it and should give accurate results, so not crazy. There may be platform-specific solutions that are good enough, though. A search suggests that pathconf can take _PC_CASE_SENSITIVE on macOS or _PC_CASE_INSENSITIVE on Cygwin, though I have not tested these.
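
A rough sketch of what the pathconf approach might look like, equally untested on macOS and Cygwin. The helper name `dir_case_sensitive` is hypothetical, and the interpretation of the return values (nonzero meaning the property named by the constant holds, negative meaning unknown) is an assumption that should be checked against each platform's documentation.

```c
#include <unistd.h>

/* Returns 1 if the file system holding dir appears case-sensitive,
 * 0 if it appears case-insensitive, and -1 if this cannot be
 * determined (query failed or the platform offers no such constant). */
int dir_case_sensitive(const char *dir)
{
#if defined(_PC_CASE_SENSITIVE)
    /* macOS-style query: assumed nonzero when case-sensitive */
    long r = pathconf(dir, _PC_CASE_SENSITIVE);
    return r < 0 ? -1 : (r != 0);
#elif defined(_PC_CASE_INSENSITIVE)
    /* Cygwin-style query: assumed nonzero when case-insensitive */
    long r = pathconf(dir, _PC_CASE_INSENSITIVE);
    return r < 0 ? -1 : (r == 0);
#else
    /* No platform-specific query available here */
    (void)dir;
    return -1;
#endif
}
```

A caller would treat -1 as "fall back to the portable check" rather than guessing either way.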

I think that’s fine. I think the bug is actually at https://github.com/ksh93/ksh/blob/14352ba0a7383151b9503757ae6b8838b57e7000/src/cmd/ksh93/edit/completion.c#L81-L82. If I change this to

       register const char *strnext;
       while((strnext=str,c= mbchar(strnext)) && (d= mbchar(newstr),charcmp(c,d,nocase)))
               str=strnext;

then the problem is avoided. However, charcmp is not designed to be called with anything other than unsigned chars converted to int, so this will need more work.
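
One possible direction for that extra work is a comparison helper that takes wide characters and uses the wide-character case functions from <wctype.h>. This is only a sketch: `wcharcmp` is a hypothetical name, it is not part of ksh, and it assumes the values passed in are valid wide characters for the current locale.

```c
#include <wchar.h>
#include <wctype.h>

/* Returns nonzero when a and b are the same character, optionally
 * ignoring case. Unlike the <ctype.h> functions, towlower() is
 * defined for any wint_t value that is a valid wide character. */
int wcharcmp(wint_t a, wint_t b, int nocase)
{
    if (nocase) {
        a = towlower(a);
        b = towlower(b);
    }
    return a == b;
}
```

Whether this fits depends on what mbchar returns in each locale, which would need checking before it could replace charcmp in the loop above.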