nushell: Files with Non-UTF8 characters are simply ignored in `ls`
Describe the bug
Files with Non-UTF8 characters are simply ignored in ls.
How to reproduce
- Create a file named `''$'\001\b\327''@'$'\310\320\f''@8'`. The spelling and quoting reflect how Bash displays this file's name.
- Run `ls` with Nu.
- See only normal files. This file is not displayed at all.
- Run `ls` with Bash.
- File is shown.
Expected behavior
The file should be visible in some form or other. It is not even necessary to display all characters properly; I just want to see the file name as Nu sees it, so I can `rm` it.
Screenshots
No response
Configuration
| key | value |
|---|---|
| version | 0.87.0 |
| branch | |
| commit_hash | 77a1c3c7b2f3a110d48bcb792968e6b0d85d4bb7 |
| build_os | linux-x86_64 |
| build_target | x86_64-unknown-linux-gnu |
| rust_version | rustc 1.71.1 (eb26296b5 2023-08-03) |
| rust_channel | 1.71.1-x86_64-unknown-linux-gnu |
| cargo_version | cargo 1.71.1 (7f1d04c00 2023-07-29) |
| build_time | 2023-11-14 20:18:44 +00:00 |
| build_rust_channel | release |
| allocator | mimalloc |
| features | dataframe, default, extra, sqlite, static-link-openssl, trash, which, zip |
| installed_plugins | |
Additional context
I accidentally created this file, saw it in VS Code.
Using Nu to list the file did not work at all. Then I ran `stat` on the file, switched to Bash, ran `ls` there and voilà, there was the file. I removed it from Bash.
I wasn't able to do it with Nu, because I could not even list (i.e. "see") the file, which also means I wasn't able to delete it.
About this issue
- Original URL
- State: open
- Created 7 months ago
- Comments: 17 (3 by maintainers)
I could reproduce this using `touch "$(echo -ne "\xff\xff")"` in bash. Then `ls` in bash shows `''$'\377\377'`, while nu gives `warning: get non-utf8 filename "/tmp/test/\xFF\xFF", ignored.`

Note that Rust uses OsString to represent paths, which can handle invalid UTF-8.
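For anyone who wants to reproduce this without bash quoting, here is a rough Python sketch. It assumes a Linux filesystem that accepts arbitrary bytes in file names (some platforms reject such names), and the helper names `list_raw_names` and `printable` are made up for illustration. It creates the same `\xff\xff` file in a temporary directory, lists the directory as raw bytes, and renders each name with escapes so nothing has to be dropped.

```python
import os
import tempfile
from typing import List


def list_raw_names(directory: str) -> List[bytes]:
    """Return directory entries as raw bytes, without decoding them."""
    # Passing a bytes path makes os.listdir return bytes names.
    return sorted(os.listdir(os.fsencode(directory)))


def printable(name: bytes) -> str:
    """Render a raw name for display, escaping bytes that are not valid UTF-8."""
    return name.decode("utf-8", errors="backslashreplace")


with tempfile.TemporaryDirectory() as tmp:
    # Illustrative file name: two 0xFF bytes, which are never valid UTF-8.
    bad = os.path.join(os.fsencode(tmp), b"\xff\xff")
    with open(bad, "wb"):
        pass  # create an empty file

    names = list_raw_names(tmp)
    shown = [printable(n) for n in names]
    print(shown)
```

This only shows that a lossless listing plus an escaped display is possible in principle; it says nothing about how nu implements `ls` internally.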
For comparison, `os.listdir(".")` in Python returns `['\udcff\udcff']`, i.e. it represents each invalid byte as a codepoint in the Unicode surrogate block. When you want to open `'\udcff\udcff'` it just removes the prefix `\udc`, and this seems to be unambiguous because a surrogate codepoint does not have a valid UTF-8 representation. So this idea could be an option for nu if we don't want to introduce a separate datatype for paths.

Nope. I'm not motivated enough to install docker or scuba.
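To make the Python comparison above concrete, here is a small sketch of the round trip using the `surrogateescape` error handler, which is what Python applies to file names on POSIX systems. It works purely on bytes and strings, so no file needs to exist; the variable names are just for illustration.

```python
# The raw file name from the reproduction: two bytes that are invalid UTF-8.
raw = b"\xff\xff"

# Decoding with surrogateescape maps each invalid byte 0xNN to U+DCNN.
as_text = raw.decode("utf-8", errors="surrogateescape")

# Encoding with the same handler restores the original bytes exactly.
round_tripped = as_text.encode("utf-8", errors="surrogateescape")

# A strict encode fails, because lone surrogates have no valid UTF-8 form.
# That is why the mapping cannot collide with a legitimately encoded name.
try:
    as_text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(ascii(as_text), round_tripped, strict_ok)
```

Whether nu would want the same scheme is a separate question; this is only how the Python side behaves.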