pip: 'pip inspect' crashes with python-3.10.10 and pip-23.0.0 on Windows 11 (a UTF-8 problem)
Description
Running pip inspect >> test.txt on my large set of installed wheels produces the following crash:
pip install yarl==1.7.2
pip inspect>>test.txt
exit_buffer
self._check_buffer()
File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\console.py", line 2024, in _check_buffer
legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_windows_renderer.py", line 17, in legacy_windows_render
term.write_styled(text, style)
File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_win32_console.py", line 442, in write_styled
self.write_text(text)
File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_win32_console.py", line 403, in write_text
self.write(text)
File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1843-1846: character maps to <undefined>
Expected behavior
no crash
pip version
pip-23.0.0
Python version
cpython-3.10.10
OS
Windows11
How to Reproduce
On Windows, run:
pip install yarl==1.7.2
pip inspect>>test.txt
I was using this list of packages:
pip list
Package      Version
idna         3.1
msvc-runtime 14.32.31326
multidict    6.0.2
pip          23.0
setuptools   67.2.0
simpy        4.0.1
sqlite-bro   0.12.2
wheel        0.38.4
winpython    6.0.20230212
yarl         1.7.2
Output
Code of Conduct
- I agree to follow the PSF Code of Conduct.
About this issue
- State: open
- Created a year ago
- Comments: 17 (14 by maintainers)
Commits related to this issue
- don't break 'pip inspect' see https://github.com/pypa/pip/issues/11798 — committed to stonebig/rich by stonebig a year ago
I wonder if it would make sense to provide a way to configure Rich to use errors="replace" (and maybe other modes). For this case specifically, correctness isn’t that necessary and pip could simply skip the unencodable characters, so such flags could be useful, but this may not be the universal preference.

The root problem is that Python uses the _io.FileIO wrapper for I/O streams (stdout) that are piped to a file or to other commands. By default, FileIO uses an “ASCII compatible” encoding (defined by the system’s ANSI code page). When stdout is connected to a console, the newer wrapper _io.WindowsConsoleIO is used, with utf8 as the default. Please read PEP 528 for an explanation.
Testing in Powershell 5.1 of Windows 11:
Testing in CMD of Windows 11:
The most innocent pip command will fail on Unicode-path due to this bug:
You will notice that setting PYTHONIOENCODING="utf8" is not enough:
That’s because the Windows console uses UTF-16-LE by default.
The tested Python distribution comes from the Microsoft Store, although I don’t think it matters.
The Python team has known about this issue for a long time and introduced the “UTF-8” mode in PEP 540 in 2016. However, this mode is not the default, and will not be in the near future, for the reasons given in the PEP. Thus, it is the users’ responsibility to set the correct encoding, and the libraries’ responsibility to remind users to do so.
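A rough way to observe what UTF-8 mode changes for a piped stdout is to launch a child interpreter with and without -X utf8 and ask it for its stream encoding. This is only a sketch: the helper name child_stdout_info is made up for illustration, and the environment variables are filtered out so that an existing PYTHONIOENCODING or PYTHONUTF8 setting does not mask the effect.

```python
import os
import subprocess
import sys


def child_stdout_info(*interpreter_args):
    """Return (sys.stdout.encoding, sys.flags.utf8_mode) as seen by a child
    Python process whose stdout is a pipe rather than a console."""
    # Remove variables that would override the behaviour being observed.
    env = {
        key: value
        for key, value in os.environ.items()
        if key not in ("PYTHONIOENCODING", "PYTHONUTF8")
    }
    probe = "import sys; print(sys.stdout.encoding, sys.flags.utf8_mode)"
    result = subprocess.run(
        [sys.executable, *interpreter_args, "-c", probe],
        capture_output=True,
        text=True,
        env=env,
        check=True,
    )
    encoding, utf8_mode = result.stdout.split()
    return encoding, int(utf8_mode)


if __name__ == "__main__":
    # The default result depends on the platform and locale
    # (for example cp1252 on many Windows installations).
    print("default :", child_stdout_info())
    # With UTF-8 mode forced, the piped stdout uses UTF-8.
    print("-X utf8 :", child_stdout_info("-X", "utf8"))
```

On an affected Windows setup the first line would be expected to show the ANSI code page, while the second shows utf-8 regardless of the system configuration.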
So, it would be enough for pip to generate a warning/error message asking the user to set PYTHONIOENCODING=utf16, or to use -X utf8=1 or set PYTHONUTF8=1 to enforce UTF-8 mode.

This is a recurring class of problem on Windows, not exclusively a Rich issue. Essentially, sys.stdout.encoding is reporting “CP-1252”, apparently incorrectly (and I don’t know why). You would likely get the same error with a plain old print.
Rich dutifully uses the reported encoding when writing output, and you get an encoding error. Setting the env var PYTHONIOENCODING="utf-8" may fix it.
Frustratingly, Rich has done the right thing here by respecting the encoding. But it’s not great for devs or users.
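The point that a plain print would fail in the same way can be checked without Windows or Rich. The sketch below uses an in-memory stream as a stand-in for a redirected stdout; the helper name print_to_piped_stream and the sample text are assumptions for illustration, and the strict error handler mirrors the usual default for sys.stdout.

```python
import io


def print_to_piped_stream(text, encoding):
    """Print text to an in-memory stand-in for a redirected stdout and
    return the bytes that would have reached the file."""
    raw = io.BytesIO()
    # Strict error handling: unencodable characters raise instead of
    # being replaced.
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="strict", newline="\n")
    print(text, file=stream)
    stream.flush()
    return raw.getvalue()


# U+2192 (rightwards arrow) has no mapping in code page 1252.
sample = "yarl \u2192 multidict"

if __name__ == "__main__":
    try:
        print_to_piped_stream(sample, "cp1252")
    except UnicodeEncodeError as exc:
        print("cp1252 failed:", exc.reason)

    # The same text encodes without trouble when the stream is UTF-8.
    print("utf-8 bytes:", print_to_piped_stream(sample, "utf-8"))
```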
Rich could set errors="replace", which means that you would get question marks in place of some characters. Rich could also force the encoding to “utf-8”, which may work around any misconfiguration of the environment. Both changes risk causing issues for others.
@pfmoore Do you have any insight on the best way for Rich to tackle this?
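The trade-off between the two options can be illustrated with standard-library streams. This sketch is not Rich's API: the helper name write_with and the sample text are hypothetical, and TextIOWrapper.reconfigure (available since Python 3.7) is used only to show how an existing stream's error handler can be switched.

```python
import io


def write_with(text, encoding, errors):
    """Write text through a text wrapper configured with the given
    encoding and error handler, and return the resulting bytes."""
    raw = io.BytesIO()
    stream = io.TextIOWrapper(raw, encoding=encoding, errors="strict", newline="\n")
    # Switch the error handler on the already-created stream, as a
    # library could do for a stream it did not create itself.
    stream.reconfigure(errors=errors)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# "é" exists in cp1252, the arrow (U+2192) does not.
sample = "caf\u00e9 \u2192 ok"

if __name__ == "__main__":
    # Option 1: keep the reported encoding, lose the unencodable character.
    print(write_with(sample, "cp1252", "replace"))
    # Option 2: ignore the reported encoding and always emit UTF-8.
    print(write_with(sample, "utf-8", "strict"))
```

With errors="replace" the output stays valid for the reported code page but is lossy. Forcing UTF-8 preserves every character but may be misread by a consumer that really does expect the ANSI code page.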