pip: 'pip inspect' crashes with python-3.10.10 and pip-23.0.0 on Windows 11 (a UTF-8 problem)

Description

Running pip inspect >> test.txt on my large list of wheels, I get the following crash:

pip install yarl==1.7.2
pip inspect>>test.txt
exit_buffer
    self._check_buffer()
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\console.py", line 2024, in _check_buffer
    legacy_windows_render(buffer, LegacyWindowsTerm(self.file))
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_windows_renderer.py", line 17, in legacy_windows_render
    term.write_styled(text, style)
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_win32_console.py", line 442, in write_styled
    self.write_text(text)
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\site-packages\pip\_vendor\rich\_win32_console.py", line 403, in write_text
    self.write(text)
  File "C:\WinP\bd310\budot\WPy32-310100b1\python-3.10.10\lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode characters in position 1843-1846: character maps to <undefined>

Expected behavior

no crash

pip version

pip-23.0.0

Python version

cpython-3.10.10

OS

Windows 11

How to Reproduce

On Windows, run:

pip install yarl==1.7.2
pip inspect>>test.txt

I was using this list of packages:

pip list

Package       Version
idna          3.1
msvc-runtime  14.32.31326
multidict     6.0.2
pip           23.0
setuptools    67.2.0
simpy         4.0.1
sqlite-bro    0.12.2
wheel         0.38.4
winpython     6.0.20230212
yarl          1.7.2

About this issue

  • State: open
  • Created a year ago
  • Comments: 17 (14 by maintainers)

Most upvoted comments

I wonder if it would make sense to provide a way to configure Rich to use errors="replace" (and maybe other modes). For this case specifically correctness isn’t that necessary and pip could simply skip the unencodable characters, so such flags could be useful, but this may not be the universal preference.
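To make the trade-off concrete, here is a minimal sketch of what the different error modes do, using only the standard library rather than Rich itself. The BytesIO-backed stream is a stand-in for a redirected stdout that reports cp1252, and the helper name write_with is made up for this example.

```python
import io


def write_with(errors: str, text: str) -> bytes:
    """Write text to a cp1252 stream (a stand-in for a redirected stdout)."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the bytes are the same on every OS.
    stream = io.TextIOWrapper(raw, encoding="cp1252", errors=errors, newline="\n")
    stream.write(text)
    stream.flush()
    return raw.getvalue()


text = "Location: d:\\tmp\\ひらがな\\new_venv"

# "strict" is the default mode and is what produces the crash in this issue.
try:
    write_with("strict", text)
except UnicodeEncodeError as exc:
    print("strict failed:", exc.reason)

# "replace" substitutes a question mark for each unencodable character.
print("replace:", write_with("replace", text))

# "backslashreplace" keeps the information as ASCII escape sequences.
print("backslashreplace:", write_with("backslashreplace", text))
```

With "replace" the output is lossy but never raises; "backslashreplace" is another mode that stays readable and reversible, at the cost of uglier output.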

The root problem is that Python uses the _io.FileIO wrapper for I/O streams (stdout) that are piped to a file or to another command. By default, a stream backed by FileIO uses an “ASCII compatible” encoding (defined by the system’s ANSI code page). When stdout is connected to a console, the newer wrapper _io._WindowsConsoleIO is used instead, with UTF-8 as the default. Please read PEP 528 for an explanation.

Testing in PowerShell 5.1 on Windows 11:

PS C:\Users> python -c "import sys;print(sys.stdout.encoding,type(sys.stdout.buffer.raw))"
# Output: utf-8 <class '_io._WindowsConsoleIO'>

# Redirect stdout to file or another cmdlet
PS C:\Users> python -c "import sys;print(sys.stdout.encoding,type(sys.stdout.buffer.raw))"  | %{echo $_}
# Output: cp1252 <class '_io.FileIO'>

# Assign to PS variable
PS C:\Users> $msg = python -c "import sys;print(sys.stdout.encoding)" ; echo $msg
# Output: cp1252

Testing in CMD on Windows 11:

C:\Users>  python3.10 -c "import sys;print(sys.stdout.encoding)" | findstr /R /C:".*"
cp1252

Even the most innocent pip command fails when run from a path containing non-ASCII characters because of this bug:

(new_venv) PS D:\tmp\ひらがな> $pipinfo = pip show pip
--- Logging error ---
Traceback (most recent call last):
  ...
  File "D:\tmp\ひらがな\new_venv\lib\site-packages\pip\_vendor\rich\console.py", line 1999, in _check_buffer
    legacy_windows_render(

You will notice that setting PYTHONIOENCODING="utf8" is not enough:

(new_venv) PS D:\tmp\ひらがな> $env:PYTHONIOENCODING="utf8"
(new_venv) PS D:\tmp\ひらがな> echo (pip show pip)
...
Location: d:\tmp\ひらがな\new_venv\lib\site-packages

That’s because the Windows console uses UTF-16-LE by default.

The Python distribution I tested comes from the Microsoft Store, although I don’t think that matters.

The Python team has known about this issue for a long time and introduced the “UTF-8” mode in PEP 540 in 2016. However, this mode is not the default, and will not be in the near future, for the reasons given in the PEP. Thus, it is the users’ responsibility to set the correct encoding and the libraries’ responsibility to remind users to do so.

So, it would be enough for pip to generate a warning/error message asking the user to set PYTHONIOENCODING=utf16, or to enable UTF-8 mode with -X utf8=1 or PYTHONUTF8=1.
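A rough sketch of such a check, assuming nothing about how pip or Rich are structured internally: the function name encoding_warning and the wording of the hint are hypothetical, and the BytesIO-backed stream again stands in for a redirected stdout that reports cp1252.

```python
import io
from typing import Optional, TextIO


def encoding_warning(text: str, stream: TextIO) -> Optional[str]:
    """Return a hint if `stream` cannot encode `text`, otherwise None.

    Hypothetical helper for illustration; not part of pip or Rich.
    """
    encoding = getattr(stream, "encoding", None) or "ascii"
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return (
            f"Output contains characters that {encoding} cannot encode. "
            "Consider setting PYTHONUTF8=1 or running python with -X utf8."
        )
    return None


# Stand-in for a stdout that was redirected on a cp1252 system.
redirected = io.TextIOWrapper(io.BytesIO(), encoding="cp1252")

print(encoding_warning("d:\\tmp\\ひらがな", redirected))  # a hint string
print(encoding_warning("d:\\tmp\\plain", redirected))  # None
```

The check only inspects the text that is about to be written, so it would not warn users whose output happens to be encodable in their code page.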

This is a recurring class of problem on Windows, not exclusively a Rich issue. Essentially sys.stdout.encoding is reporting “CP-1252”, apparently incorrectly (and I don’t know why). You would likely get the same error with a plain old print.

Rich dutifully uses the reported encoding when writing output, and you get an encoding error. Setting the env var PYTHONIOENCODING="utf-8" may fix it.

Frustratingly, Rich has done the right thing here by respecting the encoding. But it’s not great for devs or users.

Rich could set errors="replace" which means that you would get question marks in place of some characters. Rich could also force encoding to “utf-8” which may work around any misconfiguration on the environment. Both changes risk causing issues for others.
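For reference, the standard library already lets a text stream be switched in place through TextIOWrapper.reconfigure (Python 3.7+). The sketch below shows both options on a cp1252 stream that stands in for a redirected stdout; the helper name render is made up, and this is not how Rich currently writes its output.

```python
import io


def render(text: str, **reconfigure_kwargs) -> bytes:
    """Write text to a cp1252 stream after reconfiguring it in place."""
    raw = io.BytesIO()
    # newline="\n" disables newline translation so the bytes are the same on every OS.
    stream = io.TextIOWrapper(raw, encoding="cp1252", newline="\n")
    stream.reconfigure(**reconfigure_kwargs)
    stream.write(text)
    stream.flush()
    return raw.getvalue()


# Option 1: keep the reported encoding but never raise on unencodable text.
lossy = render("ひらがな", errors="replace")

# Option 2: ignore the reported encoding and force UTF-8.
forced = render("ひらがな", encoding="utf-8")

print(lossy)
print(forced == "ひらがな".encode("utf-8"))
```

The second option illustrates the risk: whatever reads the redirected output and still assumes cp1252 will misinterpret the UTF-8 bytes.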

@pfmoore Do you have any insight on the best way for Rich to tackle this?