pipx: Traceback when locale not set to UTF-8

On a machine where the locale has not been set to UTF-8 you get tracebacks when installing/running pipx. E.g. when installing:

Traceback (most recent call last):
  File "get-pipx.py.1", line 287, in <module>
    main()
  File "get-pipx.py.1", line 283, in main
    print(f"Enjoy! {'\u2728 \U0001f31f \u2728' if not WINDOWS else ''}")
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2728' in position 7: ordinal not in range(256)

Or when installing a package:

pipx install poetry
zsh: correct 'pipx' to 'popd' [nyae]? n
2nstalling package 'poetry'       
  installed package poetry 0.12.11, Python 3.6.7
  These binaries are now globally available
    - poetry
Traceback (most recent call last):
  File "/home/pmav2/.local/bin/pipx", line 10, in <module>
    sys.exit(cli())
  File "/home/pmav2/.local/pipx/venvs/pipx-app/lib/python3.6/site-packages/pipx/main.py", line 932, in cli
    run_pipx_command(args)
  File "/home/pmav2/.local/pipx/venvs/pipx-app/lib/python3.6/site-packages/pipx/main.py", line 642, in run_pipx_command
    force=args.force,
  File "/home/pmav2/.local/pipx/venvs/pipx-app/lib/python3.6/site-packages/pipx/main.py", line 505, in install
    print(f"done! {stars}")
UnicodeEncodeError: 'latin-1' codec can't encode character '\u2728' in position 6: ordinal not in range(256)

Obviously, the problem is the special characters in the print statements. My guess is that you could check sys.getdefaultencoding() before you actually try to print them.

If you want to test this, on Ubuntu you can switch to a non-UTF-8 (“ANSI”) locale with:

sudo update-locale LANG=en_US

and restore utf-8 with:

sudo update-locale LANG=en_US.UTF-8

You probably need to open a new terminal after each change though.
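If changing the system locale is inconvenient, a rough alternative is to force the stdio encoding for a single child process. The sketch below assumes the standard PYTHONIOENCODING environment variable is an acceptable stand-in for a non-UTF-8 locale; the helper name run_with_stdio_encoding is made up for illustration and is not part of pipx.

```python
import os
import subprocess
import sys


def run_with_stdio_encoding(code: str, encoding: str) -> subprocess.CompletedProcess:
    """Run `code` in a child interpreter whose stdin/stdout/stderr use `encoding`.

    Hypothetical helper: PYTHONIOENCODING overrides the stream encoding
    for the child only, so no locale change or new terminal is needed.
    """
    env = dict(os.environ, PYTHONIOENCODING=encoding)
    return subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
    )


# The raw string keeps the \u2728 escape intact so the child interprets it.
result = run_with_stdio_encoding(r"print('\u2728')", "latin-1")
print("exit status:", result.returncode)
print(result.stderr.decode("latin-1", errors="replace"))
```

The child should fail with the same kind of UnicodeEncodeError as in the tracebacks above, because latin-1 has no code point for the sparkles character.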

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

It looks like it’s possible to avoid these issues by disabling emojis like this:

USE_EMOJI=false python3 -m pipx ensurepath
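For anyone curious how an override like this can work in general, here is a minimal sketch of reading a boolean-style environment variable. Only the variable name USE_EMOJI comes from this thread; the helper env_flag and the set of accepted spellings are assumptions for illustration, not pipx's actual parsing.

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean-like environment variable (hypothetical helper).

    Returns `default` when the variable is unset. Any value outside the
    accepted "truthy" spellings is treated as False (an assumption).
    """
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Only decorate the message when emojis have not been switched off.
stars = "\u2728 \U0001f31f \u2728" if env_flag("USE_EMOJI", default=True) else ""
message = f"done! {stars}".rstrip()
```

With USE_EMOJI=false in the environment, message ends up as plain "done!" and nothing outside ASCII is ever sent to the terminal.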

I filed #619 to mention a pip upgrade may be required to install pipx.

Ah, that’s good to know that python3 -m pip install --upgrade pip is an essential step for people.

Glad it worked, thanks for testing @tucked.

What about the dev / build was confusing or frustrating?

Wow, pipx dev/build is wonky… figured it out eventually, though (I think), and @itsayellow’s branch seems to resolve the issue!

$ LC_ALL= LANG= pipx ensurepath
/home/tucked/.local/bin is already in PATH.
/home/tucked/.local/bin is already in PATH.

All pipx binary directories have been added to PATH. If you are sure
you want to proceed, try again with the '--force' flag.

'ascii' codec can't encode character '\u2728' in position 31: ordinal not in range(128)
$ LC_ALL= LANG= python3 ~/projects/pipx/src/pipx ensurepath
/home/tucked/.local/bin is already in PATH.

All pipx binary directories have been added to PATH. If you are sure you want to proceed, try again with the '--force' flag.

Otherwise pipx is ready to go!

So by checking sys.getdefaultencoding() we are not necessarily verifying that we can successfully print unicode characters.

sys.getdefaultencoding() only affects the behaviour of str.encode() and bytes.decode() when they are called without the encoding argument (or with it set to None). IIRC it is a compile-time constant and is almost always set to utf-8 these days (even on Windows). So it is mostly irrelevant in this context.
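To make the distinction concrete, here is a small sketch that needs no locale changes. It simulates a latin-1 terminal with an in-memory io.TextIOWrapper, which is an assumption standing in for a real sys.stdout; the variable names are illustrative only.

```python
import io
import sys

sparkle = "\u2728"

# str.encode() with no argument encodes as UTF-8, so this succeeds
# regardless of what the terminal can display.
encoded = sparkle.encode()

# A text stream carries its own encoding. This wrapper stands in for a
# stdout attached to a latin-1 terminal (simulation, not the real stream).
latin1_stream = io.TextIOWrapper(io.BytesIO(), encoding="latin-1")
try:
    latin1_stream.write(sparkle)
    latin1_stream.flush()
    wrote_ok = True
except UnicodeEncodeError:
    # latin-1 cannot represent U+2728, so writing to the stream fails
    # even though the default str.encode() above worked.
    wrote_ok = False

print(sys.getdefaultencoding(), encoded, wrote_ok)
```

The encode call succeeds while the stream write fails, which is the same split seen between the default encoding and the stdout/stderr encodings.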

I’ll give it a shot soon (tomorrow?). I’m having trouble replicating the environment this weekend that a co-worker was using.

I made an improved version of our emoji platform-checking code that hopefully fixes these issues. Could somebody with emoji problems please test the pipx version in: https://github.com/itsayellow/pipx/tree/unicode-check …and let me know how it goes.

e.g.

git clone --branch unicode-check https://github.com/itsayellow/pipx

Ah ha! I finally figured out how to both generate a new locale, and then use it to test. This is my result:

> sudo locale-gen en_US
Generating locales (this might take a while)...
  en_US.ISO-8859-1... done
Generation complete.
> LANG=en_US python3 -c "import sys; print(sys.getdefaultencoding()); print(sys.stdout.encoding); print(sys.stderr.encoding)"
utf-8
iso8859-1
iso8859-1

The point being: sys.getdefaultencoding() is not the same as sys.stdout.encoding or sys.stderr.encoding.

So by checking sys.getdefaultencoding() we are not necessarily verifying that we can successfully print unicode characters.
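One way to check the thing that actually matters is to try encoding the emoji with the target stream's own encoding. This is only a sketch of that idea, not necessarily what the unicode-check branch does; stream_can_encode is a hypothetical name.

```python
import sys
from typing import Optional, TextIO


def stream_can_encode(text: str, stream: Optional[TextIO] = None) -> bool:
    """Return True if `text` can be encoded with the stream's encoding.

    Hypothetical helper. It is deliberately strict: it ignores any lenient
    error handler the stream may have (e.g. backslashreplace on stderr).
    """
    stream = sys.stdout if stream is None else stream
    encoding = getattr(stream, "encoding", None)
    if not encoding:
        # No known encoding (e.g. a custom stream object): play it safe.
        return False
    try:
        text.encode(encoding)
    except (UnicodeEncodeError, LookupError):
        # Unencodable character, or an encoding name Python does not know.
        return False
    return True


STARS = "\u2728 \U0001f31f \u2728"
suffix = STARS if stream_can_encode(STARS) else ""
print(f"done! {suffix}".rstrip())
```

Under LANG=en_US (iso8859-1) this should fall back to the plain message, while a UTF-8 terminal still gets the stars.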