pex: python<=3.8 symlink with a suffix (e.g. `3.7m`) can create a venv without a `pythonX.Y` symlink, which breaks the pex assumption that `pythonX.Y` is always available

I get this error when running ./pants version on the Pants 2.16.x branch.

Pants is using pex v2.1.131 because I have overridden [pex-cli].version = "v2.1.131" and [pex-cli].known_versions in ~/.pants.rc

$ ./pants --keep-sandboxes=always version
11:48:27.51 [INFO] waiting for pantsd to start...
11:48:30.01 [INFO] pantsd started
11:48:30.13 [INFO] Preserving local process execution dir /tmp/pants-sandbox-tP7l8W for Searching for `bash` on PATH=/usr/bin:/bin:/usr/local/bin
11:48:30.14 [INFO] Preserving local process execution dir /tmp/pants-sandbox-X9MYvB for Test binary /bin/bash.
11:48:30.16 [INFO] Starting: Resolving plugins: hdrhistogram, toolchain.pants.plugin==0.27.0
11:48:30.16 [INFO] Preserving local process execution dir /tmp/pants-sandbox-M3MXvy for Resolving plugins: hdrhistogram, toolchain.pants.plugin==0.27.0
11:48:32.16 [INFO] Completed: Resolving plugins: hdrhistogram, toolchain.pants.plugin==0.27.0
11:48:32.16 [ERROR] 1 Exception encountered:

[snip]

pants.engine.process.ProcessExecutionFailure: Process 'Resolving plugins: hdrhistogram, toolchain.pants.plugin==0.27.0' failed with exit code 1.
stdout:

stderr:
Failed to spawn a job for /home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin/python3.7: pid 1531475 -> /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708/bin/python -sE /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708/pex --disable-pip-version-check --no-python-version-warning --exists-action a --no-input --isolated -q --cache-dir /home/cognifloyd/.cache/pants/named_caches/pex_root/pip_cache --log /tmp/pants-sandbox-M3MXvy/.tmp/pex-pip-log.4f462xf5/pip.log download --dest /home/cognifloyd/.cache/pants/named_caches/pex_root/downloads/resolver_download.ho5or1_z/home.cognifloyd..cache.pants.pants_dev_deps.Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv.bin.python3.7 pip==23.0.1 setuptools==67.4.0 wheel==0.38.4 --index-url https://pypi.org/simple/ --retries 5 --timeout 15 exited with 1 and STDERR:
Re-execing from /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708/bin/python
Traceback (most recent call last):
  File "/home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708/pex", line 50, in <module>
    os.execv(python, argv)
FileNotFoundError: [Errno 2] No such file or directory

Looking in /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708/pex, it lists a python binary that does not exist.

$ cd /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708/
$ ls -l pex __main__.py bin/python*
lrwxrwxrwx 1 cognifloyd cognifloyd    10 Apr 17 12:04 bin/python -> python3.7m
lrwxrwxrwx 1 cognifloyd cognifloyd    10 Apr 17 12:04 bin/python3 -> python3.7m
lrwxrwxrwx 1 cognifloyd cognifloyd    19 Apr 17 12:04 bin/python3.7m -> /usr/bin/python3.7m
-rwxr-xr-x 1 cognifloyd cognifloyd 11256 Apr 17 12:04 __main__.py
lrwxrwxrwx 1 cognifloyd cognifloyd    11 Apr 17 12:04 pex -> __main__.py

The shebang python is wrong in __main__.py because it is using python3.7 when the virtualenv contains python3.7m:

#!/home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/s/0a5a863d/venv/bin/python3.7 -sE
...
    shebang_python = '/home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/s/0a5a863d/venv/bin/python3.7'

Debugging results

Using --keep-sandboxes=always I got into the pants sandbox for pex to see what’s going on: /tmp/pants-sandbox-M3MXvy

$ cd /tmp/pants-sandbox-M3MXvy
$ ./pex --version
2.1.131

Here is the full pex command line pants is running (__run.sh):

#!/bin/bash
# This command line should execute the same process as pants did internally.
export CPPFLAGS= LANG=en_US.UTF-8 LDFLAGS= PATH=$'/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin:/home/cognifloyd/.cargo/bin:/home/cognifloyd/g/github/pyenv/pyenv.git/shims:/home/cognifloyd/.local/bin:/home/cognifloyd/.local/npm/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/home/cognifloyd/p/gcloud/google-cloud-sdk/bin:/opt/android-sdk-update-manager/tools:/opt/android-sdk-update-manager/platform-tools:/opt/nvidia-cg-toolkit/bin:/home/cognifloyd/g/github/pyenv/pyenv.git/bin:/home/cognifloyd/go/bin' PEX_IGNORE_RCFILES=true PEX_PYTHON=$'/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin/python' PEX_ROOT=.cache/pex_root
cd /tmp/pants-sandbox-M3MXvy
$'/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin/python' ./pex --tmpdir .tmp --jobs 2 --pip-version 23.0.1 --python-path $'/home/cognifloyd/g/github/pyenv/pyenv.git/versions/3.4.10/bin:/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin:/home/cognifloyd/.cargo/bin:/home/cognifloyd/g/github/pyenv/pyenv.git/shims:/home/cognifloyd/.local/bin:/home/cognifloyd/.local/npm/bin:/usr/local/bin:/usr/bin:/bin:/opt/bin:/home/cognifloyd/p/gcloud/google-cloud-sdk/bin:/opt/android-sdk-update-manager/tools:/opt/android-sdk-update-manager/platform-tools:/opt/nvidia-cg-toolkit/bin:/home/cognifloyd/g/github/pyenv/pyenv.git/bin:/home/cognifloyd/go/bin' --output-file pants_plugins.pex --no-emit-warnings --venv --seed verbose --no-venv-site-packages-copies --python $'/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin/python' $'--sources-directory=source_files' hdrhistogram $'toolchain.pants.plugin==0.27.0' --no-pypi $'--index=https://pypi.org/simple/' --manylinux manylinux2014 --resolver-version pip-2020-resolver --constraints __constraints.txt --layout packed

So pex is using the Pants bootstrap venv. The interpreters available there are:

$ cd $'/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin'
$ ls -l python*
lrwxrwxrwx 1 cognifloyd cognifloyd  9 Apr 16 00:08 python -> python3.7
lrwxrwxrwx 1 cognifloyd cognifloyd  9 Apr 16 00:08 python3 -> python3.7
lrwxrwxrwx 1 cognifloyd cognifloyd 18 Apr 16 00:08 python3.7 -> /usr/bin/python3.7

python versions available on my machine

Here are all of the versions of python I have available:

$ ls -l /usr/bin/python3.[1-9]{,[0-9m]}
-rwxr-xr-x 1 root root 14304 Mar 15 12:46 /usr/bin/python3.10
-rwxr-xr-x 1 root root 14304 Mar 15 12:51 /usr/bin/python3.11
lrwxrwxrwx 1 root root    10 Aug  8  2022 /usr/bin/python3.6 -> python3.6m
-rwxr-xr-x 1 root root 14280 Aug  8  2022 /usr/bin/python3.6m
lrwxrwxrwx 1 root root    10 Aug  8  2022 /usr/bin/python3.7 -> python3.7m
-rwxr-xr-x 1 root root 14224 Aug  8  2022 /usr/bin/python3.7m
-rwxr-xr-x 1 root root 14304 Dec 12 04:09 /usr/bin/python3.8
-rwxr-xr-x 1 root root 14304 Mar 15 12:49 /usr/bin/python3.9

I have some older builds of python3.6 and python3.7 on my Gentoo Linux machine. 3.7 is the oldest version that Pants supports, so it is using that.

(If you’re familiar with Gentoo: I have ebuilds for these in my local overlay that I copied from ::gentoo when they were dropped, because I still needed 3.6 and 3.7.)

These versions of python include the optional m ABI flag described in https://peps.python.org/pep-3149/ (m means --with-pymalloc), so the binaries are python3.6m and python3.7m.
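The suffix can be inspected from inside the interpreter itself. Here is a minimal sketch, assuming a POSIX build where `sys.abiflags` is defined and assuming the build installs its binary as `python<major>.<minor><abiflags>`. The helper name `versioned_binary_name` is hypothetical and is not part of pex or CPython.

```python
import sys


def versioned_binary_name(version_info=sys.version_info, abiflags=None):
    """Return the conventional versioned binary name, e.g. 'python3.7m'.

    Hypothetical helper for illustration only. It assumes the build
    installs its binary as python<major>.<minor><abiflags>.
    """
    if abiflags is None:
        # sys.abiflags only exists on POSIX builds; fall back to no suffix.
        abiflags = getattr(sys, "abiflags", "")
    return "python{}.{}{}".format(version_info[0], version_info[1], abiflags)


# Name the running interpreter would conventionally be installed under.
print(versioned_binary_name())
```

Run under /usr/bin/python3.7m, this should report a name with the m suffix, while newer interpreters that dropped the flag report none.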

What’s wrong with the pex venv?

As noted above, the venv is getting built with bin/python3.7m but __main__.py tries to use it as bin/python3.7, which doesn’t exist.
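A quick way to confirm this kind of mismatch is to read the shebang line of the venv entry point and check whether the interpreter it names exists. This is only a sketch: both function names are hypothetical, and it assumes the interpreter path contains no spaces.

```python
import os


def shebang_interpreter(script_path):
    """Return the interpreter path named on the script's #! line, or None."""
    with open(script_path, "rb") as fp:
        first_line = fp.readline().decode("utf-8", "replace").strip()
    if not first_line.startswith("#!"):
        return None
    # Assumption: no spaces in the interpreter path; the rest are arguments.
    parts = first_line[2:].split()
    return parts[0] if parts else None


def find_dangling_shebang(script_path):
    """Return the shebang interpreter if it does not exist on disk, else None."""
    interpreter = shebang_interpreter(script_path)
    # os.path.exists follows symlinks, so a broken link also counts as missing.
    if interpreter is not None and not os.path.exists(interpreter):
        return interpreter
    return None
```

Pointing `find_dangling_shebang` at the venv's `__main__.py` would be expected to return the `bin/python3.7` path, since only `bin/python3.7m` exists there.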

Next I ran this to force a rebuild of the venv and see how it is getting built:

$ rm -rf ~/.cache/pants/named_caches/pex_root/
$ export PEX_VERBOSE=6
$ ./__run.sh

Sifting through the output, this is how it is creating the venv:

pex: Executing: /usr/bin/python3.7m -s -E -m venv --without-pip /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708.lck.work --prompt pex

So, I ran a version of that myself to see how it creates the symlinks:

$ /usr/bin/python3.7m -s -E -m venv --without-pip /tmp/pex-py37m-venv --prompt pex
$ ls -l /tmp/pex-py37m-venv/bin/python*
lrwxrwxrwx 1 cognifloyd cognifloyd 10 Apr 17 13:01 /tmp/pex-py37m-venv/bin/python -> python3.7m
lrwxrwxrwx 1 cognifloyd cognifloyd 10 Apr 17 13:01 /tmp/pex-py37m-venv/bin/python3 -> python3.7m
lrwxrwxrwx 1 cognifloyd cognifloyd 19 Apr 17 13:01 /tmp/pex-py37m-venv/bin/python3.7m -> /usr/bin/python3.7m

So, it is the venv module that is creating the symlink as python3.7m.
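The same check can be scripted. The sketch below uses the standard library `venv.EnvBuilder` with options that roughly mirror the command line above (`--without-pip`, `--prompt pex`). The helper names `python_links` and `create_and_inspect` are hypothetical, and the `bin` directory name assumes a POSIX layout.

```python
import os
import venv


def python_links(bin_dir):
    """Map each python* entry in bin_dir to its symlink target (None if not a link)."""
    links = {}
    for name in sorted(os.listdir(bin_dir)):
        if not name.startswith("python"):
            continue
        path = os.path.join(bin_dir, name)
        links[name] = os.readlink(path) if os.path.islink(path) else None
    return links


def create_and_inspect(env_dir):
    """Create a pip-less, symlinked venv with the running interpreter and list its links."""
    venv.EnvBuilder(symlinks=True, with_pip=False, prompt="pex").create(env_dir)
    # Assumption: POSIX layout, where executables live under <env_dir>/bin.
    return python_links(os.path.join(env_dir, "bin"))
```

Calling `create_and_inspect("/tmp/pex-py37m-venv")` from /usr/bin/python3.7m should list the same three entries as the `ls -l` output above.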

The next log entry (below) shows that pex is using the correct symlink in that venv to finish creating it, so it’s only after the venv is populated and pex tries to reexecute that it gets lost.

pex: Executing: PYTHONPATH=/home/cognifloyd/.cache/pants/named_caches/pex_root/isolated/424a1c62c3a17be0bfeeb465d9221d7e2f0cb3ff /home/cognifloyd/.cache/pants/named_caches/pex_root/venvs/80f046e10efd2c6688f5a7df4d079c8b9816ab6f/acea27d2dbcc288eac2b093081739a3575b60708.lck.work/bin/python3.7m -s -c import os

Why is pex hard-coding python3.7 as the shebang_python instead of python3.7m?

I think pex is calculating the binary name instead of looking it up: https://github.com/pantsbuild/pex/blob/2534f1fd2d4fc9f2021b3a0dea9af757baccc027/pex/interpreter.py#L66-L75

Which eventually makes its way into the template here: https://github.com/pantsbuild/pex/blob/2534f1fd2d4fc9f2021b3a0dea9af757baccc027/pex/venv/pex.py#L343-L354
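To illustrate the difference between calculating the name and looking it up, here is a minimal sketch. It is not pex's code: both functions are hypothetical, and the lookup assumes any ABI suffix is purely alphabetic (such as m).

```python
import os


def computed_binary_name(major, minor):
    """Derive the name from version numbers alone; ignores any ABI suffix."""
    return "python{}.{}".format(major, minor)


def looked_up_binary(bin_dir, major, minor):
    """Return the first python<major>.<minor>[abiflags] entry found in bin_dir, or None."""
    prefix = computed_binary_name(major, minor)
    for name in sorted(os.listdir(bin_dir)):
        suffix = name[len(prefix):]
        # Accept an exact match or an alphabetic ABI suffix ("m"), but not
        # "python3.10" for prefix "python3.1" or helpers like "python3.7-config".
        if name.startswith(prefix) and (suffix == "" or suffix.isalpha()):
            return os.path.join(bin_dir, name)
    return None
```

For the venv above, the computed name gives `bin/python3.7`, which does not exist, while the lookup would find `bin/python3.7m`.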

The --python option pointed to the pants venv python .../bin/python3.7 which had the symlink python3.7 instead of python3.7m. So, I’m not sure how that pants bootstrap venv is getting created, but it is probably not using -m venv or I would expect it to have a symlink with the same name (python3.7m).

I’m not sure where to fix handling/detection of abi-tagged python binaries.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (10 by maintainers)

Most upvoted comments

Ok, the key to all of this is the following on your machine:

$ ls -l /usr/bin/python3.[1-9]{,[0-9m]}
...
lrwxrwxrwx 1 root root    10 Aug  8  2022 /usr/bin/python3.7 -> python3.7m
-rwxr-xr-x 1 root root 14224 Aug  8  2022 /usr/bin/python3.7m
...

It’s the symlink from 3.7 to 3.7m that the ./pants script picks out here to get the ball rolling: https://github.com/pantsbuild/pants/blob/f6147cee973ba7bd50e26dfdca6f3110986ad609/build-support/common.sh#L55-L72

That leads to a pants_dev_deps venv with bin dir like you showed above:

$ cd $'/home/cognifloyd/.cache/pants/pants_dev_deps/Linux.x86_64.Intel(R).Core(TM).i7-3610QM.CPU.@.2.30GHz.py37.venv/bin'
$ ls -l python*
lrwxrwxrwx 1 cognifloyd cognifloyd  9 Apr 16 00:08 python -> python3.7
lrwxrwxrwx 1 cognifloyd cognifloyd  9 Apr 16 00:08 python3 -> python3.7
lrwxrwxrwx 1 cognifloyd cognifloyd 18 Apr 16 00:08 python3.7 -> /usr/bin/python3.7

The critical entry is python3.7 which is a symlink to /usr/bin/python3.7 which is a symlink to /usr/bin/python3.7m. Pants then runs with sys.executable of /home/cognifloyd/.cache/pants/pants_dev_deps/bin/python3.7. Since that sys.executable is hosted in a venv, this resolve code does not realpath and escape the venv (quite on purpose): https://github.com/pantsbuild/pex/blob/dbd4c138f318619c00f5f1ffa4bb30bb3c585b80/pex/interpreter.py#L1059 https://github.com/pantsbuild/pex/blob/dbd4c138f318619c00f5f1ffa4bb30bb3c585b80/pex/interpreter.py#L584-L593

This leads to pex.interpreter.binary having a basename of python3.7 - the pants_dev_deps canonical Python symlink out of the venv to the system interpreter symlink.

When that interpreter is used to build a venv via /home/cognifloyd/.cache/pants/pants_dev_deps/bin/python3.7 -mvenv ... you get a bin structure of:

python -> python3.7m
python3 -> python3.7m
python3.7m -> /usr/bin/python3.7m

IOW the sys.executable symlink is fully resolved by the venv module. Now we have the mismatch.
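The name change can be reproduced without any Python 3.7 install by rebuilding the symlink chain in a scratch directory. This sketch models only the symlink resolution itself (via `os.path.realpath`), not the venv module's internals. The directory layout and the function name `build_symlink_chain` are stand-ins.

```python
import os
import tempfile


def build_symlink_chain(root):
    """Recreate the layout from the issue under root and return the venv-side link."""
    usr_bin = os.path.join(root, "usr", "bin")
    venv_bin = os.path.join(root, "venv", "bin")
    os.makedirs(usr_bin)
    os.makedirs(venv_bin)
    # Stand-in for the real binary /usr/bin/python3.7m.
    open(os.path.join(usr_bin, "python3.7m"), "w").close()
    # usr/bin/python3.7 -> python3.7m (the distro-provided symlink).
    os.symlink("python3.7m", os.path.join(usr_bin, "python3.7"))
    # venv/bin/python3.7 -> <root>/usr/bin/python3.7 (the bootstrap venv link).
    venv_python = os.path.join(venv_bin, "python3.7")
    os.symlink(os.path.join(usr_bin, "python3.7"), venv_python)
    return venv_python


with tempfile.TemporaryDirectory() as root:
    link = build_symlink_chain(root)
    print(os.path.basename(link))                    # python3.7
    print(os.path.basename(os.path.realpath(link)))  # python3.7m
```

The unresolved path keeps the `python3.7` basename, while the fully resolved path ends in `python3.7m`, which is the mismatch described above.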

My machine also has a python3.7m (three of them: one system 3.7.15, one pyenv 3.7.14 and one pyenv 3.7.15), but none of them has your symlink structure. They each have both python3.7 and python3.7m as copies of the same binary. To repro all this I moved aside the pyenv 3.7.15 python3.7 binary and made it a symlink to the python3.7m sibling binary. I was then able to repro, and then test a fix.

I’ll probably end up giving up on an integration test (IT) for this though.

Aha, the --python / --python-path contradiction continues to be worrying, but that’s not the issue here. I think I can connect the dots, and they do in fact land at s/pex.interpreter.binary/virtualenv.interpreter.binary/. The link requires that --python be the sys.executable of a venv, which is the case here: it originates from the pants_dev_deps venv sys.executable.

Ok, the OP error path runs through here: https://github.com/pantsbuild/pants/blob/7398971f83db4877e34dd48f002920e4906997fb/src/python/pants/init/plugin_resolver.py#L67-L86

I verified the production code path does not set interpreter constraints for the PluginRequest, so the python derived from sys.executable is used. This explains the python3.7 from the Pants bootstrap venv being used to run the venv PEX as opposed to the python3.7m on your PATH. That also explains why this has ~always been a problem: this plugin code path is a bit bespoke compared to other venv PEX uses in Pants that have had more recent changes.

This still leaves me figuring out a test / fix for python3.7 venv.pex leading to a bad venv PEX shebang, but it does fully explain the mechanism behind the Pants bootstrap venv python3.7 getting used here.