pex: pex build fails due to existing work-directory

Beginning with version 2.1.105, building the pex file in our CI pipeline fails with the following message:

…/python3.8/site-packages/pex/atomic_directory.py:176: PEXWarning: [pid:XX, tid:XXX, cwd:…]: After obtaining an exclusive lock on <PEX_ROOT>/isolated/.2f4fc85fa2be055a2975ce1147100c0d5c7e663a.atomic_directory.lck, failed to establish a work directory at <PEX_ROOT>/isolated/2f4fc85fa2be055a2975ce1147100c0d5c7e663a.workdir due to: [Errno 17] File exists: '<PEX_ROOT>/isolated/2f4fc85fa2be055a2975ce1147100c0d5c7e663a.workdir'
  pex_warnings.warn(
…/python3.8/site-packages/pex/atomic_directory.py:187: PEXWarning: [pid:XX, tid:XXX, cwd:…]: Continuing to forcibly re-create the work directory at <PEX_ROOT>/isolated/2f4fc85fa2be055a2975ce1147100c0d5c7e663a.workdir.
  pex_warnings.warn(
Failed to spawn a job for …/bin/python: [Errno 17] File exists: '<PEX_ROOT>/isolated/2f4fc85fa2be055a2975ce1147100c0d5c7e663a.workdir/pex/./venv'

It seems to be related to #1905, introduced in version 2.1.105, but we have no clue why this happens in our CI pipeline while building the .pex file on macOS developer machines works. It looks like something else is creating that directory, but there is only one pex command in the pipeline job and the PEX_ROOT is not cached.

Our build environment uses:

  • the Red Hat UBI 8.4 Docker image
  • Python 3.8
  • poetry 1.1, which manages pex as a dev-dependency

Then we build the pex with:

    poetry run pex --inherit-path --python=python3.8 --requirement requirements.txt --find-links dist/ our_module --output-file dist/final.pex

Any idea why this is happening or what else we could check would be helpful.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 26 (21 by maintainers)

Most upvoted comments

@james-johnston-thumbtack thank you so much for the repro case. As is always the case, these are absolute gold and make debugging roughly infinitely easier and quicker than it is otherwise.

I’ll be damned, this fixes it:

$ git diff
diff --git a/pex/jobs.py b/pex/jobs.py
index 734836a..db05fc0 100644
--- a/pex/jobs.py
+++ b/pex/jobs.py
@@ -550,3 +550,4 @@ def execute_parallel(
                         error = e
         finally:
             job_slots.release()
+            spawner.join()

I really don’t know how I continually glossed over / missed the jobs.py Thread spawn.

I need to think this through a bit more, but I think this solves the issue. The BSD locks are still needed for the lock file handling of parallel downloads (and the later-added parallel downloads of PEP-691 metadata), but the old-school, plain old Pex code paths are made safe by the lone thread join, which ensures the spawner thread has shut down before execution continues serially to the next lines of code.
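For readers who want to see the shape of that fix outside of Pex, here is a minimal sketch of the same pattern. It is not Pex's `execute_parallel`; `run_jobs`, `spawn_all`, and `log` are hypothetical names made up for illustration. The point is only that joining the background spawner thread in a `finally` block guarantees it has finished before the caller moves on.

```python
import threading
import time


def run_jobs(items, log):
    """Hypothetical stand-in for a parallel executor (not Pex's real code)."""

    def spawn_all():
        for item in items:
            time.sleep(0.01)  # simulate slow job start-up
            log.append("spawned %s" % item)

    spawner = threading.Thread(target=spawn_all, name="spawner")
    spawner.start()
    try:
        # A real executor would collect job results here.
        pass
    finally:
        # Without this join, run_jobs could return while the spawner thread
        # is still creating work, racing with whatever the caller does next.
        spawner.join()
    log.append("done")
    return log


log = run_jobs(["a", "b", "c"], [])
print(log)
```

With the join in place, "done" is always appended after every "spawned" entry. Removing the join makes the order depend on thread timing.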

@jsirois Thanks for the quick response. Setting _PEX_FILE_LOCK_STYLE=bsd solved the problem. Would you suggest setting it as a workaround until there is a fix for the locking?

@christopherfrieler that warning message looks like the one in the 2.1.112 release. I added it in #1961 to help debug a probable race or wrong POSIX assumption that has been hard to track down. I know this is painful for you, but I’m very happy to have a repro case from you! Can you try setting _PEX_FILE_LOCK_STYLE=bsd (added in #1962) in your CI environment and see if that changes anything?
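As background on why the lock style can matter, here is a small sketch of one general difference between POSIX and BSD file locks, using Python's standard `fcntl` module. It does not show how Pex implements `_PEX_FILE_LOCK_STYLE`, and it is only an assumption that this difference is relevant to the failure above. `second_lock_succeeds` is a hypothetical helper. POSIX locks (`fcntl.lockf`) are owned by the process, so a second lock attempt from the same process is granted; BSD locks (`fcntl.flock`) are tied to the open file description, so a second open of the same file is refused even within one process.

```python
import fcntl
import os
import tempfile


def second_lock_succeeds(lock_fn):
    """Lock a file exclusively, then try to lock it again through a second
    descriptor opened by the same process. Returns True if the second,
    non-blocking attempt is granted."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "w") as first, open(path, "w") as second:
            lock_fn(first, fcntl.LOCK_EX)
            try:
                lock_fn(second, fcntl.LOCK_EX | fcntl.LOCK_NB)
            except OSError:
                return False
            return True
    finally:
        os.unlink(path)


posix_result = second_lock_succeeds(fcntl.lockf)  # POSIX record lock
bsd_result = second_lock_succeeds(fcntl.flock)    # BSD flock
print("POSIX second lock granted:", posix_result)
print("BSD second lock granted:", bsd_result)
```

On a local Linux filesystem this is expected to report that the POSIX lock is granted a second time while the BSD lock is not, which is why a POSIX lock alone does not serialize two threads of one process.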