pytest: TestLocalPath.test_make_numbered_dir_multiprocess_safe sometimes fails with py.error.EEXIST: [File exists]

Hello. I’ve noticed recently that the Fedora build system notifies me every now and then that pytest fails to build. This has been happening with 7.4.2, and I’ve managed to reproduce it with 7.4.3 as well.

The failure always looks like this:

+ PYTEST_XDIST_AUTO_NUM_WORKERS=5
+ /builddir/build/BUILDROOT/pytest-7.4.3-1.fc40.i386/usr/bin/pytest testing --timeout=30 -n auto -rs
============================= test session starts ==============================
platform linux -- Python 3.12.0, pytest-7.4.3, pluggy-1.3.0
rootdir: /builddir/build/BUILD/pytest-7.4.3
configfile: pyproject.toml
plugins: hypothesis-6.82.0, timeout-2.2.0, xdist-3.3.1
timeout: 30.0s
timeout method: signal
timeout func_only: False
created: 5/5 workers
5 workers [3468 items]
...
=================================== FAILURES ===================================
____________ TestLocalPath.test_make_numbered_dir_multiprocess_safe ____________
[gw4] linux -- Python 3.12.0 /usr/bin/python3
multiprocessing.pool.RemoteTraceback: 
"""
Traceback (most recent call last):
  File "/builddir/build/BUILDROOT/pytest-7.4.3-1.fc40.i386/usr/lib/python3.12/site-packages/_pytest/_py/error.py", line 85, in checked_call
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
FileExistsError: [Errno 17] File exists: '/tmp/pytest-of-mockbuild/pytest-0/popen-gw4/test_make_numbered_dir_multipr0/repro-1229'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/pool.py", line 125, in worker
    result = (True, func(*args, **kwds))
                    ^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILD/pytest-7.4.3/testing/_py/test_local.py", line 550, in batch_make_numbered_dirs
    dir_ = local.make_numbered_dir(prefix="repro-", rootdir=rootdir)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILDROOT/pytest-7.4.3-1.fc40.i386/usr/lib/python3.12/site-packages/_pytest/_py/path.py", line 1342, in make_numbered_dir
    udir = rootdir.mkdir(prefix + str(maxnum + 1))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/builddir/build/BUILDROOT/pytest-7.4.3-1.fc40.i386/usr/lib/python3.12/site-packages/_pytest/_py/path.py", line 887, in mkdir
    error.checked_call(os.mkdir, os.fspath(p))
  File "/builddir/build/BUILDROOT/pytest-7.4.3-1.fc40.i386/usr/lib/python3.12/site-packages/_pytest/_py/error.py", line 101, in checked_call
    raise cls(f"{func.__name__}{args!r}")
py.error.EEXIST: [File exists]: mkdir('/tmp/pytest-of-mockbuild/pytest-0/popen-gw4/test_make_numbered_dir_multipr0/repro-1229',)
"""
The above exception was the direct cause of the following exception:
self = <test_local.TestLocalPath object at 0xf50227c8>
tmpdir = local('/tmp/pytest-of-mockbuild/pytest-0/popen-gw4/test_make_numbered_dir_multipr0')
    def test_make_numbered_dir_multiprocess_safe(self, tmpdir):
        # https://github.com/pytest-dev/py/issues/30
        with multiprocessing.Pool() as pool:
            results = [
                pool.apply_async(batch_make_numbered_dirs, [tmpdir, 100])
                for _ in range(20)
            ]
            for r in results:
>               assert r.get()
/builddir/build/BUILD/pytest-7.4.3/testing/_py/test_local.py:879: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
self = <multiprocessing.pool.ApplyResult object at 0xf4c5fe40>, timeout = None
    def get(self, timeout=None):
        self.wait(timeout)
        if not self.ready():
            raise TimeoutError
        if self._success:
            return self._value
        else:
>           raise self._value
E           py.error.EEXIST: [File exists]: mkdir('/tmp/pytest-of-mockbuild/pytest-0/popen-gw4/test_make_numbered_dir_multipr0/repro-1229',)
/usr/lib/python3.12/multiprocessing/pool.py:774: EEXIST
...
= 1 failed, 3414 passed, 35 skipped, 12 xfailed, 6 xpassed in 65.16s (0:01:05) =

However, the number in the repro-N directory name (repro-1229 in the log above) changes between failures.

When I submit 10 builds to the Fedora builders, usually at least one or two of them fail this way.

Example logs:

The first occurrence I can recall (with build logs already garbage collected) was on 2023-09-29.

The above log is from an i686 builder, but I’ve also seen it on x86_64 and ppc64le.

I assumed the test is flaky, but considering that this particular test exists to check that exactly this does not happen, I am afraid something is broken.

So far, I’ve been unable to reproduce this outside of our build environment.

I think there is a race condition somewhere, but I am out of my depth currently.
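For illustration, the traceback shows make_numbered_dir calling rootdir.mkdir(prefix + str(maxnum + 1)). Any scheme that first scans for the highest existing number and then creates the next one is a check-then-act sequence: two processes can compute the same maxnum, and the slower one gets EEXIST. The sketch below is a standalone illustration of that pattern with a retry on collision. It is not pytest's implementation, and the names make_numbered_dir_retrying and max_attempts are hypothetical.

```python
import os
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor


def make_numbered_dir_retrying(rootdir, prefix="repro-", max_attempts=1000):
    """Create rootdir/<prefix><N> for the next free N and return its path.

    Hypothetical helper for illustration only; not part of pytest or py.
    """
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    for _ in range(max_attempts):
        # Check: find the highest number currently in use.
        numbers = [
            int(m.group(1))
            for name in os.listdir(rootdir)
            if (m := pattern.match(name))
        ]
        candidate = os.path.join(rootdir, prefix + str(max(numbers, default=-1) + 1))
        try:
            # Act: os.mkdir either creates the directory or raises
            # FileExistsError, so only one caller can win a given name.
            os.mkdir(candidate)
        except FileExistsError:
            # Another worker took this name between the scan and the mkdir;
            # rescan and try the next number instead of failing.
            continue
        return candidate
    raise RuntimeError(f"no free {prefix}N directory after {max_attempts} attempts")


if __name__ == "__main__":
    root = tempfile.mkdtemp()
    with ThreadPoolExecutor(max_workers=8) as executor:
        created = list(executor.map(lambda _: make_numbered_dir_retrying(root), range(50)))
    print(len(created), "directories,", len(set(created)), "unique")
```

Without the except branch, the first collision would surface as the FileExistsError seen in the log. Retrying makes the collision harmless, at the cost of an extra directory scan per lost race.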

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 27 (27 by maintainers)

Most upvoted comments

@asottile @nicoddemus any opinion on dropping that bit of our shim for the 8.x release?

@hroncok Unfortunately, on the 7.x series we can only mark the test as a non-strict xfail.
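As a rough sketch of what a non-strict xfail looks like, pytest's xfail marker accepts strict=False, which reports a failure as XFAIL and an unexpected pass as XPASS without failing the run. The test name, body, and reason string below are hypothetical stand-ins, not the change that was applied to pytest.

```python
import pytest


@pytest.mark.xfail(
    strict=False,  # an unexpected pass is reported as XPASS, not as a failure
    reason="numbered-dir creation occasionally races and raises EEXIST",
)
def test_numbered_dirs_are_unique(tmp_path):
    # Hypothetical stand-in body; the real test drives make_numbered_dir
    # from a multiprocessing.Pool, as shown in the log above.
    created = [tmp_path / f"repro-{i}" for i in range(3)]
    for path in created:
        path.mkdir()
    assert len(set(created)) == 3
```

With strict=True, a pass would instead be reported as a failure, which would not suit a test that only fails intermittently.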