poetry: poetry add/update/install can fail on Windows during temp directory cleanup

If something goes wrong during an installation, a file is often held open for just a little too long, causing TemporaryDirectory's cleanup to fail with errors such as:

[OSError]
[WinError 145] The directory is not empty: 'C:\\Users\\KING~1.KYL\\AppData\\Local\\Temp\\tmp774m6ge1'

...

 C:\Users\king.kyle\AppData\Local\Continuum\anaconda2\envs\py37\lib\contextlib.py in __exit__() at line 119
   next(self.gen)
 C:\Users\king.kyle\.poetry\lib\poetry\utils\helpers.py in temporary_directory() at line 35
   yield name
 C:\Users\king.kyle\AppData\Local\Continuum\anaconda2\envs\py37\lib\tempfile.py in __exit__() at line 805
   self.cleanup()
 C:\Users\king.kyle\AppData\Local\Continuum\anaconda2\envs\py37\lib\tempfile.py in cleanup() at line 809
   _shutil.rmtree(self.name)
 C:\Users\king.kyle\AppData\Local\Continuum\anaconda2\envs\py37\lib\shutil.py in rmtree() at line 507
   return _rmtree_unsafe(path, onerror)
 C:\Users\king.kyle\AppData\Local\Continuum\anaconda2\envs\py37\lib\shutil.py in _rmtree_unsafe() at line 395
   onerror(os.rmdir, path, sys.exc_info())
 C:\Users\king.kyle\AppData\Local\Continuum\anaconda2\envs\py37\lib\shutil.py in _rmtree_unsafe() at line 393
   os.rmdir(path)

add [-D|--dev] [--git GIT] [--path PATH] [-E|--extras EXTRAS] [--optional] [--python PYTHON] [--platform PLATFORM] [--allow-prereleases] [--dry-run] [--] <name> (<name>)...

This is an incredibly unhelpful error and masks all sorts of other problems.

It is caused by a long-standing limitation of shutil.rmtree, which can fail intermittently on Windows. It can fail for a variety of reasons, but very commonly a process simply holds a file open for just a little too long, causing os.rmdir to fail. This is despite the directory actually being empty by the time you check (i.e. the removal is just slow).

This is a known issue in Python, which there is no plan to fix: https://bugs.python.org/issue29982

As a result, I suggest avoiding TemporaryDirectory, whose cleanup we have no control over, and instead calling mkdtemp manually, followed by a short retry loop to remove the directory.

In particular in helpers.py:

import shutil
import tempfile
from contextlib import contextmanager

@contextmanager
def temporary_directory(*args, **kwargs):
    try:
        from tempfile import TemporaryDirectory

        with TemporaryDirectory(*args, **kwargs) as name:
            yield name
    except ImportError:
        # Python 2 fallback: TemporaryDirectory does not exist there.
        name = tempfile.mkdtemp(*args, **kwargs)

        yield name

        shutil.rmtree(name)

should instead be something like:

import shutil
import tempfile
import time
from contextlib import contextmanager

@contextmanager
def temporary_directory(*args, **kwargs):
    name = tempfile.mkdtemp(*args, **kwargs)

    yield name

    robust_rmtree(name)

def robust_rmtree(path, max_retries=6):
    """Robustly tries to delete paths.

    Retries several times if an OSError occurs.
    If the final attempt fails, the Exception is propagated
    to the caller.
    """
    for i in range(max_retries - 1):
        try:
            shutil.rmtree(path)
            return  # Only hits this on success
        except OSError:
            time.sleep(1)

    # Final attempt, pass any Exceptions up to the caller.
    shutil.rmtree(path)
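For context, here is a self-contained sketch of how the proposed helper would be exercised (definitions repeated so the snippet runs on its own; the one-second retry delay is the value proposed above):

```python
import os
import shutil
import tempfile
import time
from contextlib import contextmanager

def robust_rmtree(path, max_retries=6):
    """Retry shutil.rmtree on OSError; the final attempt propagates."""
    for _ in range(max_retries - 1):
        try:
            shutil.rmtree(path)
            return  # success
        except OSError:
            time.sleep(1)
    shutil.rmtree(path)

@contextmanager
def temporary_directory(*args, **kwargs):
    name = tempfile.mkdtemp(*args, **kwargs)
    yield name
    robust_rmtree(name)

with temporary_directory() as tmp:
    # Simulate an installer writing into the directory.
    with open(os.path.join(tmp, "wheel.whl"), "w") as f:
        f.write("payload")
    existed = os.path.isdir(tmp)

print(existed, os.path.isdir(tmp))  # the directory is gone after the block
```

On a healthy filesystem this behaves identically to TemporaryDirectory; the retries only come into play when the first rmtree fails.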

This is done in a few other projects:

The latter is where I took the example function from.

I’m willing to write a pull request later to test this myself. But I’m writing the issue now as it is a problem that should be dealt with.

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 2
  • Comments: 23

Most upvoted comments

The PR https://github.com/python-poetry/poetry-core/pull/460 recreates the patch that should address the issue.

It looks to be a fairly widespread problem with poetry on Windows and causes about 10% of our CI builds to fail.

Hey … I can understand that you are frustrated… but there is a golden rule in all projects in the entire world … be kind!

Similar code is also included in CPython itself! In particular, while they have agreed not to include this in base Python, they do still need it for their own Windows tests.

For reference: Associated bug ticket: https://bugs.python.org/issue15496 - commit referencing it: https://github.com/python/cpython/commit/6f5c5cb75b2fddf31cbb311d2a5508c54f6e21f6 (via Blame feature on GitHub)

There is no guarantee it will need only two runs! Python themselves run the check up to 10 times.

This is what CPython does:

  • Wait 1ms
  • Wait 2ms
  • Wait 4ms
  • Wait 8ms
  • Wait 16ms
  • Wait 32ms
  • Wait 64ms
  • Wait 128ms
  • Wait 256ms
  • Wait 512ms
  • Report failure

This results in a maximum total wait of about one second (the delays sum to 1023 ms), but a minimum wait of 1 ms. The idea being that you will avoid the Windows issue in the least amount of time needed.

I suggest we implement the system that Python themselves use. There is probably a good reason they choose to do it that way.

Checking only twice means we have to cover the worst case, which means waiting a full second every time. While failures are rare, that’s just a waste of time for no good reason.
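The schedule described above could be sketched like this (doubling from 1 ms, roughly mirroring the CPython approach; the parameter names are mine, not poetry's):

```python
import shutil
import time

def robust_rmtree(path, max_retries=10, first_delay=0.001):
    """Retry shutil.rmtree with exponentially growing delays of
    1 ms, 2 ms, ... 512 ms (about 1 s of sleeping in total);
    the final attempt propagates any OSError to the caller."""
    delay = first_delay
    for _ in range(max_retries):
        try:
            shutil.rmtree(path)
            return  # success
        except OSError:
            time.sleep(delay)
            delay *= 2  # exponential backoff
    # Final attempt: report failure to the caller.
    shutil.rmtree(path)
```

With max_retries=10 the sleeps sum to 1 + 2 + ... + 512 = 1023 ms, so a transient Windows hold of a few milliseconds costs only a few milliseconds of waiting.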


Again. Why can’t this be called robust_rmtree? Or something similar that explains what it does. The name should explain the functionality (it’s a more robust rmtree), not the exact implementation.

Do you agree it’s good to use the same system as CPython? Start at 1ms wait, and exponentially scale to 1 second wait (this means the total possible wait is < 2 seconds). If Python themselves do it, then it must be reasonable.

I don’t understand the issue with the name robust_rmtree(); it’s a much clearer name than double_rmtree() (which rather sounds like you’re removing something exactly twice), and I don’t understand why we care about reserving the leading r, especially given that rmtree already begins with an r and that I have no idea what rampage_rmtree() is…

robust_rmtree() by its very nature can never guarantee removal of a directory; we have to put a timeout in there somewhere. But it is the same system as used by:

And has been in both of those codebases for over 5 years.

As a “recent” example, rmtree() fails if a subfolder is open in Explorer on Windows, due to a roughly 0.8 millisecond hold on the directory. So in the vast majority of cases I expect this to clean up within a second.

Note: CPython waits for less than 1 second in total though, starting at 0.001 seconds and increasing exponentially up to a maximum of 1. So maybe the initial wait of 2 seconds is needlessly large; however, given that this runs during poetry install and rarely actually occurs (generally on at most a single dependency), I think the time is OK.


With regards to testing, we used to have a single PC that consistently failed due to this error, and this change means that the system now installs requirements properly. We now have been using this fix for some time and have yet to run into any issues.

The problem is more of an issue with Windows buildbots though, where you can’t just “retry” the install in the same way you would on a PC.

I don’t think you recognize the problem here. When you mark a file for deletion in Windows, it is not deleted immediately. It is held in stasis until all references to the file are closed, and then at some arbitrary point in the future it is removed.

rmtree does not wait for this to occur. Which means, from time to time, it will just arbitrarily fail. This can be simply from timing (nested directories almost always cause a problem), or from antivirus taking 0.3 seconds too long to scan the file on access.

As a result, it will error when it tries to delete the parent directory while the files within it are waiting on removal. These will be removed very quickly, within seconds, generally less than 1 second, but this errors regardless.

This is a well known problem with Windows. Many other projects have faced this issue and implement a delayed deletion to deal with it. There is a reason that Poetry gives Directory not empty errors in Windows when completely unrelated problems occur, and it’s because of this delayed deletion within Windows.
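One way to keep a retry loop from masking those “completely unrelated problems” is to retry only the “directory not empty” case, and re-raise everything else immediately. Python maps WinError 145 (ERROR_DIR_NOT_EMPTY) to errno.ENOTEMPTY, so a hypothetical variant (not poetry's actual code) could look like:

```python
import errno
import shutil
import time

def rmtree_retry_not_empty(path, retries=10, delay=0.001):
    """Hypothetical variant: retry only Windows-style delayed-deletion
    failures (ENOTEMPTY, i.e. WinError 145); any other OSError is
    re-raised immediately so unrelated problems are not hidden
    behind a retry loop."""
    for _ in range(retries):
        try:
            shutil.rmtree(path)
            return  # success
        except OSError as exc:
            if exc.errno != errno.ENOTEMPTY:
                raise  # unrelated failure: surface it right away
            time.sleep(delay)
            delay *= 2  # exponential backoff
    shutil.rmtree(path)  # last attempt propagates
```

With this filter, a permissions error or a genuinely missing path fails fast with its real message, while the delayed-deletion race gets the retry treatment.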


There is no sensible way for me to use Unlocker to determine what is holding the file open for the 100 ms it is held. It could simply be the Windows file system!

This is a Windows issue. It is unrelated to my PC in particular; it affects the Windows ecosystem as a whole. The speed of your system can be enough to make something succeed or fail, and the speed of your HDD makes a difference. There is no sensible way to force or test for this; you just have to know that the problem exists and deal with it.