adlfs: Recursive remove do not work with newest versions
Hi, my old conda environment:
adlfs 2022.11.2 pyhd8ed1ab_0 conda-forge
azure-core 1.26.1 pyhd8ed1ab_0 conda-forge
azure-datalake-store 0.0.51 pyh9f0ad1d_0 conda-forge
azure-identity 1.12.0 pyhd8ed1ab_0 conda-forge
azure-storage-blob 12.14.1 pyhd8ed1ab_0 conda-forge
fsspec 2022.10.0 py310haa95532_0
runs the following without any error
import adlfs
import azure.identity.aio
AZURE_FS = adlfs.AzureBlobFileSystem(
account_name=STORAGE_ACCOUNT,
credential=azure.identity.aio.DefaultAzureCredential())
AZURE_FS.rm(PATH, recursive=True)
But after updating to these package versions:
adlfs 2023.1.0 pyhd8ed1ab_0 conda-forge
azure-core 1.26.2 pyhd8ed1ab_0 conda-forge
azure-datalake-store 0.0.51 pyh9f0ad1d_0 conda-forge
azure-identity 1.12.0 pyhd8ed1ab_0 conda-forge
azure-storage-blob 12.13.1 pyhd8ed1ab_0 conda-forge
fsspec 2022.11.0 py310haa95532_0
it raises this error:
RuntimeError: ('Failed to remove %s for %s', [PATHS], ResourceExistsError('This operation is not permitted on a non-empty directory.\nRequestId:ID\nTime:2023-01-20T09:33:26.1693752Z\nErrorCode:DirectoryIsNotEmpty'))
Bacially, the recursive=True do not work as intended.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 4
- Comments: 25 (17 by maintainers)
How would we feel about just reverting https://github.com/fsspec/adlfs/pull/383? I know it provides a big performance advantage for flat namespace accounts but it also breaks hierarchical namespace (ADLS gen2) accounts.
I think something like what @nosterlu and I described could work but there are probably edge cases to consider.
Thanks for the details @nosterlu ! I am going to try to reproduce it
Nice! I played around a little also when trying to understand how your code worked!
This could maybe make it a little bit clearer?
for example for testing
if all files are directories like this
files = [ “ptp_parquets”, “orders”, ]
they will all end up as
blobs… but I guess it still will work, since there are no dangling files within them… but maybe there is a better way to identify files vs folders in an azure storage… 😅Probably not the neatest solution but I think it works. https://github.com/Tom-Newton/adlfs/pull/1
pip install https://github.com/Tom-Newton/adlfs/archive/aec77b00c1fa7fb5bfbbec88e1c9fac45f133e97.zipI think ideally we would change the way it does listing so that
filescontains file details not just path strings. That would provide a better option for distinguishing files from directories.I think it should be quite straightforward to fix. I’ll give it a try
I think this problem might only effect storage accounts with hierarchical namespace enabled (ADLS gen2). I can reproduce it with basically any recursive delete on a hierarchical namespace account but not on a flat namespace account.
Is everyone else having issues, also using hierarchical namespace accounts?
This would also explain why it can be reproduced in any integration test because azurite does not support hierarchical namespace. The Azure error is also says “non-empty directory” but flat namespace accounts don’t have directories so it would be strange to receive that error on a flat namespace account.
I have the same problem with latest adlfs and fsspec. In the Azure container it is maybe 4 folders deep, with parquet files in the last folder. But it does not always throw an error, it depends on the folder structure.
For example, first time I run this code it only succeeds to delete 2 out of 3 folders with similar depths
Next time I run this code (to try and delete the remaining folder I get this error.
This is my current setup with pip list. (I have only installed via pip)
Hello. I am having the exact same issue. What is the proper mitigation procedure please?
@daavoo - I have tried both direct paths like “container/blob” and protocal paths link “az://container/blob” - but both fail the same way.
CC @daavoo in case it is related to https://github.com/fsspec/adlfs/pull/383