cloudpathlib: Ceph compatibility - is_dir() fails for directories when "/" included at end of path

Hello,

For directories whose names carry a suffix, such as the .SAFE directories used for Sentinel satellite data, pathlib correctly identifies them as directories, but cloudpathlib does not when the path ends with "/":

>>> # On disk
>>> from cloudpathlib import AnyPath
>>> A = AnyPath(r"D:\S2B_MSIL2A_20200114T065229_N0213_R020_T40REQ_20200114T094749.SAFE")
>>> A.is_file()
False
>>> A.is_dir()
True
>>> # In the cloud
>>> import os
>>> from cloudpathlib import S3Client
>>> client = S3Client(
    endpoint_url=f"https://{AWS_S3_ENDPOINT}",
    aws_access_key_id=os.getenv(AWS_ACCESS_KEY_ID),
    aws_secret_access_key=os.getenv(AWS_SECRET_ACCESS_KEY),
)
>>> client.set_as_default_client()
>>> B = AnyPath("s3://my-bucket/S2B_MSIL2A_20200114T065229_N0213_R020_T40REQ_20200114T094749.SAFE/")
>>> B.is_file()
True
>>> B.is_dir()
False

However, it works when the trailing / is removed:

>>> # In the cloud
>>> client = S3Client(
    endpoint_url=f"https://{AWS_S3_ENDPOINT}",
    aws_access_key_id=os.getenv(AWS_ACCESS_KEY_ID),
    aws_secret_access_key=os.getenv(AWS_SECRET_ACCESS_KEY),
)
>>> client.set_as_default_client()
>>> B = AnyPath("s3://my-bucket/S2B_MSIL2A_20200114T065229_N0213_R020_T40REQ_20200114T094749.SAFE")
>>> B.is_file()
False
>>> B.is_dir()
True

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (9 by maintainers)

Most upvoted comments

@remi-braun @GeorgeSabu ^ filed #198 to track the recursion issue and the fix you pointed out. If you can contribute a fix that references that issue, that would be great. Thanks!

I think I have the same problem in another place:

I am downloading a multi-level directory such as 2021-06-30_17h55_08 (the sub-directories are not empty) with this code:

path.download_to(dir.joinpath(path.name))

The downloaded result is: 2021-06-30_17h53_19

You can see that the directories have been downloaded as files!

The function does two different things depending on whether the path is a file or a directory (and it appends a / if it is a directory!), so the two problems may be related:

def download_to(self, destination: Union[str, os.PathLike]) -> Path:
    destination = Path(destination)
    if self.is_file():
        if destination.is_dir():
            destination = destination / self.name
        return self.client._download_file(self, destination)
    else:
        destination.mkdir(exist_ok=True)
        for f in self.iterdir():
            rel = str(self)
            if not rel.endswith("/"):
                rel = rel + "/"

            rel_dest = str(f)[len(rel) :]
            f.download_to(destination / rel_dest)

        return destination

All of this is happening on an S3-compatible storage service; I don't know whether that is related.
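To make the directory branch of `download_to` easier to reason about, the prefix-stripping step can be isolated as a plain string function. This is a minimal sketch, not the library's code: `relative_destination` is a hypothetical name, and the `startswith` check is an addition that the quoted function does not perform.

```python
def relative_destination(parent: str, child: str) -> str:
    """Mirror the `rel` / `rel_dest` computation from the quoted download_to loop."""
    # Same normalization as in the loop: make sure the parent ends with "/".
    prefix = parent if parent.endswith("/") else parent + "/"
    if not child.startswith(prefix):
        # Extra safety check, not present in the quoted code.
        raise ValueError(f"{child!r} is not located under {parent!r}")
    return child[len(prefix):]


parent = "s3://my-bucket/2021-06-30_17h55_08"
print(relative_destination(parent, parent + "/sub/file.txt"))  # sub/file.txt
print(relative_destination(parent + "/", parent + "/sub/"))    # sub/
```

Note that a sub-directory yielded with a trailing "/" keeps that slash in the relative part. If `is_file()` then returns True for such a path, as in the report above, the recursive call would take the file branch, which would be consistent with directories being downloaded as files. That link is a guess and is not confirmed in this thread.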