gcsfs: isdir/info method works incorrectly

Hello, I’ve found strange behavior in the isdir method (and, after digging deeper, in the info method as well). Both return incorrect values, and which value comes back appears to be random.

I use Python 3.10.12, and I’ve tested this behavior on gcsfs==2022.3.0 and on the latest version, gcsfs==2023.6.0.

I’ve prepared a helper function to show what is happening here:

from gcsfs import GCSFileSystem

fs = GCSFileSystem()


def check_is_dir(path):
    is_dir = fs.isdir(path)
    info_type = fs.info(path)["type"]

    print(path, is_dir, info_type)

Problem example

An example run:

check_is_dir('gs://my/super')
check_is_dir('gs://my/super/secret')
check_is_dir('gs://my/super/secret/gcs')
check_is_dir('gs://my/super/secret/gcs/directory')
check_is_dir('gs://my/super/secret/gcs/directory/file.json')

Results:

gs://my/super False directory  # the first string is the path, the boolean is the value returned by isdir, and the last string is the 'type' value in the dictionary returned by fs.info(path)
gs://my/super/secret True directory
gs://my/super/secret/gcs False directory
gs://my/super/secret/gcs/directory True directory
gs://my/super/secret/gcs/directory/file.json False file

As you can see, some directories are incorrectly treated as files. Moreover, the values returned by the info and isdir methods are inconsistent with each other.
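To make the disagreement easier to spot across many paths, a small checker along these lines might help. This is only a sketch: `find_type_mismatches` is a hypothetical helper name, and it assumes nothing about the filesystem object beyond the `isdir(path)` and `info(path)["type"]` calls already used in the helper function above.

```python
from typing import Any, Iterable


def find_type_mismatches(fs: Any, paths: Iterable[str]) -> list[tuple[str, bool, str]]:
    """Return (path, isdir_result, info_type) for each path where the two disagree.

    `fs` is any object exposing isdir(path) and info(path), e.g. a GCSFileSystem.
    """
    mismatches = []
    for path in paths:
        is_dir = fs.isdir(path)
        info_type = fs.info(path)["type"]
        # isdir() should be True exactly when info() reports a directory.
        if is_dir != (info_type == "directory"):
            mismatches.append((path, is_dir, info_type))
    return mismatches
```

Calling `find_type_mismatches(fs, [...])` with the five paths from the run above should return an empty list once the two methods agree.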

Another insight

Changing the order in which these methods are called, as in the snippet below:

def check_is_dir(path):
    info_type = fs.info(path)["type"]
    is_dir = fs.isdir(path)

    print(path, is_dir, info_type)

makes is_dir contain the correct value, but info_type an incorrect one, as shown here:

gs://my/super True file  # the first string is the path, the boolean is the value returned by isdir, and the last string is the 'type' value in the dictionary returned by fs.info(path)
gs://my/super/secret True directory
gs://my/super/secret/gcs True file
gs://my/super/secret/gcs/directory True directory
gs://my/super/secret/gcs/directory/file.json False file
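Since the result seems to depend on which method is called first, it may be useful to record the answers for an explicit call sequence on a single path. The sketch below does that. `probe_call_order` and its `order` parameter are hypothetical names introduced for illustration, and only `isdir` and `info` are called on the filesystem.

```python
from typing import Any


def probe_call_order(
    fs: Any,
    path: str,
    order: tuple[str, ...] = ("isdir", "info", "isdir", "info"),
) -> list[tuple[str, object]]:
    """Call isdir/info on `path` in the given order and record each answer."""
    results: list[tuple[str, object]] = []
    for name in order:
        if name == "isdir":
            results.append(("isdir", fs.isdir(path)))
        elif name == "info":
            results.append(("info", fs.info(path)["type"]))
        else:
            raise ValueError(f"unknown method name: {name!r}")
    return results
```

Running it once with `("isdir", "info")` and once with `("info", "isdir")`, each time on a freshly created filesystem object, would show whether the first call on a path is the one that returns the wrong answer.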

Workaround

For now, my workaround is to call the isdir method twice:

def check_is_dir(path):
    is_dir = fs.isdir(path)
    is_dir = fs.isdir(path)
    info_type = fs.info(path)["type"]

    print(path, is_dir, info_type)

It works:

gs://my/super True directory  # the first string is the path, the boolean is the value returned by isdir, and the last string is the 'type' value in the dictionary returned by fs.info(path)
gs://my/super/secret True directory
gs://my/super/secret/gcs True directory
gs://my/super/secret/gcs/directory True directory
gs://my/super/secret/gcs/directory/file.json False file
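If the double call has to be used in several places, it could be wrapped in one function so it is easy to remove later. This is a sketch of the same workaround, not a fix: `isdir_repeated` and its `calls` parameter are hypothetical names, and the only assumption is that the last of the repeated calls returns the right answer, as it did in the run above.

```python
from typing import Any


def isdir_repeated(fs: Any, path: str, calls: int = 2) -> bool:
    """Call fs.isdir(path) `calls` times and return the result of the last call."""
    if calls < 1:
        raise ValueError("calls must be at least 1")
    result = False
    for _ in range(calls):
        # Earlier answers are discarded on purpose; only the last one is kept.
        result = fs.isdir(path)
    return result
```

With the filesystem object from the snippets above, `isdir_repeated(fs, 'gs://my/super')` would replace the two consecutive `fs.isdir(path)` lines.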

But I want to work with this library without such a workaround 😉

About this issue

  • State: open
  • Created a year ago
  • Comments: 20 (10 by maintainers)

Most upvoted comments

It is the path I asked for, but curiously pathlib (and also os.listdir()) have an interesting way of showing this. As for a PR, I have started https://github.com/fsspec/gcsfs/pull/598.

But it doesn’t really match the behavior of LocalFileSystem (and also it does not match the documentation of fsspec).

>>> from fsspec.implementations.local import LocalFileSystem
>>> lfs = LocalFileSystem()
>>> lfs.ls('/tmp/fsspec-test/empty_folder')
[]
>>> lfs.ls('/tmp/fsspec-test/folder')
['/tmp/fsspec-test/folder/test.txt]']
>>> lfs.ls('/tmp/fsspec-test/folder/test.txt')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/venv/lib/python3.10/site-packages/fsspec/implementations/local.py", line 66, in ls
    return [posixpath.join(path, f) for f in os.listdir(path)]
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/fsspec-test/folder/test.txt'