gcsfs: FileNotFoundError on file.read

I uploaded a file to GCS and was able to read it without a problem. I then deleted the file and re-uploaded it under the same name. When I called the file's read function again, I received a FileNotFoundError raised from fetch range, even though the file exists. Full traceback:

    byte_str = csv_file.read(4096)
  File "/opt/conda/default/lib/python3.6/site-packages/fsspec/spec.py", line 1040, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/opt/conda/default/lib/python3.6/site-packages/fsspec/core.py", line 464, in _fetch
    self.cache = self.fetcher(start, end + self.blocksize)
  File "</opt/conda/default/lib/python3.6/site-packages/decorator.py:decorator-gen-22>", line 2, in _fetch_range
  File "/opt/conda/default/lib/python3.6/site-packages/gcsfs/core.py", line 54, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/default/lib/python3.6/site-packages/gcsfs/core.py", line 1067, in _fetch_range
    headers=head)
  File "</opt/conda/default/lib/python3.6/site-packages/decorator.py:decorator-gen-2>", line 2, in _call
  File "/opt/conda/default/lib/python3.6/site-packages/gcsfs/core.py", line 54, in _tracemethod
    return f(self, *args, **kwargs)
  File "/opt/conda/default/lib/python3.6/site-packages/gcsfs/core.py", line 462, in _call
    validate_response(r, path)
  File "/opt/conda/default/lib/python3.6/site-packages/gcsfs/core.py", line 157, in validate_response
    raise FileNotFoundError
FileNotFoundError

I invalidated the cache and cleared the instance cache before reading the file, but neither helped.

Versions: gcsfs==0.3.0 dask==2.1.0

About this issue

  • Original URL
  • State: open
  • Created 5 years ago
  • Reactions: 4
  • Comments: 17 (9 by maintainers)

Most upvoted comments

I can confirm that it's a cache issue, as suggested above:

import gcsfs
import pandas as pd
from google.cloud import storage

bucket_name = "testing"
blob_name = "test.csv"
client = storage.Client()
bucket = client.get_bucket(bucket_name)

data = "some data"
bucket.blob(blob_name).upload_from_string(data)

# Test #1 Succeeds
df = pd.read_csv(f"gs://{bucket_name}/{blob_name}")

client.get_bucket(bucket_name).delete_blob(blob_name)
bucket.blob(blob_name).upload_from_string(data)

# Test #2 Fails
df = pd.read_csv(f"gs://{bucket_name}/{blob_name}")

# Test #3 Fails
fs = gcsfs.GCSFileSystem()
with fs.open(f"{bucket_name}/{blob_name}", 'rb') as f:
    print(f.read())

# Test #4 Succeeds
fs.invalidate_cache()
with fs.open(f"{bucket_name}/{blob_name}", 'rb') as f:
    print(f.read())
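Based on the repro above, a practical workaround until this is fixed is to call `invalidate_cache()` before opening a path that may have been deleted and re-uploaded. A minimal sketch of that pattern, using fsspec's in-memory filesystem (which shares the listing-cache machinery of `AbstractFileSystem` with gcsfs) so it runs without GCS credentials; `fresh_open` is a hypothetical helper name, not part of the gcsfs API:

```python
import fsspec


def fresh_open(fs, path, mode="rb"):
    # Drop any cached directory listings so a delete-and-reupload
    # of the same path is picked up before opening.
    fs.invalidate_cache()
    return fs.open(path, mode)


# Demonstrate with the in-memory filesystem in place of GCSFileSystem.
fs = fsspec.filesystem("memory")
with fs.open("/bucket/test.csv", "wb") as f:
    f.write(b"some data")

with fresh_open(fs, "/bucket/test.csv") as f:
    print(f.read())
```

With a real `gcsfs.GCSFileSystem` instance, the same `fs.invalidate_cache()` call before `fs.open(...)` is what makes Test #4 above succeed where Test #3 fails.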