s3fs: EOFError with Gzip read

I opened an issue on the pandas repo, but this looks like it may be an s3fs error affecting versions 0.3.0 and later.

import pandas as pd
data = pd.read_csv("s3://bucketname/file.csv.gz")

Gives the following error:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2124, in pandas._libs.parsers.raise_parser_error
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
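
For context on the message itself: Python's gzip module raises this EOFError whenever the underlying file object runs out of bytes before the gzip stream is complete. The sketch below reproduces that condition locally with an in-memory buffer, with no S3 access. The helper name `read_gzip` and the sample payload are made up for illustration, and this does not claim to show what s3fs does internally; it only shows that a reader which returns too few bytes produces the same error.

```python
import gzip
import io


def read_gzip(raw: bytes) -> bytes:
    """Decompress a gzip stream held in memory (hypothetical helper)."""
    with gzip.GzipFile(fileobj=io.BytesIO(raw)) as fh:
        return fh.read()


# Sample CSV-like payload, invented for this sketch.
payload = b"a,b\n1,2\n" * 1000
compressed = gzip.compress(payload)

# A complete stream round-trips without error.
assert read_gzip(compressed) == payload

# Simulate a reader that delivers only part of the file.
truncated = compressed[: len(compressed) // 2]
try:
    read_gzip(truncated)
except EOFError as exc:
    message = str(exc)
else:
    message = None

print(message)
```

If the same file downloads and decompresses cleanly by other means, the file is probably intact and the bytes are being lost somewhere in the read path.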

Setup

Installed Pandas and s3fs via pip:

pip install pandas s3fs

Pandas version: 0.25.1
s3fs version: 0.3.3

About this issue

  • State: closed
  • Created 5 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

I’m able to reproduce this, but only with a real request against S3, not with moto. I’ll look into it today and should be able to do a release of fsspec once it’s fixed.

0.3.4 is on PyPI. Will show up on Conda-forge later today.

This should be fixed with #229. If anyone wants to test that out quickly, I’d appreciate it, since we can’t actually write a unit test for this (that doesn’t hit S3). But with my local testing, things are OK.

Otherwise, I’ll do a release later this afternoon.

After another quick check, I can confirm that downgrading by using:

pip install fsspec==0.4.1

takes care of the problem. Using 0.4.2 fails with:

error: Error -3 while decompressing data: invalid distance too far back
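
Since the behaviour depends on which s3fs and fsspec versions are installed, it can help to print them before and after pinning. This is a minimal sketch, assuming Python 3.8 or later for `importlib.metadata` (the Python 3.7 environment in the traceback would need the `importlib_metadata` backport instead). The helper name `installed_version` is hypothetical.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Packages mentioned in this thread.
for name in ("pandas", "s3fs", "fsspec"):
    print(name, installed_version(name))
```

Including this output in a bug report makes it easier to tell whether an environment has the versions discussed above.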