s3fs: EOFError with Gzip read
I created an issue on the Pandas repo, but it looks like the error may come from s3fs (versions 0.3.0 and later).
import pandas as pd
data = pd.read_csv("s3://bucketname/file.csv.gz")
This gives the following error:
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 685, in parser_f
    return _read(filepath_or_buffer, kwds)
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 463, in _read
    data = parser.read(nrows)
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 1154, in read
    ret = self._engine.read(nrows)
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/site-packages/pandas/io/parsers.py", line 2059, in read
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 896, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2124, in pandas._libs.parsers.raise_parser_error
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/ubuntu/miniconda3/envs/test/lib/python3.7/gzip.py", line 482, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
Setup
Installed Pandas and s3fs via pip:
pip install pandas s3fs
Pandas version: 0.25.1
s3fs version: 0.3.3
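Since later comments in this thread point at the fsspec version rather than s3fs itself, it may help to report all three versions together. The snippet below is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`); the helper name `installed_versions` is made up for illustration and is not part of any of these libraries.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages=("pandas", "s3fs", "fsspec")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    # Print one "name: version" line per package for pasting into a bug report.
    for name, ver in installed_versions().items():
        print(f"{name}: {ver or 'not installed'}")
```

Running it in the affected environment gives one line per package, which makes it easier to compare a working and a failing setup.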
About this issue
- State: closed
- Created 5 years ago
- Reactions: 1
- Comments: 16 (6 by maintainers)
Commits related to this issue
- BUG: Fixed file fetching issue Closes https://github.com/dask/s3fs/issues/225 Which apparently was broken by 4749ab9c5f786a6ce9cdc3a098c12f205e8cf207. Still working on a test. — committed to TomAugspurger/filesystem_spec by TomAugspurger 5 years ago
- Skip negative range requests Moto (httppretty) and S3 differ in behavior when the range request is like `range=10-9`. S3 (correctly) ignores the range field, while moto raises. With this change, we ... — committed to TomAugspurger/s3fs by TomAugspurger 5 years ago
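The "skip negative range requests" idea from the second commit can be sketched roughly as follows. This is an illustration under stated assumptions, not the actual s3fs code: it assumes reads are expressed as a half-open span `[start, end)` and turned into an inclusive HTTP `Range` header, so an empty read at offset 10 would become `bytes=10-9`. The names `range_header`, `fetch_range`, and `strict_backend` are hypothetical; `strict_backend` stands in for a server that, like moto as described above, rejects such a range.

```python
def range_header(start, end):
    """Build an inclusive HTTP Range header for the half-open span [start, end)."""
    return f"bytes={start}-{end - 1}"


def fetch_range(backend, start, end):
    """Fetch bytes [start, end), skipping the request when the span is empty."""
    if end <= start:
        # Nothing to read: avoid sending an inverted range such as bytes=10-9.
        return b""
    return backend(range_header(start, end))


def strict_backend(header):
    """Hypothetical backend that raises on inverted ranges instead of ignoring them."""
    first, last = (int(part) for part in header.split("=")[1].split("-"))
    if last < first:
        raise ValueError(f"invalid range: {header}")
    data = b"0123456789abcdef"
    return data[first:last + 1]


if __name__ == "__main__":
    print(range_header(10, 10))                   # the inverted header that would be sent
    print(fetch_range(strict_backend, 10, 10))    # empty bytes, no request made
    print(fetch_range(strict_backend, 0, 4))      # first four bytes of the sample data
```

With the guard in place, the client behaves the same whether the server ignores an inverted range or raises on it, which is the difference between S3 and moto noted in the commit message.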
I’m able to reproduce, but only on a real request against s3, not with moto. I’ll look into this today though, and should be able to do a release of fsspec once it’s fixed.
0.3.4 is on PyPI. Will show up on Conda-forge later today.
This should be fixed with #229. If anyone wants to test that out quickly, I’d appreciate it, since we can’t actually write a unit test for this (that doesn’t hit S3). But with my local testing, things are OK.
Otherwise, I’ll do a release later this afternoon.
After another quick check, I can confirm that downgrading with `pip install fsspec==0.4.1` takes care of the problem. Using `0.4.2` fails with: