vaex: [BUG-REPORT] Can't read Parquet from Amazon S3 on an EC2 instance
Description
I can’t load data from S3 with the following code:
import vaex
vaex.open("s3://myfile.parquet")
I get the following error:
error opening 's3://data-lake.eu-central-1/v1/reporting_tables/reporting_tables/trackingevents/' (__init__.py:259)
Traceback (most recent call last):
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/__init__.py", line 232, in open
    ds = vaex.dataset.open(path, fs_options=fs_options, fs=fs, **kwargs)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/dataset.py", line 73, in open
    return opener.open(path, fs_options=fs_options, fs=fs, *args, **kwargs)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/opener.py", line 44, in open
    return open_parquet(path, *args, **kwargs)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 345, in open_parquet
    return DatasetParquet(path, fs_options=fs_options, fs=fs, partitioning=partitioning, kwargs=kwargs)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 197, in __init__
    super().__init__(max_rows_read=max_rows_read)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 26, in __init__
    self._create_columns()
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 227, in _create_columns
    super()._create_columns()
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 29, in _create_columns
    self._create_dataset()
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 232, in _create_dataset
    self._arrow_ds = pyarrow.dataset.dataset(source, filesystem=file_system, partitioning=self.partitioning)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyarrow/dataset.py", line 667, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyarrow/dataset.py", line 420, in _filesystem_dataset
    factory = FileSystemDatasetFactory(fs, paths_or_selector, format, options)
  File "pyarrow/_dataset.pyx", line 1854, in pyarrow._dataset.FileSystemDatasetFactory.__init__
  File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/_fs.pyx", line 1137, in pyarrow._fs._cb_get_file_info_selector
  File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/file/cache.py", line 97, in get_file_info_selector
    return self.fs.get_file_info_selector(*args, **kwargs)
AttributeError: 'pyarrow._s3fs.S3FileSystem' object has no attribute 'get_file_info_selector'
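The last frame suggests that the wrapper in vaex/file/cache.py forwards `get_file_info_selector` to the filesystem object it wraps, and that object does not define such a method. Below is a minimal sketch of that failure mode. `NativeFS`, `CachedWrapper` and `try_listing` are hypothetical stand-ins for illustration only, not vaex or pyarrow code.

```python
class NativeFS:
    """Hypothetical stand-in for a filesystem that only exposes get_file_info."""

    def get_file_info(self, paths_or_selector):
        return [f"info:{paths_or_selector}"]


class CachedWrapper:
    """Hypothetical stand-in for a wrapper that forwards calls to the wrapped filesystem."""

    def __init__(self, fs):
        self.fs = fs

    def get_file_info_selector(self, *args, **kwargs):
        # Forwards to a method that the wrapped object may not define.
        return self.fs.get_file_info_selector(*args, **kwargs)


def try_listing(wrapper, selector):
    """Return the listing, or the AttributeError message if forwarding fails."""
    try:
        return wrapper.get_file_info_selector(selector)
    except AttributeError as exc:
        return str(exc)


# The wrapped object lacks get_file_info_selector, so forwarding fails.
message = try_listing(CachedWrapper(NativeFS()), "bucket/prefix/")
print(message)
```

If this reading is right, the error would come from a mismatch between the method the wrapper expects and the methods the wrapped filesystem provides, not from missing S3 credentials.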
Software information
- Vaex version: {'vaex': '4.8.0', 'vaex-core': '4.8.0', 'vaex-viz': '0.5.1', 'vaex-hdf5': '0.12.0', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.0', 'vaex-jupyter': '0.7.0', 'vaex-ml': '0.17.0'}
- Vaex was installed via: pip
- OS: Ubuntu
Additional information
I’m running on an EC2 instance, so the credentials for accessing S3 are already configured.
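A possible workaround, sketched under assumptions and not verified against this setup, is to read the Parquet data with pyarrow directly and hand the resulting table to vaex, which avoids the `vaex.open` code path shown in the traceback. The helper name `open_parquet_via_pyarrow` is hypothetical. Note that this loads the whole table into memory instead of reading it lazily.

```python
def open_parquet_via_pyarrow(uri):
    """Read Parquet data with pyarrow and wrap the table in a vaex DataFrame.

    Assumption: bypassing vaex.open avoids the failing wrapper; not verified here.
    """
    # Imported inside the function so that defining it has no side effects.
    import pyarrow.fs as pafs
    import pyarrow.parquet as pq
    import vaex

    # from_uri returns a filesystem object plus the path without the scheme.
    filesystem, path = pafs.FileSystem.from_uri(uri)

    # Reads the file (or directory of files) eagerly into an Arrow table.
    table = pq.read_table(path, filesystem=filesystem)

    return vaex.from_arrow_table(table)
```

It would be called with the same URI that was passed to `vaex.open`. Depending on the environment, the S3 region may need to be configured for pyarrow separately.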
About this issue
- State: open
- Created 2 years ago
- Comments: 22 (10 by maintainers)
I have the same error. I created an environment with only vaex installed and still get it.