vaex: [BUG-REPORT] Can't read Parquet from Amazon S3 on an EC2 instance

Description

I can't load data from S3 by doing this:

    import vaex
    vaex.open("s3://myfile.parquet")

I get the following error:

    error opening 's3://data-lake.eu-central-1/v1/reporting_tables/reporting_tables/trackingevents/' (__init__.py:259)
    Traceback (most recent call last):
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/__init__.py", line 232, in open
        ds = vaex.dataset.open(path, fs_options=fs_options, fs=fs, **kwargs)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/dataset.py", line 73, in open
        return opener.open(path, fs_options=fs_options, fs=fs, *args, **kwargs)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/opener.py", line 44, in open
        return open_parquet(path, *args, **kwargs)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 345, in open_parquet
        return DatasetParquet(path, fs_options=fs_options, fs=fs, partitioning=partitioning, kwargs=kwargs)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 197, in __init__
        super().__init__(max_rows_read=max_rows_read)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 26, in __init__
        self._create_columns()
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 227, in _create_columns
        super()._create_columns()
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 29, in _create_columns
        self._create_dataset()
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/arrow/dataset.py", line 232, in _create_dataset
        self._arrow_ds = pyarrow.dataset.dataset(source, filesystem=file_system, partitioning=self.partitioning)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyarrow/dataset.py", line 667, in dataset
        return _filesystem_dataset(source, **kwargs)
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/pyarrow/dataset.py", line 420, in _filesystem_dataset
        factory = FileSystemDatasetFactory(fs, paths_or_selector, format, options)
      File "pyarrow/_dataset.pyx", line 1854, in pyarrow._dataset.FileSystemDatasetFactory.__init__
      File "pyarrow/error.pxi", line 143, in pyarrow.lib.pyarrow_internal_check_status
      File "pyarrow/_fs.pyx", line 1137, in pyarrow._fs._cb_get_file_info_selector
      File "/home/ubuntu/.pyenv/versions/3.7.5/lib/python3.7/site-packages/vaex/file/cache.py", line 97, in get_file_info_selector
        return self.fs.get_file_info_selector(*args, **kwargs)
    AttributeError: 'pyarrow._s3fs.S3FileSystem' object has no attribute 'get_file_info_selector'
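
The last two frames of the traceback suggest the failure mode: a wrapper in vaex/file/cache.py forwards `get_file_info_selector` to the object it wraps, and the wrapped `pyarrow._s3fs.S3FileSystem` has no method by that name. The following is a minimal, library-free sketch of that pattern. All class names are hypothetical stand-ins, not vaex or pyarrow code, and the adapter at the end is only one way such a mismatch might be bridged, assuming the wrapped object offers a general `get_file_info` method.

```python
class NativeStyleFileSystem:
    """Hypothetical stand-in for a filesystem object that exposes a single
    get_file_info method and no get_file_info_selector."""

    def get_file_info(self, paths_or_selector):
        # Return a fake listing so the example runs without network access.
        return [f"info for {paths_or_selector}"]


class ForwardingHandler:
    """Hypothetical wrapper that forwards calls by name to the wrapped object."""

    def __init__(self, fs):
        self.fs = fs

    def get_file_info_selector(self, *args, **kwargs):
        # Same shape as the failing line in the traceback: this only works
        # if the wrapped object has a method with the identical name.
        return self.fs.get_file_info_selector(*args, **kwargs)


class AdaptingHandler(ForwardingHandler):
    """Hypothetical variant that calls the method the wrapped object has."""

    def get_file_info_selector(self, selector):
        return self.fs.get_file_info(selector)


def describe_call(handler, selector):
    """Call the handler and return either its result or the error text."""
    try:
        return handler.get_file_info_selector(selector)
    except AttributeError as exc:
        return f"AttributeError: {exc}"
```

Calling `describe_call(ForwardingHandler(NativeStyleFileSystem()), "bucket/prefix")` returns an `AttributeError` message of the same kind as the one reported, while the adapting variant returns the fake listing.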

Software information

  • Vaex version: {'vaex': '4.8.0', 'vaex-core': '4.8.0', 'vaex-viz': '0.5.1', 'vaex-hdf5': '0.12.0', 'vaex-server': '0.8.1', 'vaex-astro': '0.9.0', 'vaex-jupyter': '0.7.0', 'vaex-ml': '0.17.0'}
  • Vaex was installed via: pip
  • OS: Ubuntu
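
The list above does not include the pyarrow version, although the traceback runs through pyarrow. A small sketch for collecting it alongside vaex-core is shown here. The helper name `installed_versions` is hypothetical, and on Python 3.7 (as in this report) it assumes the `importlib_metadata` backport is installed.

```python
try:
    from importlib import metadata  # standard library on Python 3.8+
except ImportError:
    # Python 3.7: assumes the importlib_metadata backport is available.
    import importlib_metadata as metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    print(installed_versions(["vaex-core", "pyarrow"]))
```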

Additional information

I'm running on an EC2 instance, so the credentials for accessing S3 are already configured.
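
One way to check whether the instance credentials and the S3 path themselves are fine is to read the data with pyarrow directly, bypassing vaex's file layer, and then hand the result to vaex. This is a sketch under assumptions and not a fix confirmed in this report: it assumes pyarrow and vaex are installed and that pyarrow picks up the instance credentials on its own. The function name `read_parquet_via_pyarrow` is hypothetical, and the approach loads the whole dataset into memory, unlike `vaex.open`.

```python
def read_parquet_via_pyarrow(uri):
    """Read a Parquet dataset with pyarrow alone and wrap it as a vaex DataFrame.

    Assumes pyarrow and vaex are installed and that pyarrow can find
    credentials by itself (for example from the EC2 instance role).
    """
    # Imported lazily so that defining the function has no dependencies.
    import pyarrow.dataset as ds
    import pyarrow.fs as pafs
    import vaex

    # Resolve the URI to a native pyarrow filesystem plus a path inside it.
    filesystem, path = pafs.FileSystem.from_uri(uri)
    dataset = ds.dataset(path, filesystem=filesystem, format="parquet")

    # to_table() materializes all rows in memory before vaex sees them.
    table = dataset.to_table()
    return vaex.from_arrow_table(table)
```

If this call succeeds with the same URI that fails in `vaex.open`, the problem is more likely in how vaex wraps the filesystem than in the credentials or the bucket.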

About this issue

  • State: open
  • Created 2 years ago
  • Comments: 22 (10 by maintainers)

Most upvoted comments

I have the same error. I created an environment with just vaex installed and got the same error.