pandas: Pandas doesn't recognize Pyarrow as a Parquet engine even though it's installed
Code Sample, a copy-pastable example if possible
In [6]: pd.io.parquet.get_engine('auto')
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-6-77cb1d6c8933> in <module>
----> 1 pd.io.parquet.get_engine('auto')
~/miniconda3/lib/python3.6/site-packages/pandas/io/parquet.py in get_engine(engine)
30 pass
31
---> 32 raise ImportError("Unable to find a usable engine; "
33 "tried using: 'pyarrow', 'fastparquet'.\n"
34 "pyarrow or fastparquet is required for parquet "
ImportError: Unable to find a usable engine; tried using: 'pyarrow', 'fastparquet'.
pyarrow or fastparquet is required for parquet support
Problem description
Pandas doesn’t recognize Pyarrow as a Parquet engine even though it’s installed. Pyarrow 0.12.0 is listed as installed in the output of pd.show_versions() below.
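For anyone debugging the same thing: the `pass` at line 30 of the traceback suggests that get_engine('auto') swallows whatever error each engine raised on import, so the real cause never shows up. Below is a minimal sketch for surfacing it using only the standard library. `probe_engines` is a hypothetical helper name, not part of pandas.

```python
import importlib


def probe_engines(modules=("pyarrow", "pyarrow.parquet", "fastparquet")):
    """Try importing each module and record either 'ok' or the error raised."""
    results = {}
    for name in modules:
        try:
            importlib.import_module(name)
        except Exception as exc:  # ImportError, or e.g. OSError from a broken shared library
            results[name] = f"{type(exc).__name__}: {exc}"
        else:
            results[name] = "ok"
    return results


if __name__ == "__main__":
    for name, status in probe_engines().items():
        print(f"{name}: {status}")
```

If `pyarrow` imports but `pyarrow.parquet` does not, the message printed for the latter is probably the one worth posting here.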
Expected Output
In [2]: pd.io.parquet.get_engine('auto')
Out[2]: <pandas.io.parquet.PyArrowImpl at 0x119c78f28>
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.6.6.final.0
python-bits: 64
OS: Linux
OS-release: 4.15.0-29-generic
machine: x86_64
processor: x86_64
byteorder: little
LC_ALL: en_US.UTF-8
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8

pandas: 0.24.0
pytest: 3.9.3
pip: 18.1
setuptools: 40.5.0
Cython: None
numpy: 1.15.4
scipy: 1.1.0
pyarrow: 0.12.0
xarray: None
IPython: 7.1.1
sphinx: 1.8.2
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.0.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml.etree: 4.2.5
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
About this issue
- State: closed
- Created 5 years ago
- Reactions: 5
- Comments: 17 (3 by maintainers)
TLDR: I got it working by uninstalling via conda and installing with pip. So it appears that there’s something off about that specific conda version. Sorry for the noise.
Details below for others.
I didn’t have multiple versions of pyarrow installed.
I uninstalled via conda, verified I didn’t have pyarrow from pip, reinstalled via conda, and got the same error:
And then
I got it working by uninstalling via conda and installing with pip:
So it appears that there’s something off about that specific conda version.
See https://pandas.pydata.org/pandas-docs/stable/whatsnew/v1.0.0.html#increased-minimum-versions-for-dependencies
You need pyarrow >= 0.13.
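A quick way to check an environment against that minimum, as a rough sketch: it assumes Python 3.8+ (for `importlib.metadata`) and uses a naive numeric comparison rather than full version parsing. `version_tuple` and `meets_minimum` are hypothetical helper names, and the 0.13 threshold is the one quoted above.

```python
import importlib.metadata as md
import re


def version_tuple(version):
    """Turn '0.12.0' into (0, 12, 0); non-numeric suffixes such as 'rc1' are ignored."""
    return tuple(int(part) for part in re.findall(r"\d+", version.split("+")[0])[:3])


def meets_minimum(dist_name, minimum):
    """Return True or False, or None when the distribution is not installed."""
    try:
        installed = md.version(dist_name)
    except md.PackageNotFoundError:
        return None
    return version_tuple(installed) >= version_tuple(minimum)


if __name__ == "__main__":
    print("pyarrow >= 0.13:", meets_minimum("pyarrow", "0.13"))
```

With the pyarrow 0.12.0 reported at the top of this issue, the comparison would come out False.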
Can you debug this any further?
Can you try running
and posting the traceback?
Same issue here:
ImportError: Missing optional dependency 'pyarrow'. pyarrow is required for parquet support. Use pip or conda to install pyarrow.
pd.read_parquet(file_name, engine="pyarrow")
Python 3.11.6
Linux Mint 21.2 Cinnamon
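One thing that may be worth ruling out in a case like this (an assumption, not something confirmed in this thread) is that the script runs under a different interpreter than the one pyarrow was installed into. A minimal sketch that prints which interpreter is active and where it would load each package from; `locate` is a hypothetical helper name:

```python
import importlib.util
import sys


def locate(module_name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    print("interpreter:", sys.executable)
    for name in ("pandas", "pyarrow"):
        print(f"{name}: {locate(name)}")
```

If pyarrow shows as None here while pip or conda reports it as installed, the two are most likely looking at different environments.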
Do you have multiple versions of pyarrow installed (perhaps one from pip)?
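One way to answer that question from inside Python, sketched under the assumption of Python 3.8+ (for `importlib.metadata`): list every installed distribution whose name matches. `find_installs` is a hypothetical helper name, and an install that ships no metadata directory would not show up in its output.

```python
import importlib.metadata as md


def find_installs(dist_name):
    """Return (version, location) for every installed distribution named dist_name."""
    wanted = dist_name.lower().replace("_", "-")
    found = []
    for dist in md.distributions():
        name = dist.metadata.get("Name")
        if name and name.lower().replace("_", "-") == wanted:
            found.append((dist.version, str(dist.locate_file(""))))
    return found


if __name__ == "__main__":
    for version, location in find_installs("pyarrow"):
        print(version, location)
```

More than one line of output would point to overlapping installs, for example one from conda and one from pip.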
From your traceback, it seems like the issue is specifically pyarrow.parquet. I’m not sure that site-packages/pyarrow/../../../libparquet.so.12 is the expected path for libparquet… I’d recommend conda uninstalling pyarrow and parquet-cpp, and running pip uninstall pyarrow a few times.
I’m going to close this, since it seems to be an issue with your environment, but please keep posting here in case others run into the same issue.