datasets: Cannot import datasets - ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility
Describe the bug
When trying to import datasets, I get a pyarrow ValueError:
Traceback (most recent call last):
  File "/Users/edward/test/test.py", line 1, in <module>
    import datasets
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/__init__.py", line 43, in <module>
    from .arrow_dataset import Dataset
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 65, in <module>
    from .arrow_reader import ArrowReader
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/arrow_reader.py", line 28, in <module>
    import pyarrow.parquet as pq
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 20, in <module>
    from .core import *
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 45, in <module>
    from pyarrow.fs import (LocalFileSystem, FileSystem, FileType,
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/fs.py", line 49, in <module>
    from pyarrow._gcsfs import GcsFileSystem  # noqa
  File "pyarrow/_gcsfs.pyx", line 1, in init pyarrow._gcsfs
ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility. Expected 88 from C header, got 72 from PyObject
Steps to reproduce the bug
import datasets
Expected behavior
Successful import
Environment info
Conda environment, macOS, Python 3.9.12, datasets 2.12.0
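For anyone collecting the same environment info: since import datasets itself fails here, the versions can be read from package metadata without importing anything. This is only a minimal sketch; the helper name installed_versions is made up for illustration, and it assumes Python 3.8+ (where importlib.metadata is in the standard library).

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent.

    Reads package metadata only, so it works even when importing the
    package raises (as `import datasets` does in this report).
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Distribution names assumed to match the names used in this thread.
    for name, version in installed_versions(["datasets", "pyarrow"]).items():
        print(f"{name}: {version or 'not installed'}")
```

If conda and pip have each installed a copy of pyarrow, this reports only the one Python resolves first, so it may be worth running conda list pyarrow and pip show pyarrow as well.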
About this issue
- State: closed
- Created a year ago
- Reactions: 6
- Comments: 24 (5 by maintainers)
I got the same error. pyarrow 12.0.0, released in May 2023 (https://pypi.org/project/pyarrow/), is not compatible. Running
pip install pyarrow==11.0.0
to force-install the previous version solved the problem. Do we need to update dependencies?
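A quick way to see whether an environment has one of the pyarrow versions mentioned in this thread is sketched below. The version sets come only from reports in this thread (12.0.0 reported failing, 11.0.0 and 12.0.1 reported working for some people), not from any official compatibility list, and the function names are hypothetical.

```python
from importlib import metadata

# Taken from comments in this thread only; other versions and setups may
# behave differently, and 11.0.0 did not help at least one Colab user.
REPORTED_FAILING = {"12.0.0"}
REPORTED_WORKING = {"11.0.0", "12.0.1"}


def installed_pyarrow_version():
    """Return the installed pyarrow version string, or None if not installed."""
    try:
        return metadata.version("pyarrow")
    except metadata.PackageNotFoundError:
        return None


def classify_pyarrow(version):
    """Classify a version string against the reports collected above."""
    if version is None:
        return "not installed"
    if version in REPORTED_FAILING:
        return "reported failing"
    if version in REPORTED_WORKING:
        return "reported working"
    return "unknown"


if __name__ == "__main__":
    version = installed_pyarrow_version()
    print(f"pyarrow {version}: {classify_pyarrow(version)}")
```

A result of "unknown" just means nobody in this thread mentioned that version.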
Hi, if this helps anyone, pip install pyarrow==11.0.0 did not work for me (I’m using Colab) but this worked: !pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11
Mine was different. Solved with:
pip install pyarrow==12.0.1
pip install cchardet
env: Python 3.9.16, transformers 4.32.1
The above methods didn’t help me. So I installed an older version:
!pip install datasets==2.16.1
and import datasets worked!!
(I was doing a quiet install, so I didn’t notice it initially.) I’ve been loading the same dataset for months on Colab, and just now I got this error as well. I think Colab has changed their image recently (I had some errors regarding CUDA previously as well). Beware of this and restart the runtime if you’re doing quiet pip installs. Moreover, installing the stable version of datasets from PyPI gives this:
thanks! I met the same problem and your suggestion solved it.
Thanks for replying. I am not sure about those environments but it seems like pyarrow-12.0.0 does not work for conda with python 3.8.16.
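Regarding the advice above to restart the runtime after a quiet pip install on Colab: a rough way to tell whether a restart is needed is to compare the version of the module already loaded in the running process with the version now installed on disk. This is a sketch under assumptions: the helper name stale_import is hypothetical, it assumes the import name and the distribution name are the same (true for pyarrow and datasets), and it relies on the module exposing __version__.

```python
import sys
from importlib import metadata


def stale_import(package):
    """Return (loaded, installed) if the running process holds a different
    version of `package` than the one installed on disk; otherwise None.

    Returns None when the package has not been imported yet, when it is
    not installed, or when the loaded module has no __version__.
    """
    module = sys.modules.get(package)
    if module is None:
        return None  # not imported in this process, nothing can be stale
    loaded = getattr(module, "__version__", None)
    try:
        installed = metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
    if loaded is not None and loaded != installed:
        return (loaded, installed)
    return None


if __name__ == "__main__":
    for name in ("pyarrow", "datasets"):
        mismatch = stale_import(name)
        if mismatch:
            print(f"{name}: loaded {mismatch[0]}, installed {mismatch[1]}; restart the runtime")
        else:
            print(f"{name}: no mismatch detected")
```

A mismatch means the pip install changed files on disk while the old version stayed loaded in memory, which is the situation where restarting the runtime matters.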