datasets: Cannot import datasets - ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility

Describe the bug

When trying to import datasets, I get a pyarrow ValueError:

Traceback (most recent call last):
  File "/Users/edward/test/test.py", line 1, in <module>
    import datasets
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/__init__.py", line 43, in <module>
    from .arrow_dataset import Dataset
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/arrow_dataset.py", line 65, in <module>
    from .arrow_reader import ArrowReader
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/datasets/arrow_reader.py", line 28, in <module>
    import pyarrow.parquet as pq
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/parquet/__init__.py", line 20, in <module>
    from .core import *
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/parquet/core.py", line 45, in <module>
    from pyarrow.fs import (LocalFileSystem, FileSystem, FileType,
  File "/Users/edward/opt/anaconda3/envs/cs235/lib/python3.9/site-packages/pyarrow/fs.py", line 49, in <module>
    from pyarrow._gcsfs import GcsFileSystem  # noqa
  File "pyarrow/_gcsfs.pyx", line 1, in init pyarrow._gcsfs
ValueError: pyarrow.lib.IpcWriteOptions size changed, may indicate binary incompatibility. Expected 88 from C header, got 72 from PyObject

Steps to reproduce the bug

import datasets

Expected behavior

Successful import

Environment info

Conda environment on macOS, Python 3.9.12, datasets 2.12.0
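When diagnosing this kind of binary-incompatibility error, the first step is to confirm exactly which versions of datasets and pyarrow are installed in the active environment. A minimal sketch using only the standard library (the helper name `installed_version` is mine, not from the issue):

```python
import importlib.metadata as md

def installed_version(package: str):
    """Return the installed distribution version, or None if absent."""
    try:
        return md.version(package)
    except md.PackageNotFoundError:
        return None

# Print the versions relevant to this incompatibility.
for pkg in ("datasets", "pyarrow"):
    print(pkg, installed_version(pkg) or "not installed")
```

Running this in the failing environment (instead of importing the packages, which crashes) shows whether the pyarrow wheel matches what datasets was built against.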

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 6
  • Comments: 24 (5 by maintainers)

Most upvoted comments

I got the same error. pyarrow 12.0.0, released May 2023 (https://pypi.org/project/pyarrow/), is not compatible; running pip install pyarrow==11.0.0 to force the previous version solved the problem.

Do we need to update dependencies?

Hi, if this helps anyone: pip install pyarrow==11.0.0 did not work for me (I'm using Colab), but this worked: !pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11

My case was different. It was solved with pip install pyarrow==12.0.1 and pip install cchardet.

env: Python 3.9.16 transformers 4.32.1

The above methods didn't help me, so I installed an older version: !pip install datasets==2.16.1, and import datasets worked!

(I was doing a quiet install, so I didn't notice it initially.) I've been loading the same dataset for months on Colab, and just now I got this error as well. I think Colab has changed their image recently (I had some errors regarding CUDA previously too). Beware of this and restart the runtime if you're doing quiet pip installs. Moreover, installing the stable version of datasets from PyPI gives this:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
ibis-framework 7.1.0 requires pyarrow<15,>=2, but you have pyarrow 15.0.0 which is incompatible.
Successfully installed datasets-2.17.0 dill-0.3.8 multiprocess-0.70.16 pyarrow-15.0.0
WARNING: The following packages were previously imported in this runtime:
  [pyarrow]
You must restart the runtime in order to use newly installed versions.
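The conflict pip reports above can be checked offline: a version like 15.0.0 simply fails the `pyarrow<15,>=2` specifier that ibis-framework declares. A hand-rolled sketch of that comparison for dotted integer versions (real resolvers implement the full PEP 440 rules; `satisfies` is my own helper, not a pip API):

```python
def parse(v: str):
    # Split a dotted version like "15.0.0" into a comparable integer tuple.
    return tuple(int(p) for p in v.split("."))

def satisfies(version: str, spec: str) -> bool:
    """Check a version against comma-separated ==, >=, <=, <, > clauses."""
    ver = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        for op in ("==", ">=", "<=", "<", ">"):
            if clause.startswith(op):
                bound = parse(clause[len(op):])
                ok = {"==": ver == bound, ">=": ver >= bound,
                      "<=": ver <= bound, "<": ver < bound, ">": ver > bound}[op]
                if not ok:
                    return False
                break
    return True

print(satisfies("15.0.0", "<15,>=2"))  # the conflict pip reported -> False
print(satisfies("11.0.0", "<15,>=2"))  # the downgrade that worked -> True
```

This is why pinning pyarrow to 11.0.0 (or any release below 15) keeps the resolver and the installed wheel consistent in that environment.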

> Hi, if this helps anyone: pip install pyarrow==11.0.0 did not work for me (I'm using Colab), but this worked: !pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11

Thanks! I met the same problem and your suggestion solved it.

Thanks for replying. I am not sure about those environments, but it seems pyarrow 12.0.0 does not work in a conda environment with Python 3.8.16.

Please note that our CI passes all tests with pyarrow 12.0.0, for Python 3.7 and Python 3.10, on both Ubuntu and Windows: see for example https://github.com/huggingface/datasets/actions/runs/5157324334/jobs/9289582291