cudf: Error with "Expected 48 from C header, got 40 from PyObject"

When I try the RAPIDS GPU online Colab demo, I encounter the following issue:

The error log:

/usr/local/lib/python3.7/site-packages/cudf/utils/gpu_utils.py:93: UserWarning: You will need a GPU with NVIDIA Pascal™ or newer architecture
Detected GPU 0: Tesla K80 
Detected Compute Capability: 3.7
  f"You will need a GPU with NVIDIA Pascal™ or "
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-a95ca25217db> in <module>()
----> 1 import cudf
      2 import io, requests
      3 
      4 # download CSV file from GitHub
      5 url="https://github.com/plotly/datasets/raw/master/tips.csv"

2 frames
/usr/local/lib/python3.7/site-packages/cudf/__init__.py in <module>()
     10 
     11 from cudf.api.types import dtype
---> 12 from cudf import api, core, datasets, testing
     13 from cudf._version import get_versions
     14 from cudf.api.extensions import (

/usr/local/lib/python3.7/site-packages/cudf/datasets.py in <module>()
      3 
      4 import cudf
----> 5 from cudf._lib.transform import bools_to_mask
      6 from cudf.core.column_accessor import ColumnAccessor
      7 

/usr/local/lib/python3.7/site-packages/cudf/_lib/__init__.py in <module>()
      2 import numpy as np
      3 
----> 4 from . import (
      5     avro,
      6     binaryop,

cudf/_lib/avro.pyx in init cudf._lib.avro()

cudf/_lib/column.pyx in init cudf._lib.column()

cudf/_lib/scalar.pyx in init cudf._lib.scalar()

cudf/_lib/interop.pyx in init cudf._lib.interop()

ValueError: pyarrow.lib.Codec size changed, may indicate binary incompatibility. Expected 48 from C header, got 40 from PyObject

Has anyone had the same issue? I even reinstalled different versions of NumPy, but it still doesn’t work.
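
For context, as the maintainers note further down in this thread, this error typically means the pyarrow found at import time is not the one cuDF was built against. A quick first check (a minimal sketch; run it in a fresh cell before importing cudf) is to see which copies of NumPy and pyarrow the runtime actually resolves:

import numpy
import pyarrow

# Which installation wins on sys.path, and which versions they are.
# A path outside the expected site-packages location is a common
# sign of the binary mismatch reported above.
print(numpy.__version__, numpy.__file__)
print(pyarrow.__version__, pyarrow.__file__)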

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 23 (9 by maintainers)

Most upvoted comments

@Halegua @hchen98 I get the same pyarrow._cuda error posted above now.

However, what works for me is, right before importing cudf, running the code I posted before.

import sys

# clear Pandas and PyArrow from the module cache
# and force them to be reloaded on import.
# WARNING: I don't know what else this might break.
# Ideally, none of this should be in the module cache
# in the first place.

mods = [mod for mod in sys.modules if mod.startswith(("pandas", "pyarrow"))]
for mod in mods:
  del sys.modules[mod]

Note that you’ll need to start with a fresh runtime, so the full sequence (sketched as a single cell below the list) is:

  1. click Runtime -> Restart Runtime
  2. Then execute the code I posted above
  3. Only after that, import cudf
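
Putting steps 2 and 3 together, the first cell after the restart might look like this (a minimal sketch; it assumes RAPIDS/cuDF is already installed in the runtime):

import sys

# Step 2: clear any pandas/pyarrow modules Colab preloaded, so the
# copies cuDF expects are the ones that actually get imported.
mods = [mod for mod in sys.modules if mod.startswith(("pandas", "pyarrow"))]
for mod in mods:
  del sys.modules[mod]

# Step 3: only now import cudf.
import cudf
print(cudf.__version__)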

Thanks!

The dist-packages in the path indicates that pyarrow was installed using the system package manager (e.g., apt), and that version is being preferred over the one installed by conda.
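
One way to confirm which copy is winning is to check where Python resolves pyarrow from and which install locations appear on sys.path (a minimal sketch; the exact paths will differ per runtime):

import sys
import pyarrow

# The file path shows which installation is actually imported;
# a dist-packages path means the apt copy is shadowing the conda one.
print(pyarrow.__file__)
print([p for p in sys.path if "packages" in p])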

We’re investigating what exactly is going wrong on Colab and will report back here with a solution.

Hi @craigcitro, I just reran the latest online demo on Colab; pyarrow is still at version 5.0.0.

Also, this time it gives a different error when importing cudf. The error log is as follows:

---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
<ipython-input-4-a95ca25217db> in <module>()
----> 1 import cudf
      2 import io, requests
      3 
      4 # download CSV file from GitHub
      5 url="https://github.com/plotly/datasets/raw/master/tips.csv"

2 frames
/usr/local/lib/python3.7/site-packages/cudf/__init__.py in <module>()
     10 
     11 from cudf.api.types import dtype
---> 12 from cudf import api, core, datasets, testing
     13 from cudf._version import get_versions
     14 from cudf.api.extensions import (

/usr/local/lib/python3.7/site-packages/cudf/datasets.py in <module>()
      3 
      4 import cudf
----> 5 from cudf._lib.transform import bools_to_mask
      6 from cudf.core.column_accessor import ColumnAccessor
      7 

/usr/local/lib/python3.7/site-packages/cudf/_lib/__init__.py in <module>()
      2 import numpy as np
      3 
----> 4 from . import (
      5     avro,
      6     binaryop,

cudf/_lib/gpuarrow.pyx in init cudf._lib.gpuarrow()

ModuleNotFoundError: No module named 'pyarrow._cuda'

---------------------------------------------------------------------------
NOTE: If your import is failing due to a missing package, you can
manually install dependencies using either !pip or !apt.

To view examples of installing some common dependencies, click the
"Open Examples" button below.
---------------------------------------------------------------------------

To convince myself that this is the issue, I deleted pandas and pyarrow from the module cache before importing cudf and that seems to work:

import sys

# clear Pandas and PyArrow from the module cache
# and force them to be reloaded on import.
# WARNING: I don't know what else this might break.
# Ideally, none of this should be in the module cache
# in the first place.

mods = [mod for mod in sys.modules if mod.startswith(("pandas", "pyarrow"))]
for mod in mods:
  del sys.modules[mod]

@shwina here you go:

[screenshot of the requested output attached]

I’ve got the same error today on the PRO version of Google Colab. I’ve been using RAPIDS for a few months with no issues until now and always used the script from https://rapids.ai/start.html to install RAPIDS in Colab. @shwina I attached screenshots with the output you mentioned.

Typically this happens when you have a version of arrow that is incompatible with cuDF, e.g., installed separately via pip or something else. Could you please post the output of the following command on your terminal?

$ conda list | grep arrow

…and the following lines in the Python interpreter?

>>> import pyarrow
>>> print(pyarrow.__file__)
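
On Colab there is no separate terminal, but both checks can be run from a single notebook cell using IPython shell escapes (a minimal sketch; it assumes the RAPIDS install script has already set up conda in the runtime):

# shell command via the ! escape
!conda list | grep arrow

# then check which pyarrow Python actually imports
import pyarrow
print(pyarrow.__file__)
print(pyarrow.__version__)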

Hi @shwina, colab-team member here; we got a similar report that I’ve duped into this issue. LMK if there are questions I can answer from our side.

There was one possibly-relevant change on our side: we just updated pandas from 1.1.5 to 1.3.5; IIUC you’re installing your own copy of pandas from conda, but I thought I’d mention it in case it was helpful.