cudf: [BUG] dask-cudf `.describe()` broken with NumPy 1.20

The `.describe()` method of dask-cudf fails with cudf 0.18 (nightly) and NumPy v1.20. Minimal repro:

In [2]: import dask_cudf

In [3]: import cudf

In [4]: ddf = dask_cudf.from_cudf(cudf.DataFrame({'a': [1, 2, 3]}), npartitions=2)

In [5]: ddf.describe().compute()

<truncated>

~/rapids-compose/etc/conda/cuda_10.1/envs/rapids/lib/python3.7/site-packages/dask/array/percentile.py in merge_percentiles(finalq, qs, vals, interpolation, Ns)
    233     combined_vals, combined_counts = zip(*combined_vals_counts)
    234
--> 235     combined_vals = np.array([combined_vals])
    236     combined_counts = np.array(combined_counts)
    237

cupy/core/core.pyx in cupy.core.core.ndarray.__array__()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
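The failure mode in the traceback can be reproduced without a GPU. The following is a minimal sketch, not the dask or CuPy code: `FakeDeviceArray` is a hypothetical stand-in class whose `__array__` refuses implicit conversion the way the CuPy error message above describes, and whose `get()` mimics an explicit host copy.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for a GPU array; not a real CuPy class."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # Refuse implicit conversion, as the CuPy error in the traceback does.
        raise TypeError("Implicit conversion to a NumPy array is not allowed.")

    def get(self):
        # Explicit device-to-host copy (simulated here with a plain list).
        return np.array(self._values)


# Shaped like `combined_vals` in the traceback: a tuple of device arrays.
combined_vals = (FakeDeviceArray([1, 2]), FakeDeviceArray([3, 4]))

try:
    # Same pattern as line 235 of percentile.py in the traceback.
    np.array([combined_vals])
    implicit_failed = False
except TypeError:
    implicit_failed = True

# Converting each chunk explicitly first avoids the implicit __array__ call.
host_vals = np.array([[chunk.get() for chunk in combined_vals]])
```

Copying to the host is only one way around the error; it gives up the GPU data, which is why the discussion below turns to the `like=` kwarg instead.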

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

It looks like the code was already using the like= kwarg before the linked PR. Was that introduced in another PR that broke NumPy < 1.20?

That’s true, but in the previous code both combined_vals[0] and combined_counts[0] were already CuPy arrays after the merge_sorted/zip results, and https://github.com/dask/dask/pull/7172 now attempts to convert counts (a list) into a CuPy array (with like=combined_vals).
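To illustrate what `like=` does in that conversion, here is a hedged sketch that needs NumPy 1.20 or newer and no GPU. `FakeDeviceArray` is a hypothetical stand-in for a CuPy array, not a real class, and the variable names only echo the ones in the comment above; the actual dask code may differ.

```python
import numpy as np


class FakeDeviceArray:
    """Hypothetical stand-in for a GPU array type; not a real CuPy class."""

    def __init__(self, values):
        self.values = list(values)

    def __array_function__(self, func, types, args, kwargs):
        # NumPy routes np.array(..., like=<FakeDeviceArray>) here instead of
        # building a host ndarray. This sketch only ever dispatches np.array,
        # so `func` is not inspected.
        return FakeDeviceArray(args[0])


combined_vals = FakeDeviceArray([0.5, 1.5, 2.5])
combined_counts = [1, 1, 1]  # a plain Python list, as in the comment above

# With like=, the result takes the type of the reference array.
counts_like_vals = np.array(combined_counts, like=combined_vals)

# Without like=, the same call produces an ordinary NumPy array.
plain_counts = np.array(combined_counts)
```

On NumPy older than 1.20 the `like=` keyword is not accepted by `np.array`, which is the compatibility concern raised in this thread.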

Requiring NumPy 1.20+ makes a lot of sense to me. There are a lot of really important improvements there, particularly for RAPIDS (of course you already know this, Peter), and I think we are going to find it hard to get things working for older NumPy versions.