cudf: [BUG] dask-cudf `.describe()` broken with NumPy 1.20
The `.describe()` method of dask-cudf fails with cudf 0.18 (nightly) and NumPy v1.20. Minimal repro:
```
In [2]: import dask_cudf
In [3]: import cudf
In [4]: ddf = dask_cudf.from_cudf(cudf.DataFrame({'a': [1, 2, 3]}), npartitions=2)
In [5]: ddf.describe().compute()
<truncated>
~/rapids-compose/etc/conda/cuda_10.1/envs/rapids/lib/python3.7/site-packages/dask/array/percentile.py in merge_percentiles(finalq, qs, vals, interpolation, Ns)
    233     combined_vals, combined_counts = zip(*combined_vals_counts)
    234
--> 235     combined_vals = np.array([combined_vals])
    236     combined_counts = np.array(combined_counts)
    237

cupy/core/core.pyx in cupy.core.core.ndarray.__array__()

TypeError: Implicit conversion to a NumPy array is not allowed. Please use `.get()` to construct a NumPy array explicitly.
```
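The final `TypeError` comes from CuPy refusing implicit device-to-host conversion: `np.array([combined_vals])` asks NumPy to coerce every element, and NumPy does that by calling each element's `__array__`, which CuPy forbids. The sketch below reproduces the mechanism without a GPU. `DeviceOnlyArray` is a hypothetical stand-in for `cupy.ndarray`, not CuPy's real implementation, and the example assumes a recent NumPy (1.20 or newer), where errors raised by `__array__` propagate.

```python
import numpy as np


class DeviceOnlyArray:
    """Hypothetical stand-in for cupy.ndarray (not CuPy's real code).

    Like CuPy, it refuses implicit conversion to a host NumPy array.
    """

    def __init__(self, values):
        self.values = list(values)

    def get(self):
        # Explicit "device-to-host" copy, mirroring cupy.ndarray.get().
        return np.array(self.values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this whenever it coerces the object implicitly.
        raise TypeError(
            "Implicit conversion to a NumPy array is not allowed. "
            "Please use `.get()` to construct a NumPy array explicitly."
        )


combined_vals = (DeviceOnlyArray([1, 2]), DeviceOnlyArray([3, 4]))

try:
    # Same call shape as dask's merge_percentiles in the traceback above.
    np.array([combined_vals])
except TypeError as exc:
    print("TypeError:", exc)

# Explicit conversion works, which is what the error message suggests.
host = np.array([[v.get() for v in combined_vals]])
print(host.shape)  # (1, 2, 2)
```

Calling `.get()` would copy device data back to the host, which is why the discussion below prefers letting CuPy construct the array itself.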
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (11 by maintainers)
It’s true, but in the previous code both `combined_vals[0]` and `combined_counts[0]` were already CuPy arrays as the results of `merge_sorted`/`zip`, and https://github.com/dask/dask/pull/7172 now tries to convert `counts` (a `list`) into a CuPy array (with `like=combined_vals`).

Requiring NumPy 1.20+ makes a lot of sense to me. There are a lot of really important improvements there, particularly for RAPIDS (of course you already know this, Peter), and I think we are going to find it hard to get things working with older NumPy versions.
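The `like=` argument mentioned above comes from NEP 35 and was added in NumPy 1.20: `np.array(obj, like=ref)` hands construction to `type(ref).__array_function__` instead of coercing `obj` on the host. With a CuPy reference array, CuPy's own `__array_function__` can therefore build the result. The sketch below shows that dispatch using a hypothetical `LikeAwareArray` class as a stand-in for `cupy.ndarray`. It is an illustration of the protocol under that assumption, not dask's actual patch.

```python
import numpy as np


class LikeAwareArray:
    """Hypothetical duck array standing in for cupy.ndarray.

    It refuses implicit host conversion through __array__, but implements
    __array_function__, so NEP 35 ``like=`` dispatch can reach it.
    """

    def __init__(self, values):
        self.values = list(values)

    def __array__(self, dtype=None, copy=None):
        raise TypeError("Implicit conversion to a NumPy array is not allowed.")

    def __array_function__(self, func, types, args, kwargs):
        # This sketch only handles np.array; anything else is declined.
        if getattr(func, "__name__", None) != "array":
            return NotImplemented
        # NumPy normally strips ``like`` before dispatching, but drop it
        # defensively in case a given version forwards it.
        kwargs = {k: v for k, v in kwargs.items() if k != "like"}
        if kwargs:
            # dtype/copy/etc. are not supported in this sketch.
            return NotImplemented
        (obj,) = args
        # Build the result "on the device" without ever calling __array__.
        return LikeAwareArray(obj)


counts = [3, 1, 2]  # a plain Python list, like ``counts`` in dask
combined_vals = LikeAwareArray([10, 20])  # reference array for dispatch

# NEP 35: construction is handed to type(combined_vals).__array_function__.
out = np.array(counts, like=combined_vals)
print(type(out).__name__, out.values)  # LikeAwareArray [3, 1, 2]
```

On NumPy versions older than 1.20, `like` is not a recognized keyword, so the same call raises a `TypeError`; code that relies on this dispatch effectively needs NumPy 1.20+.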