cudf: [FEA] dask-cudf doesn't support "corr"/correlation function like Pandas and cuDF
When attempting to perform a correlation like `sales_corr = sales['pr_review_rating', 'count'].corr(sales['pr_review_rating', 'mean'])`, dask-cudf fails with the following error:
TypeError: cannot concatenate object of type "<class 'cudf.core.series.Series'>"; only pd.Series, pd.DataFrame, and pd.Panel (deprecated) objs are valid
It seems that we currently restrict this on purpose. However, dask-cudf should behave exactly like cuDF and Pandas here. https://github.com/rapidsai/cudf/blob/4613ba821e4ed03a2db744f2c0bb0959fd450191/python/dask_cudf/dask_cudf/backends.py#L30-L33
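For reference, the behavior being requested can be sketched in plain Pandas. This is only an illustration: the issue does not show how `sales` was built, so the input frame, the `item` column, and the `groupby(...).agg(...)` step below are assumptions chosen to produce the same two-level column labels.

```python
import pandas as pd

# Hypothetical input data; the original issue does not show the real table.
reviews = pd.DataFrame(
    {
        "item": ["a", "a", "b", "b", "c", "c", "c"],
        "pr_review_rating": [1, 3, 2, 4, 5, 5, 2],
    }
)

# Assumed construction of `sales`: aggregating with a list of functions
# yields MultiIndex columns such as ('pr_review_rating', 'count').
sales = reviews.groupby("item").agg({"pr_review_rating": ["count", "mean"]})

# The call from the issue: Pearson correlation between two Series.
sales_corr = sales["pr_review_rating", "count"].corr(
    sales["pr_review_rating", "mean"]
)
print(sales_corr)
```

In Pandas this returns a single float; the request is for the same call to work on a dask-cudf DataFrame.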
About this issue
- State: closed
- Created 5 years ago
- Comments: 28 (16 by maintainers)
The `shape=` argument was introduced by me in NumPy and CuPy to address exactly that shortcoming. It’s the only special case for array creation with `__array_function__`. The `*_like` functions will allow dispatching via `__array_function__` according to the first argument (if NumPy, dispatch to NumPy itself; if CuPy, dispatch to CuPy; etc.), and the new `shape=` argument allows us to create an arbitrarily-shaped array with the correct array type, which wasn’t possible before.
Confirmed this will work as expected, so no need for a Dask dispatch, sorry for false alarm 😅
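The `shape=` argument on the `*_like` functions described above can be sketched with NumPy alone. The helper name `zeros_with_type_of` is hypothetical. With a CuPy array as `reference`, the same call would be expected to dispatch to CuPy via `__array_function__`, but that path is not exercised here.

```python
import numpy as np


def zeros_with_type_of(reference, shape):
    """Hypothetical helper: zeros of any shape, typed like `reference`.

    np.zeros_like dispatches on its first argument, and `shape=`
    overrides the shape that would otherwise be copied from it.
    """
    return np.zeros_like(reference, shape=shape)


reference = np.arange(6, dtype=np.float32).reshape(2, 3)

# The result keeps the array type and dtype of `reference`,
# but has an unrelated, caller-chosen shape.
result = zeros_with_type_of(reference, shape=(4,))
print(type(result).__name__, result.shape, result.dtype)
```

Library code written this way never has to name the array library it is allocating for, which is why no separate Dask dispatch was needed.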
It’s important to note that this failure was the product of an issue with cuDF and upstream libraries. That said, IIRC @rjzamora included a fix to cuDF and to Dask, both of which include tests. @pentschev also implemented `nansum` in CuPy, which has its own test. So I think this is covered pretty well. That said, if there is another test you would like to add, I think that would be happily accepted 🙂
This should be resolved as of now with CuPy >= 7.
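The thread does not spell out where `nansum` enters. One plausible reading is that a correlation over partitioned data is assembled from per-partition sums that must skip missing values. The following is a minimal NumPy sketch of that idea, not the actual Dask or cuDF implementation; the function names `chunk_stats` and `corr_from_stats` are hypothetical.

```python
import numpy as np


def chunk_stats(x, y):
    """Per-chunk sufficient statistics, ignoring pairs with a NaN."""
    valid = ~(np.isnan(x) | np.isnan(y))
    xm = np.where(valid, x, np.nan)
    ym = np.where(valid, y, np.nan)
    return np.array(
        [
            valid.sum(),         # number of complete pairs
            np.nansum(xm),       # sum of x
            np.nansum(ym),       # sum of y
            np.nansum(xm * xm),  # sum of x^2
            np.nansum(ym * ym),  # sum of y^2
            np.nansum(xm * ym),  # sum of x*y
        ],
        dtype=float,
    )


def corr_from_stats(stats):
    """Pearson correlation from the combined statistics."""
    n, sx, sy, sxx, syy, sxy = stats
    cov = sxy - sx * sy / n
    var_x = sxx - sx * sx / n
    var_y = syy - sy * sy / n
    return cov / np.sqrt(var_x * var_y)


x = np.array([1.0, 2.0, np.nan, 4.0, 5.0, 7.0])
y = np.array([2.0, 1.0, 3.0, np.nan, 6.0, 8.0])

# Two "partitions"; their statistics simply add up.
chunks = [(x[:3], y[:3]), (x[3:], y[3:])]
total = sum(chunk_stats(cx, cy) for cx, cy in chunks)
r = corr_from_stats(total)
print(r)
```

Because the statistics are plain sums, partitions can be reduced in any order, and the same code would apply to any array library that provides `nansum`.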