cudf: [FEA] Quantile does not support datetime types (or floats for q)
Describe the bug
When trying to call quantile with datetime64[ns] data, I get the following exception:
RuntimeError: cuDF failure at: /opt/conda/envs/rapids/conda-bld/libcudf_1598487738985/work/cpp/src/quantiles/quantile.cu:45: quantile does not support non-numeric types
Steps to Reproduce
import cudf
import numpy as np
import pandas as pd
rng = pd.date_range('2015-02-24', periods=5, freq='D')
df = pd.DataFrame({'date': rng, 'Val': np.random.randn(len(rng))})
cdf = cudf.DataFrame.from_pandas(df)
cdf.quantile(0.8, **{})
I tried to work around this by casting to int64 and back:
cdf['date'].astype('int64').quantile(0.8, **{}).astype('datetime64[ns]')
But this gives me an error when the first parameter (q) is a float:
AttributeError: 'Scalar' object has no attribute 'astype'
Expected behavior
Quantile should accept a float (in addition to an array-like) for the input q parameter. It should also accept datetime data and return the appropriate quantile result to match pandas quantile: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html
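To make the requested behavior concrete, here is a minimal pure-Python sketch, not cuDF or pandas code. It assumes linear interpolation between neighboring values and works on integer nanoseconds since the epoch. The helper names (to_ns, from_ns, quantile_ns, datetime_quantile) are hypothetical and exist only for this illustration. A scalar q yields a single datetime, and an array-like q yields a list.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_ns(ts):
    # Hypothetical helper: aware datetime -> integer nanoseconds since the epoch.
    delta = ts - EPOCH
    seconds = delta.days * 86_400 + delta.seconds
    return seconds * 1_000_000_000 + delta.microseconds * 1_000

def from_ns(ns):
    # Hypothetical helper: integer nanoseconds -> datetime.
    # datetime only stores microseconds, so sub-microsecond digits are dropped.
    return EPOCH + timedelta(microseconds=ns // 1_000)

def quantile_ns(sorted_ns, q):
    # Linear interpolation between the two nearest ranks (an assumption here).
    pos = q * (len(sorted_ns) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_ns) - 1)
    frac = pos - lo
    return sorted_ns[lo] + round((sorted_ns[hi] - sorted_ns[lo]) * frac)

def datetime_quantile(values, q):
    # Scalar q -> scalar result; array-like q -> list of results.
    sorted_ns = sorted(to_ns(v) for v in values)
    if isinstance(q, (int, float)):
        return from_ns(quantile_ns(sorted_ns, q))
    return [from_ns(quantile_ns(sorted_ns, qi)) for qi in q]

# Same five daily dates as in the reproduction steps, made timezone-aware.
dates = [datetime(2015, 2, 24, tzinfo=timezone.utc) + timedelta(days=i)
         for i in range(5)]
scalar_result = datetime_quantile(dates, 0.8)
list_result = datetime_quantile(dates, [0.0, 0.5, 1.0])
print(scalar_result)
print(list_result)
```

With five daily values, q = 0.8 falls 0.2 of the way between the fourth and fifth dates, so the scalar result lands 4.8 hours into 2015-02-27.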
About this issue
- State: closed
- Created 4 years ago
- Comments: 16 (8 by maintainers)
I can work on this
I wasn’t sure if I should add this clarification here or edit the first post, but I’d also like to make sure that quantile returns a float when q is a float. This is the expected behavior for pandas.Series.quantile: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.quantile.html
Let me know if I should add that to the initial post. Thanks.
I do not think quantiles at the C++ level should support datetime types, as it loses type information, i.e., quantiles always returns double. We can’t represent floating point datetime values. The caller should opt into losing the type information by first casting to an integer type (ideally via logical_cast to avoid a deep copy).
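One consequence of a double-valued result can be illustrated in plain Python, independent of libcudf. This sketch assumes timestamps are stored as int64 nanoseconds since the epoch, as with datetime64[ns]. A double has a 53-bit significand, so present-day nanosecond timestamps do not all survive a round trip through it. This may or may not be the exact concern behind the comment above; it is only meant to show why the cast back is lossy and should be an explicit choice by the caller.

```python
from datetime import datetime, timezone

# 2015-02-24 00:00:00 UTC as integer nanoseconds since the epoch, plus 1 ns.
base_seconds = int(datetime(2015, 2, 24, tzinfo=timezone.utc).timestamp())
ts_ns = base_seconds * 1_000_000_000 + 1

# Converting to a double rounds to the nearest representable value.
as_double = float(ts_ns)

# Casting back to an integer does not recover the original timestamp.
round_trip = int(as_double)
lost_ns = ts_ns - round_trip

print(ts_ns, round_trip, lost_ns)
```

At this magnitude (about 1.4e18) adjacent doubles are 256 ns apart, so the extra nanosecond is rounded away.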