cudf: [FEA] Quantile does not support datetime types (or floats for q)

Describe the bug

When trying to call quantile with datetime64[ns] data, I get the following exception:

RuntimeError: cuDF failure at: /opt/conda/envs/rapids/conda-bld/libcudf_1598487738985/work/cpp/src/quantiles/quantile.cu:45: quantile does not support non-numeric types

Steps to Reproduce

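    import numpy as np
    import pandas as pd
    import cudf

    # Five daily timestamps plus a random float column
    rng = pd.date_range('2015-02-24', periods=5, freq='D')
    df = pd.DataFrame({'date': rng, 'Val': np.random.randn(len(rng))})
    cdf = cudf.DataFrame.from_pandas(df)
    cdf.quantile(0.8, **{})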

I can work around this by casting:

    cdf['date'].astype('int64').quantile(0.8, **{}).astype('datetime64[ns]')

But this fails when q is passed as a float, because the result is a scalar with no astype method:

AttributeError: 'Scalar' object has no attribute 'astype'

Expected behavior

Quantile should accept a float (in addition to an array-like) for the input q parameter. It should also accept datetime data and return the appropriate quantile result to match pandas quantile: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html
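
For reference, here is a minimal sketch of the pandas behavior being requested. It runs on the CPU with pandas.Series.quantile only (no cuDF involved), and the variable names and sample values are illustrative assumptions rather than anything from the original report:

```python
import pandas as pd

# Five daily timestamps, as in the reproducer, plus a fixed float column
dates = pd.Series(pd.date_range("2015-02-24", periods=5, freq="D"))
values = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

# A float q yields a single scalar value
scalar_q = values.quantile(0.8)

# An array-like q yields a Series indexed by the requested quantiles
list_q = values.quantile([0.2, 0.8])

# Datetime data is accepted and the result stays a timestamp
date_q = dates.quantile(0.5)

print(type(scalar_q), type(list_q), type(date_q))
```

The request is for cuDF to mirror these three cases: a scalar for a float q, a Series for an array-like q, and a datetime result for datetime input.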

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 16 (8 by maintainers)

Most upvoted comments

I can work on this

I wasn’t sure if I should add this clarification here or edit the first post, but I’d also like to make sure that quantile returns a float when q is a float. This is the expected behavior for pandas.Series.quantile: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.quantile.html

Let me know if I should add that to the initial post. Thanks.

I do not think quantiles at the C++ level should support datetime types, because doing so loses type information: quantiles always returns double, and we can’t represent floating-point datetime values. Callers should opt into losing the type information by first casting to an integer type (ideally via logical_cast to avoid a deep copy).
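
The opt-in approach described above can be sketched on the CPU with NumPy. This is only an analogy and not cuDF or libcudf code: ndarray.view stands in for a copy-free logical cast, np.quantile stands in for a quantile routine that always returns a double, and all names are illustrative.

```python
import numpy as np

# Five daily timestamps stored as datetime64[ns]
dates = np.arange("2015-02-24", "2015-03-01", dtype="datetime64[D]").astype("datetime64[ns]")

# Reinterpret the same buffer as int64 nanoseconds since the epoch (no copy)
as_int = dates.view("int64")

# The quantile is computed in floating point, so the datetime type is lost here
q_float = np.quantile(as_int, 0.8)

# The caller explicitly rounds and converts back to a timestamp
q_date = np.datetime64(int(round(q_float)), "ns")

print(q_float.dtype, q_date)
```

Because a float64 near 1.4e18 cannot represent every integer nanosecond, the round trip can in general be off by a few hundred nanoseconds, which is the type-information loss the caller is opting into.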