cudf: [BUG] mean() fails on groupby

nycsmall.csv:

VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,cudf_groupby_level_index
1,2017-01-09 11:13:28,2017-01-09 11:25:45,1,3.3,1,2313200,263,161,1,12.5,0.0,0.5,2.0,0.0,0.30000000000000004,15.3,1
1,2017-01-09 11:32:27,2017-01-09 11:36:01,1,0.9,1,2313200,186,234,1,5.0,0.0,0.5,1.45,0.0,0.30000000000000004,7.25,1

import pandas as pd
import cudf

# pandas: groupby().mean() on the same file, for comparison
df = pd.read_csv('nycsmall.csv')
print(df.groupby(df.passenger_count).mean())

# cuDF: min() over the same grouping
cdf = cudf.read_csv('nycsmall.csv')
cdf.groupby(cdf.passenger_count).min().to_pandas()

# cuDF: mean() over the same grouping raises GDFError (traceback below)
cdf.groupby(cdf.passenger_count).mean().to_pandas()

The call to groupby().mean() fails with the following:

---------------------------------------------------------------------------
GDFError                                  Traceback (most recent call last)
<ipython-input-9-6ae1e588e916> in <module>
----> 1 cdf.groupby(cdf.passenger_count).mean().to_pandas()

~/GitRepos/cudf/python/cudf/groupby/groupby.py in mean(self, sort)
    318 
    319     def mean(self, sort=True):
--> 320         return self._apply_basic_agg("mean", sort)
    321 
    322     def agg(self, args):

~/GitRepos/cudf/python/cudf/groupby/groupby.py in _apply_basic_agg(self, agg_type, sort_results)
    250         result = self._apply_agg(
    251             agg_type, result, add_col_values, ctx, val_columns,
--> 252             val_columns_out, sort_result=sort_results)
    253 
    254         # If a Groupby has one index column and one value column

~/GitRepos/cudf/python/cudf/groupby/groupby.py in _apply_agg(self, agg_type, result, add_col_values, ctx, val_columns, val_columns_out, sort_result)
    194                 out_col_values,
    195                 out_col_agg,
--> 196                 ctx)
    197 
    198             if (err is not None):

~/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/libgdf_cffi/wrapper.py in wrap(*args)
     25                     if errcode != self._api.GDF_SUCCESS:
     26                         errname, msg = self._get_error_msg(errcode)
---> 27                         raise GDFError(errname, msg)
     28 
     29                 wrap.__name__ = fn.__name__

GDFError: GDF_UNSUPPORTED_DTYPE

cc @jrhemstad any ideas what’s going on here? I can try reducing the columns to narrow down which one triggers the error, if that’s helpful.
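One way to reduce the columns, as offered above, is to run the aggregation on one value column at a time and record which ones raise. The helper below is only a sketch: `find_failing_columns` and its `aggregate` callback are hypothetical names, not part of cuDF, and the commented-out call assumes that selecting a column subset and grouping by column name behaves in cuDF as it does in pandas.

```python
def find_failing_columns(columns, aggregate):
    """Call aggregate(col) for each column; return {column: error text} for those that raise."""
    failures = {}
    for col in columns:
        try:
            aggregate(col)
        except Exception as exc:  # broad on purpose: the goal is only to locate the column
            failures[col] = f"{type(exc).__name__}: {exc}"
    return failures


# Hypothetical usage against the frame from the report (needs cuDF and a GPU):
# value_cols = [c for c in cdf.columns if c != "passenger_count"]
# find_failing_columns(
#     value_cols,
#     lambda c: cdf[["passenger_count", c]].groupby("passenger_count").mean(),
# )
```

The returned mapping lists only the columns whose aggregation raised, which makes it easy to compare their dtypes against the ones that succeeded.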

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 17 (17 by maintainers)

Most upvoted comments

I think dropping unsupported dtypes makes sense. This would apply to median() as well.
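The behaviour suggested here can be modelled in plain Python: columns that cannot be averaged are skipped rather than causing an error. This is a toy model and not the cuDF implementation. The function name is hypothetical, and treating "supported" as "every value is an int or float" is an assumption made for illustration. The sample rows reuse three fields from nycsmall.csv.

```python
from collections import defaultdict
from numbers import Real


def groupby_mean_drop_unsupported(rows, key):
    """Group a list of dict rows by `key` and average only the numeric columns."""
    if not rows:
        return {}
    # Assumption: a column is supported when all of its values are real numbers (bools excluded).
    supported = [
        col for col in rows[0]
        if col != key
        and all(isinstance(r[col], Real) and not isinstance(r[col], bool) for r in rows)
    ]
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return {
        group: {col: sum(r[col] for r in members) / len(members) for col in supported}
        for group, members in groups.items()
    }


rows = [
    {"passenger_count": 1, "tpep_pickup_datetime": "2017-01-09 11:13:28", "fare_amount": 12.5},
    {"passenger_count": 1, "tpep_pickup_datetime": "2017-01-09 11:32:27", "fare_amount": 5.0},
]
result = groupby_mean_drop_unsupported(rows, "passenger_count")
print(result)  # {1: {'fare_amount': 8.75}}
```

The string-valued pickup column is silently left out of the result, which is the trade-off of this approach: the call succeeds, but the caller gets no signal that a column was dropped.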