cudf: [BUG] mean() fails on groupby
nycsmall.csv:
VendorID,tpep_pickup_datetime,tpep_dropoff_datetime,passenger_count,trip_distance,RatecodeID,store_and_fwd_flag,PULocationID,DOLocationID,payment_type,fare_amount,extra,mta_tax,tip_amount,tolls_amount,improvement_surcharge,total_amount,cudf_groupby_level_index
1,2017-01-09 11:13:28,2017-01-09 11:25:45,1,3.3,1,2313200,263,161,1,12.5,0.0,0.5,2.0,0.0,0.30000000000000004,15.3,1
1,2017-01-09 11:32:27,2017-01-09 11:36:01,1,0.9,1,2313200,186,234,1,5.0,0.0,0.5,1.45,0.0,0.30000000000000004,7.25,1
import pandas as pd
import cudf
df = pd.read_csv('nycsmall.csv')
df.groupby(df.passenger_count).mean()
print(df.groupby(df.passenger_count).mean())
cdf = cudf.read_csv('nycsmall.csv')
cdf.groupby(cdf.passenger_count).min().to_pandas()
cdf.groupby(cdf.passenger_count).mean().to_pandas()
The call to groupby().mean() fails with the following error:
---------------------------------------------------------------------------
GDFError Traceback (most recent call last)
<ipython-input-9-6ae1e588e916> in <module>
----> 1 cdf.groupby(cdf.passenger_count).mean().to_pandas()
~/GitRepos/cudf/python/cudf/groupby/groupby.py in mean(self, sort)
318
319 def mean(self, sort=True):
--> 320 return self._apply_basic_agg("mean", sort)
321
322 def agg(self, args):
~/GitRepos/cudf/python/cudf/groupby/groupby.py in _apply_basic_agg(self, agg_type, sort_results)
250 result = self._apply_agg(
251 agg_type, result, add_col_values, ctx, val_columns,
--> 252 val_columns_out, sort_result=sort_results)
253
254 # If a Groupby has one index column and one value column
~/GitRepos/cudf/python/cudf/groupby/groupby.py in _apply_agg(self, agg_type, result, add_col_values, ctx, val_columns, val_columns_out, sort_result)
194 out_col_values,
195 out_col_agg,
--> 196 ctx)
197
198 if (err is not None):
~/miniconda3/envs/cudf-dev/lib/python3.7/site-packages/libgdf_cffi/wrapper.py in wrap(*args)
25 if errcode != self._api.GDF_SUCCESS:
26 errname, msg = self._get_error_msg(errcode)
---> 27 raise GDFError(errname, msg)
28
29 wrap.__name__ = fn.__name__
GDFError: GDF_UNSUPPORTED_DTYPE
cc @jrhemstad, any ideas what's going on here? I can try to reduce the columns to narrow down the cause, if that's helpful.
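Reducing the columns could be done mechanically by aggregating one value column at a time and recording which ones raise. The sketch below is only an illustration: `find_failing_columns` is a hypothetical helper, the sample data is a trimmed copy of the rows above, and it runs against pandas so it works without a GPU. It assumes a cudf DataFrame accepts the same `frame[[key, col]].groupby(key)` calls, which this thread does not confirm.

```python
import io

import pandas as pd

# A few columns from nycsmall.csv, trimmed for brevity.
SAMPLE = """passenger_count,tpep_pickup_datetime,trip_distance,fare_amount
1,2017-01-09 11:13:28,3.3,12.5
1,2017-01-09 11:32:27,0.9,5.0
"""


def find_failing_columns(frame, key, agg="mean"):
    """Hypothetical helper: map each column whose aggregation raises to its error."""
    failures = {}
    for col in frame.columns:
        if col == key:
            continue
        try:
            # Aggregate one value column at a time so a failure is attributable.
            getattr(frame[[key, col]].groupby(key), agg)()
        except Exception as exc:  # broad on purpose: the goal is to report, not handle
            failures[col] = f"{type(exc).__name__}: {exc}"
    return failures


frame = pd.read_csv(io.StringIO(SAMPLE))
for col, err in find_failing_columns(frame, "passenger_count").items():
    print(col, "->", err)
```

Which columns are reported depends on the library and version in use, so the output is not shown here. Passing a cudf DataFrame instead of `frame` would be the way to apply this to the failing case, under the assumption stated above.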
About this issue
- State: closed
- Created 5 years ago
- Comments: 17 (17 by maintainers)
I think dropping unsupported dtypes makes sense. This would apply to median() as well.
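Until something like that lands, a user-side workaround could be to drop the non-numeric columns before aggregating. This is a minimal sketch, not the library's fix: `numeric_groupby_mean` is a hypothetical helper, the data is a trimmed copy of the rows above, and it is written against pandas so it runs without a GPU. It assumes cudf offers an equivalent `select_dtypes`.

```python
import io

import pandas as pd

# Two rows in the shape of nycsmall.csv, trimmed to a few columns for brevity.
CSV = """VendorID,tpep_pickup_datetime,passenger_count,trip_distance,fare_amount
1,2017-01-09 11:13:28,1,3.3,12.5
1,2017-01-09 11:32:27,1,0.9,5.0
"""


def numeric_groupby_mean(df, key):
    """Hypothetical helper: per-group mean over numeric columns only."""
    # Keep numeric columns; strings and unparsed datetimes are dropped.
    numeric = df.select_dtypes(include="number")
    if key not in numeric.columns:
        raise TypeError(f"group key {key!r} is not a numeric column")
    return numeric.groupby(key).mean()


df = pd.read_csv(io.StringIO(CSV))
result = numeric_groupby_mean(df, "passenger_count")
print(result)
```

Here the pickup timestamp is read as text, so it is excluded and the mean is computed over VendorID, trip_distance and fare_amount only. The same column filter would work in front of median().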