cudf: [BUG] can't DMatrix cuDF in xgboost 0.90.rapidsdev1
I'm working with a cuDF DataFrame called df2:
type(df2)
cudf.core.dataframe.DataFrame
X, y = df2.drop('16', axis=1), df2['16']
type(X)
cudf.core.dataframe.DataFrame
type(y)
cudf.core.series.Series
param = {
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    # 'tree_method': 'hist',
    'eval_metric': 'logloss',
}
train = xgboost.DMatrix(X, label=y)
I got the following errors:
ValueError: cannot copy sequence with size 629470 to array axis with dimension 70
ValueError: unrecognized csr_matrix constructor usage
TypeError: can not initialize DMatrix from DataFrame
However, if I convert X and y to pandas, everything works:
type(df2)
cudf.core.dataframe.DataFrame
X, y = df2.drop('16', axis=1).to_pandas(), df2['16'].to_pandas()
type(X)
pandas.core.frame.DataFrame
type(y)
pandas.core.series.Series
param = {
    'objective': 'binary:logistic',
    'tree_method': 'gpu_hist',
    # 'tree_method': 'hist',
    'eval_metric': 'logloss',
}
%%time
train = xgboost.DMatrix(X, label=y)
model = xgboost.train(param, train)
CPU times: user 1.5 s, sys: 834 ms, total: 2.34 s
Wall time: 2.14 s
Am I missing something?
About this issue
- State: closed
- Created 5 years ago
- Comments: 16 (9 by maintainers)
@ivenzor, got the dataset. Thanks for sharing. The issue seems to be in creating a DMatrix with int and float columns that contain Nones. You don't see this error in pandas because pandas upcasts int columns with Nones to a float dtype.
Current suggested workaround: you can upcast to floats to match the pandas behavior. In the meantime, I will look into resolving this.
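A minimal sketch of the upcast workaround described above, not the maintainer's original snippet. The helper name `upcast_to_float` and the toy data are hypothetical. The demo uses a pandas DataFrame as a stand-in so it runs without a GPU; the assumption is that the same `.astype` call applies to a cuDF DataFrame before it is passed to `xgboost.DMatrix`.

```python
import pandas as pd


def upcast_to_float(df, dtype="float32"):
    """Cast every column of df to a float dtype (hypothetical helper).

    Uses only `.columns` and `.astype`, which both pandas and cuDF
    DataFrames provide.
    """
    return df.astype({col: dtype for col in df.columns})


# pandas stand-in for df2 (toy data, not the reporter's dataset).
df2 = pd.DataFrame({
    "0": [1, None, 3],      # integers with a None
    "1": [0.5, 1.5, None],  # floats with a None
    "16": [0, 1, 0],        # label column, as in the report
})

# pandas has already upcast the int column with a None to float64,
# which is the behavior the workaround imitates.
print(df2["0"].dtype)  # float64

X = upcast_to_float(df2.drop("16", axis=1))
y = df2["16"].astype("float32")
print(X.dtypes)

# With cuDF (assumption, needs a GPU build of xgboost):
# train = xgboost.DMatrix(X, label=y)
```

Casting everything to one float dtype is the bluntest form of the workaround. If only a few int columns contain missing values, casting just those columns would leave the other dtypes unchanged.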
Minimal example for issue:
Error Trace: