dask-ml: LogisticRegression cannot train from Dask DataFrame

A simple example:

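from dask import dataframe as dd
from dask_glm.datasets import make_classification
from dask_ml.linear_model import LogisticRegression

# Synthetic data: X and y are Dask arrays
X, y = make_classification(n_samples=10000, n_features=2)

# Wrap the arrays in Dask DataFrame / Series objects
X = dd.from_dask_array(X, columns=["a", "b"])
y = dd.from_array(y)

# Fitting on the DataFrame input is what triggers the error
lr = LogisticRegression()
lr.fit(X, y)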

This raises KeyError: (<class 'dask.dataframe.core.DataFrame'>,)

I have not had time to check whether other models are affected as well.

About this issue

  • State: open
  • Created 7 years ago
  • Reactions: 4
  • Comments: 33 (15 by maintainers)

Most upvoted comments

Thanks. At the moment, the dask_glm-based estimators only work with Dask arrays, not DataFrames. You can use .values on the DataFrame to get the underlying array.

I’m hoping to put in some helpers for handling all the extra DataFrame metadata sometime soon, so this will be more consistent across estimators.