scikit-learn: fetch_openml: Add an option to return a DataFrame
fetch_openml currently rejects STRING-valued attributes and ordinal-encodes all NOMINAL attributes, in order to return an array or sparse matrix of floats by default.
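To make the current behaviour concrete, here is a minimal sketch of what ordinal-encoding a NOMINAL attribute means. The attribute and its values are made up, and the assumption that codes follow the declared category order is mine; this is not the fetch_openml implementation itself.

```python
import numpy as np

# Hypothetical nominal attribute, declared in ARFF as {red, green, blue}.
categories = ["red", "green", "blue"]
values = ["red", "blue", "red", "green"]

# Ordinal encoding: each value is replaced by the position of its category
# (assumed here to follow the declared order), stored as a float.
code_of = {category: float(code) for code, category in enumerate(categories)}
encoded = np.array([code_of[value] for value in values], dtype=np.float64)
```

The category labels are lost in the resulting float array, which is what the DataFrame option is meant to avoid.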
We should have a parameter that instead returns a DataFrame of features as the ‘data’ entry in the returned Bunch. By default, this would keep nominals as pd.Categorical and strings as objects. Column names would be taken from the ARFF attribute names / OpenML metadata. Perhaps we would also set the DataFrame’s index from the attribute flagged as is_row_identifier in OpenML.
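A rough sketch of the proposed conversion, using only pandas. The `to_frame` helper, the attribute list and the `row_id_attribute` argument are hypothetical names for illustration, not an existing scikit-learn API; a list of values stands in for a NOMINAL declaration.

```python
import pandas as pd

# Hypothetical ARFF-style attribute metadata: (name, type), where a list of
# values marks a NOMINAL attribute. All names here are made up.
attributes = [
    ("id", "INTEGER"),
    ("color", ["red", "green", "blue"]),
    ("comment", "STRING"),
    ("weight", "REAL"),
]
rows = [
    [10, "red", "first sample", 1.5],
    [11, "blue", "second sample", 2.0],
    [12, "red", "third sample", 0.7],
]


def to_frame(attributes, rows, row_id_attribute=None):
    """Build a DataFrame keeping nominals categorical and strings as objects."""
    names = [name for name, _ in attributes]
    frame = pd.DataFrame(rows, columns=names)
    for name, dtype in attributes:
        if isinstance(dtype, list):
            # NOMINAL: keep the declared categories, including unused ones.
            frame[name] = pd.Categorical(frame[name], categories=dtype)
        elif dtype == "STRING":
            frame[name] = frame[name].astype(object)
    if row_id_attribute is not None:
        # Stand-in for the attribute flagged as is_row_identifier in OpenML.
        frame = frame.set_index(row_id_attribute)
    return frame


frame = to_frame(attributes, rows, row_id_attribute="id")
```

Passing the declared categories explicitly keeps values such as "green" available even when no row uses them.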
See #10733 for the general issue of an API for returning DataFrames in sklearn.datasets.
About this issue
- State: closed
- Created 6 years ago
- Comments: 17 (11 by maintainers)
Btw, #12502 is somewhat related.