scikit-learn: RFECV failing at check_X_y() if estimator=Pipeline()
My classifier is a pipeline consisting of a custom transformer followed by an Imputer() and a RandomForestClassifier(). I can call Pipeline.fit() and Pipeline.predict() successfully, but when I run RFECV with the pipeline as its estimator, I get an error:
../python2.7/site-packages/sklearn/feature_selection/rfe.pyc in fit(self, X, y)
    349 regression).
    350 """
--> 351 X, y = check_X_y(X, y, "csr")
    352 if self.estimator_params is not None:
    353 warnings.warn("The parameter 'estimator_params' is deprecated as of version 0.16 "

../python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
    442 X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    443 ensure_2d, allow_nd, ensure_min_samples,
--> 444 ensure_min_features)
    445 if multi_output:
    446 y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

../python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features)
    342 else:
    343 dtype = None
--> 344 array = np.array(array, dtype=dtype, order=order, copy=copy)
    345 # make sure we actually converted to numeric:
    346 if dtype_numeric and array.dtype.kind == "O":

ValueError: could not convert string to float: W
Since I can manually fit/predict on my pipeline, the problem is with RFECV and/or check_X_y, right?
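The traceback shows the error being raised inside RFECV's own fit(), at its input check, before any step of the pipeline has run. A minimal plain-Python sketch of that ordering difference follows. It does not use scikit-learn, and both helper functions are hypothetical stand-ins: `validate_numeric` for the up-front numeric check and `encode_letters` for the custom transformer.

```python
def validate_numeric(rows):
    """Hypothetical stand-in for an up-front numeric check: cast every cell to float."""
    return [[float(cell) for cell in row] for row in rows]


def encode_letters(rows, categories=("W", "X")):
    """Hypothetical stand-in for a custom transformer: one-hot encode column 0."""
    return [[float(row[0] == c) for c in categories] + list(row[1:]) for row in rows]


raw = [["W", 1.5], ["X", 2.0]]

# Pipeline-style order: the transformer runs first, so later numeric checks pass.
encoded = validate_numeric(encode_letters(raw))

# RFECV-style order: the raw input is validated first, so the string column is rejected.
try:
    validate_numeric(raw)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

Here `encoded` holds only floats, while the second call fails with the same kind of "could not convert string to float" message as in the traceback.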
About this issue
- State: closed
- Created 8 years ago
- Comments: 15 (8 by maintainers)
RFE assumes numeric data. I guess that is not technically necessary… But what you are trying will fail because of the one-hot encoding. How should RFECV know which features to mask?
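One possible way around this, sketched under assumptions that are not part of the thread, is to put RFECV inside the pipeline, after the encoding and imputation steps. RFECV then only sees numeric columns, and its mask refers to the encoded features rather than the raw ones. In the sketch, `LetterEncoder` is a hypothetical stand-in for the custom transformer, the data is synthetic, and `SimpleImputer` is used because it replaced `Imputer` in newer scikit-learn releases.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline


class LetterEncoder(TransformerMixin, BaseEstimator):
    """Hypothetical custom transformer: one-hot encode column 0, cast the rest to float."""

    def fit(self, X, y=None):
        X = np.asarray(X, dtype=object)
        self.categories_ = sorted(set(X[:, 0]))
        return self

    def transform(self, X):
        X = np.asarray(X, dtype=object)
        onehot = np.array(
            [[float(value == cat) for cat in self.categories_] for value in X[:, 0]]
        )
        rest = X[:, 1:].astype(float)
        return np.hstack([onehot, rest])


# Synthetic data (an assumption): one string column, three numeric columns with gaps.
rng = np.random.RandomState(0)
n_samples = 60
letters = rng.choice(["W", "X", "Y"], size=n_samples)
numbers = rng.normal(size=(n_samples, 3))
numbers[::10, 0] = np.nan
y = (numbers[:, 1] > 0).astype(int)

X = np.empty((n_samples, 4), dtype=object)
X[:, 0] = letters
X[:, 1:] = numbers

# RFECV is the last step, so it receives the encoded, imputed, numeric matrix.
pipe = Pipeline([
    ("encode", LetterEncoder()),
    ("impute", SimpleImputer(strategy="mean")),
    ("select", RFECV(RandomForestClassifier(n_estimators=20, random_state=0), cv=3)),
])
pipe.fit(X, y)

mask = pipe.named_steps["select"].support_  # one flag per encoded column
predictions = pipe.predict(X)
```

The mask has one entry per encoded column (three one-hot columns plus three numeric ones here), so selection is made on individual one-hot columns, not on the original string feature as a whole.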