scikit-learn: RFECV failing at check_X_y() if estimator=Pipeline()

For my classifier, I have a pipeline consisting of a custom transformer followed by an Imputer() and a RandomForestClassifier(). I can call Pipeline.fit() and Pipeline.predict() successfully, but when I run RFECV with this pipeline as the estimator, I get an error:

../python2.7/site-packages/sklearn/feature_selection/rfe.pyc in fit(self, X, y)
    349             regression).
    350         """
--> 351         X, y = check_X_y(X, y, "csr")
    352         if self.estimator_params is not None:
    353             warnings.warn("The parameter 'estimator_params' is deprecated as of version 0.16 "

../python2.7/site-packages/sklearn/utils/validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
    442     X = check_array(X, accept_sparse, dtype, order, copy, force_all_finite,
    443                     ensure_2d, allow_nd, ensure_min_samples,
--> 444                     ensure_min_features)
    445     if multi_output:
    446         y = check_array(y, 'csr', force_all_finite=True, ensure_2d=False,

../python2.7/site-packages/sklearn/utils/validation.pyc in check_array(array, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features)
    342             else:
    343                 dtype = None
--> 344         array = np.array(array, dtype=dtype, order=order, copy=copy)
    345         # make sure we actually converted to numeric:
    346         if dtype_numeric and array.dtype.kind == "O":

ValueError: could not convert string to float: W

Since I can manually fit/predict on my pipeline, the problem is with RFECV and/or check_X_y, right?
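Reading the traceback, the failure appears to happen in RFECV's own input validation (check_X_y, then check_array), before any step of the pipeline sees the data. The final frame is a plain NumPy conversion to a numeric dtype, which can be reproduced without scikit-learn at all. This is a minimal sketch; the two-row array is made up for illustration and only the "W" value comes from the error message above.

```python
import numpy as np

# Hypothetical raw input: one string column and one numeric column.
X_raw = np.array([["W", 1.0], ["M", 2.0]], dtype=object)

message = None
try:
    # Roughly what check_array attempts when it asks for a numeric array.
    np.asarray(X_raw, dtype=np.float64)
except ValueError as exc:
    message = str(exc)

print(message)
```

If this reading is right, the pipeline's custom transformer never gets a chance to turn the strings into numbers, which is why calling fit/predict on the pipeline directly works while RFECV does not.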

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

RFE assumes numeric data. I guess that is not technically necessary… But what you are trying will fail anyway because of the one-hot encoding: the encoding inside the pipeline changes the set of columns, so how should RFECV know which features to mask?
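One possible workaround, following from the comment above, is to do the encoding and imputation before RFECV so that it only ever sees a numeric matrix, and to pass the bare classifier as the estimator. The sketch below is not from the thread: the toy data and variable names are hypothetical, and it assumes a newer scikit-learn (0.20 or later) where SimpleImputer replaces the Imputer mentioned in the question and OneHotEncoder accepts string columns.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import OneHotEncoder

rng = np.random.RandomState(0)
n_samples = 60

# Toy data (hypothetical): one string column, two numeric columns with gaps.
cat = rng.choice(["W", "M", "S"], size=n_samples).reshape(-1, 1)
num = rng.normal(size=(n_samples, 2))
y = (num[:, 0] > 0).astype(int)
num[::10, 1] = np.nan

# Convert everything to numbers up front, outside RFECV.
cat_encoded = OneHotEncoder().fit_transform(cat).toarray()
num_imputed = SimpleImputer(strategy="median").fit_transform(num)
X_numeric = np.hstack([cat_encoded, num_imputed])

# RFECV now ranks and masks the already-encoded columns.
selector = RFECV(
    RandomForestClassifier(n_estimators=20, random_state=0), step=1, cv=3
)
selector.fit(X_numeric, y)

mask = selector.support_  # boolean mask over the encoded columns
print(mask, selector.n_features_)
```

Note that the mask refers to the encoded columns, not the original ones, so a single categorical feature can end up partially selected. Fitting the imputer on the full data before cross-validation is also a simplification made here for brevity.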