scikit-learn: Scalar fit_params no longer handled. Was: Singleton array (insert value here) cannot be considered a valid collection.

Description

TypeError: Singleton array array(True) cannot be considered a valid collection.

Steps/Code to Reproduce

Found when running RandomizedSearchCV with LightGBM. This previously worked fine, but the latest update requires that all the **fit_params be checked for ‘sliceability’, which is difficult when some fit params are scalars such as early_stopping_rounds=5.

#Import the modules
import numpy as np #needed for np.unique/np.linspace below
import lightgbm as lgb
from sklearn.model_selection import RandomizedSearchCV
from sklearn.model_selection import GridSearchCV

#X_train, y_train, X_valid, y_valid and lgbm_custom_loss are assumed
#to be defined earlier in the original script

#Create parameters grid
#Create fixed parameters
mod_fixed_params = {
    'boosting_type':'gbdt'
    ,'random_state':0
    ,'silent':False
    ,'objective':'multiclass'
    ,'num_class':len(np.unique(y_train)) #number of classes, not the class labels themselves
    ,'min_samples_split':200 #Should be between 0.5-1% of samples
    ,'min_samples_leaf':50
    ,'subsample':0.8
}
search_params = {
    'fixed':{
        'cv':3
        ,'n_iter':80
        ,'verbose':True
        ,'random_state':0
    }
    ,'variable':{
        'learning_rate':[0.1,0.01,0.005]
        ,'num_leaves':np.linspace(10,1010,100,dtype=int)
        ,'max_depth':np.linspace(2,22,10,dtype=int)
    }
}
fit_params = {
    'verbose':True
    ,'eval_set':[(X_valid,y_valid)]
    ,'eval_metric':lgbm_custom_loss
    ,'early_stopping_rounds':5
}

#Setup the model
lgb_mod = lgb.LGBMClassifier(**mod_fixed_params)
#Seed numpy's global RNG and add the search grid
np.random.seed(0) #np.random.seed returns None, so there is nothing to assign
gbm = RandomizedSearchCV(lgb_mod,search_params['variable'],**search_params['fixed'])
#Fit the model
gbm.fit(X_train,y_train,**fit_params)
print('Best parameters found by grid search are: {}'.format(gbm.best_params_))
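The `array(True)` in the traceback is presumably the `verbose=True` fit param after coercion to a 0-d numpy array. A minimal numpy-only illustration (not scikit-learn’s exact code path) of why such a value cannot be treated as a per-sample collection:

```python
import numpy as np

# A sample-aligned fit param can be sliced by CV fold indices:
sample_weight = np.ones(10)
train_idx = np.arange(7)
assert sample_weight[train_idx].shape == (7,)

# A scalar fit param such as verbose=True becomes a 0-d array once
# coerced; it has no length, so it cannot be split per fold.
scalar = np.asarray(True)
print(repr(scalar))   # array(True)
print(scalar.ndim)    # 0
try:
    len(scalar)
except TypeError as exc:
    print('not a collection:', exc)
```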

I’ve traced the error through; it starts at model_selection/_search.py, line 652.


Expected Results

Expected to run the LightGBM model through RandomizedSearchCV.

Actual Results

TypeError: Singleton array array(True) cannot be considered a valid collection.

Versions

About this issue

  • Original URL
  • State: closed
  • Created 5 years ago
  • Comments: 17 (13 by maintainers)

Most upvoted comments

I’m really not sure if this is a regression on our side though.

It’s been raised 2-3 times in the couple of weeks since 0.22 was released, and not before.

In 0.21.X:

https://github.com/scikit-learn/scikit-learn/blob/ee328faa3601b40944ad43e28bce71860d39f2de/sklearn/model_selection/_search.py#L630-L632

in 0.22.X

https://github.com/scikit-learn/scikit-learn/blob/bf24c7e3d6d768dddbfad3c26bb3f23bc82c0a18/sklearn/model_selection/_search.py#L650-L654

early_stopping_rounds does very much sound like an init parameter in our conventions

It does. But we have tacitly supported this behaviour for many, many releases and have changed the behaviour without warning. The support is more than tacit in the sense that _fit_and_score explicitly makes use of a helper that bypasses fit params that are not samplewise:

https://github.com/scikit-learn/scikit-learn/blob/bf24c7e3d6d768dddbfad3c26bb3f23bc82c0a18/sklearn/model_selection/_validation.py#L940-L944

Thus the previous behaviour could be understood as supported and intended behaviour, even though it was untested (with respect to search at least).
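For reference, that helper’s behaviour can be sketched roughly like this (a simplified reimplementation for illustration only; the function name and signature here are made up, not the actual scikit-learn source):

```python
import numpy as np

def index_fit_param(X, value, indices):
    """Simplified sketch of the 0.21-era behaviour: only fit params
    whose length matches the number of samples are sliced per CV
    fold; scalars and other non-sample-aligned values pass through."""
    if not hasattr(value, '__len__') or len(value) != len(X):
        return value  # e.g. early_stopping_rounds=5 or verbose=True
    return np.asarray(value)[indices]

X = np.zeros((10, 3))
train_idx = np.arange(7)

print(index_fit_param(X, 5, train_idx))                  # 5
print(index_fit_param(X, True, train_idx))               # True
print(index_fit_param(X, np.ones(10), train_idx).shape)  # (7,)
```

Note that under this rule `eval_set=[(X_valid, y_valid)]` also passes through untouched, since its length (1) does not match the number of samples.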

Yes, we can change behaviour around things that do not conform to our conventions, but the change was introduced by @amueller in #14702 and was incidental to that PR. If we are going to change our handling of popular if non-conforming estimators, it should be done intentionally, and incidental changes should indeed be reverted in patch releases, IMO.

Let’s not deprecate non-aligned fit_params just yet. We need to think about it carefully first. Non-aligned fit_params are one proposal for implementing the new warm start API: https://github.com/scikit-learn/scikit-learn/pull/15105

We might also want to add feature-aligned params in the future, who knows

+1 for at least restoring backward compat in 0.22.1 (i.e. restore support for scalar fit params, possibly with a deprecation warning).

+0 for keeping (undocumented) support for scalar fit params indefinitely without deprecation warning as they feel harmless to me.
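Until a patch release lands, one user-side workaround is to bind the scalar kwargs before the search ever sees them. The sketch below is hypothetical (the `BoundFitParams` wrapper is not part of scikit-learn or LightGBM, and it omits the get_params/set_params plumbing a fully sklearn-compatible wrapper would need):

```python
class BoundFitParams:
    """Hypothetical wrapper: forward fixed scalar kwargs to the
    wrapped estimator's fit(), so only sample-aligned arguments
    need to go through search.fit(**fit_params)."""
    def __init__(self, estimator, **fixed_kwargs):
        self.estimator = estimator
        self.fixed_kwargs = fixed_kwargs

    def fit(self, X, y, **sample_aligned):
        return self.estimator.fit(X, y, **self.fixed_kwargs, **sample_aligned)

    def __getattr__(self, name):
        return getattr(self.estimator, name)  # delegate predict(), etc.

# Demo with a stand-in estimator (no LightGBM needed):
class DummyEstimator:
    def fit(self, X, y, early_stopping_rounds=None, sample_weight=None):
        self.seen = (early_stopping_rounds, sample_weight)
        return self

est = BoundFitParams(DummyEstimator(), early_stopping_rounds=5)
est.fit([[0], [1]], [0, 1], sample_weight=[1.0, 1.0])
print(est.seen)  # (5, [1.0, 1.0])
```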

No this was never mentioned.

@jnothman Could you please explain your comment “It’s a bit frustrating that third party libraries ignore our convention of only having sample-aligned arguments to fit”, thanks.

No this was never mentioned. We said “data-dependent”. feature aligned arrays is still an option although we would have to discuss the use case.

I saw “data-dependent” in our doc, but I think it’s difficult to define what’s “data-dependent”, and I guess “data-dependent” parameters don’t need to be indexable.