scikit-learn: GridSearchCV cannot be parallelized when custom scoring is used

Hi,

I met a problem with the code:

    import numpy as np
    from sklearn import ensemble
    from sklearn.model_selection import GridSearchCV
    from sklearn.metrics import make_scorer

    model = ensemble.RandomForestRegressor()
    param = {'n_estimators': [500, 700, 1200],
             'max_depth': [3, 5, 7],
             'max_features': ['auto'],
             'n_jobs': [-1],
             'criterion': ['mae', 'mse'],
             'random_state': [300],
             }

    def my_custom_loss_func(ground_truth, predictions):
        diff = np.abs(ground_truth - predictions) / ground_truth
        return np.mean(diff)

    loss = make_scorer(my_custom_loss_func, greater_is_better=False)
    model_cv = GridSearchCV(model, param, cv=5, n_jobs=2, scoring=loss, verbose=1)
    model_cv.fit(X, y.ravel())

in which I used a custom scoring object in GridSearchCV(…) and set n_jobs=2.

I got the following error message:

C:\Anaconda3\python.exe C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py
Fitting 5 folds for each of 18 candidates, totalling 90 fits
Traceback (most recent call last):
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 172, in <module>
    models, scas = learn_all(X_train, y_train)
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 108, in learn_all
    models[machine], scas[machine] = learn_cv(X, y)
  File "C:/Users/to/PycharmProjects/Toppan/[10-24]per_machine_vapor_pred_ver2.py", line 87, in learn_cv
    model_cv.fit(X, y.ravel())
  File "C:\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py", line 638, in fit
    cv.split(X, y, groups)))
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 789, in __call__
    self.retrieve()
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\parallel.py", line 699, in retrieve
    self._output.extend(job.get(timeout=self.timeout))
  File "C:\Anaconda3\lib\multiprocessing\pool.py", line 608, in get
    raise self._value
  File "C:\Anaconda3\lib\multiprocessing\pool.py", line 385, in _handle_tasks
    put(task)
  File "C:\Anaconda3\lib\site-packages\sklearn\externals\joblib\pool.py", line 371, in send
    CustomizablePickler(buffer, self._reducers).dump(obj)
AttributeError: Can't pickle local object 'learn_cv.<locals>.my_custom_loss_func'

Process finished with exit code 1

It seems the program only runs when n_jobs is set to 1.

Any ideas?

About this issue

  • Original URL
  • State: closed
  • Created 7 years ago
  • Comments: 25 (19 by maintainers)

Most upvoted comments

No need to open an issue first, @fx86

@fx86 whenever you can 😉 No rush

This is a limitation of pickling. Define my_custom_loss_func in a module you import, or at least not inside a closure.
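To illustrate the fix described above, here is a minimal sketch: moving the loss function to module top level (rather than nesting it inside `learn_cv`) lets pickle reference it by its qualified name, which is what joblib's worker processes need.

```python
import pickle

import numpy as np
from sklearn.metrics import make_scorer

# Defined at module top level, NOT inside another function:
# pickle can now serialize a reference to it by name.
def my_custom_loss_func(ground_truth, predictions):
    # Mean absolute percentage error
    diff = np.abs(ground_truth - predictions) / ground_truth
    return np.mean(diff)

loss = make_scorer(my_custom_loss_func, greater_is_better=False)

# A top-level function survives a pickle round-trip, so it can be
# shipped to GridSearchCV workers when n_jobs > 1.
restored = pickle.loads(pickle.dumps(my_custom_loss_func))
```

Passing `loss` to `GridSearchCV(..., n_jobs=2, scoring=loss)` then works, because the scorer no longer closes over a local function.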