scikit-learn: GridSearchCV parallel execution with own scorer freezes
I have been searching for hours on this problem and can consistently replicate it:
clf = GridSearchCV(sk.LogisticRegression(),
                   tuned_parameters,
                   cv=N_folds_validation,
                   pre_dispatch='6*n_jobs',
                   n_jobs=4,
                   verbose=1,
                   scoring=metrics.make_scorer(metrics.scorer.f1_score, average="macro"))
This snippet crashes because of scoring=metrics.make_scorer(metrics.scorer.f1_score, average="macro"), where metrics refers to the sklearn.metrics module. If I comment out the scoring=... line, the parallel execution works. If I want to use the F1 score as the evaluation metric, I have to give up parallel execution by setting n_jobs = 1.
Is there a way to define another scoring method without losing the ability to run in parallel?
Thanks
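
For what it's worth, here is a minimal sketch of one workaround that keeps parallel execution, assuming a recent scikit-learn version where the predefined scoring string "f1_macro" is available; the parameter grid and cv value below are illustrative, not taken from the issue:

```python
# Sketch: requesting macro-averaged F1 via a built-in scoring string avoids
# constructing a custom scorer object that has to be pickled for the workers.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

tuned_parameters = {"C": [0.1, 1.0, 10.0]}  # illustrative grid, not from the issue

clf = GridSearchCV(
    LogisticRegression(),
    tuned_parameters,
    cv=5,                  # placeholder for N_folds_validation
    n_jobs=4,
    verbose=1,
    scoring="f1_macro",    # built-in scorer string instead of make_scorer(...)
)
```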
About this issue
- Original URL
- State: closed
- Created 10 years ago
- Reactions: 3
- Comments: 99 (50 by maintainers)
Hmm, that is likely related to issues with multiprocessing on Windows. Maybe @GaelVaroquaux or @ogrisel can help. I don't know what the notebook makes of the `__name__ == "__main__"` check. Try not defining the metric in the notebook, but in a separate file, and import it; I'd think that would fix it. This is not really related to GridSearchCV, but rather some interesting interaction between Windows multiprocessing, the IPython notebook, and joblib.

Seems like you're running out of RAM. Maybe try using Keras instead; it's likely a better solution for large-scale neural nets.
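
To make that suggestion concrete, here is a minimal sketch of the layout, assuming a hypothetical helper module named my_scorer.py and a separate driver script; the point is that the scorer lives in an importable file rather than being defined interactively, and the grid search runs under the `if __name__ == "__main__":` guard that multiprocessing on Windows needs:

```python
# my_scorer.py -- hypothetical module; the scorer is defined here so worker
# processes can import and unpickle it instead of relying on notebook state.
from sklearn.metrics import f1_score, make_scorer

macro_f1_scorer = make_scorer(f1_score, average="macro")
```

```python
# run_search.py -- hypothetical driver script
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

from my_scorer import macro_f1_scorer

if __name__ == "__main__":          # required on Windows when n_jobs > 1
    clf = GridSearchCV(
        LogisticRegression(),
        {"C": [0.1, 1.0, 10.0]},    # illustrative grid
        cv=5,
        n_jobs=4,
        verbose=1,
        scoring=macro_f1_scorer,
    )
    # clf.fit(X, y)  # fit with your own data
```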
I recently read the following, which I found interesting: https://lwn.net/Articles/730630/rss