scikit-learn: GridSearchCV parallel execution with own scorer freezes
I have been searching for hours on this problem and can consistently replicate it:
clf = GridSearchCV(sk.LogisticRegression(),
                   tuned_parameters,
                   cv=N_folds_validation,
                   pre_dispatch='6*n_jobs',
                   n_jobs=4,
                   verbose=1,
                   scoring=metrics.make_scorer(metrics.scorer.f1_score, average="macro"))
This snippet crashes because of scoring=metrics.make_scorer(metrics.scorer.f1_score, average="macro"), where metrics refers to the sklearn.metrics module. If I comment out the scoring=... line, the parallel execution works. If I want to use the F1 score as the evaluation metric, I have to give up parallel execution by setting n_jobs = 1.
Is there a way to define another scoring method without losing the ability to run in parallel?
Thanks
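
For what it's worth, here is a minimal sketch of one workaround that keeps parallel execution, assuming a recent scikit-learn version where the predefined scoring string "f1_macro" is available; the parameter grid and cv value below are illustrative, not taken from the issue:

```python
# Sketch: requesting macro-averaged F1 via a built-in scoring string avoids
# constructing a custom scorer object that has to be pickled for the workers.
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

tuned_parameters = {"C": [0.1, 1.0, 10.0]}  # illustrative grid, not from the issue

clf = GridSearchCV(
    LogisticRegression(),
    tuned_parameters,
    cv=5,                  # placeholder for N_folds_validation
    n_jobs=4,
    verbose=1,
    scoring="f1_macro",    # built-in scorer string instead of make_scorer(...)
)
```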
About this issue
- Original URL
- State: closed
- Created 10 years ago
- Reactions: 3
- Comments: 99 (50 by maintainers)
Hmm, that is likely related to issues with multiprocessing on Windows. Maybe @GaelVaroquaux or @ogrisel can help. I don't know what the notebook makes of the `__name__ == "__main__"` check. Try not defining the metric in the notebook, but in a separate file, and import it; I'd think that would fix it. This is not really related to GridSearchCV, but rather some interesting interaction between Windows multiprocessing, the IPython notebook, and joblib.

Seems like you're running out of RAM. Maybe try using Keras instead; it's likely a better solution for large-scale neural nets.
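
To make that suggestion concrete, here is a minimal sketch of the layout, assuming a hypothetical helper module named my_scorer.py and a separate driver script; the point is that the scorer lives in an importable file rather than being defined interactively, and the grid search runs under the `if __name__ == "__main__":` guard that multiprocessing on Windows needs:

```python
# my_scorer.py -- hypothetical module; the scorer is defined here so worker
# processes can import and unpickle it instead of relying on notebook state.
from sklearn.metrics import f1_score, make_scorer

macro_f1_scorer = make_scorer(f1_score, average="macro")
```

```python
# run_search.py -- hypothetical driver script
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

from my_scorer import macro_f1_scorer

if __name__ == "__main__":          # required on Windows when n_jobs > 1
    clf = GridSearchCV(
        LogisticRegression(),
        {"C": [0.1, 1.0, 10.0]},    # illustrative grid
        cv=5,
        n_jobs=4,
        verbose=1,
        scoring=macro_f1_scorer,
    )
    # clf.fit(X, y)  # fit with your own data
```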
I recently read the following, which I found interesting: https://lwn.net/Articles/730630/rss