scikit-learn: Cross-validation returning multiple scores
`Scorer` objects currently provide an interface that returns a scalar score given an estimator and test data. This is necessary for `*SearchCV` to calculate a mean score across folds and to determine the best score among parameter settings.
This severely limits the diagnostic information available from cross-validation or parameter search, which one can see by comparing against the catalogue of metrics that includes: precision and recall alongside F-score; scores for each of multiple classes as well as an aggregate; and error distributions (e.g. a PR curve or confusion matrix). @solomonm (#1837) and I (on the ML, and in an implementation within #1768) have independently sought to have precision and recall returned from cross-validation routines when F1 is used as the cross-validation objective; @eickenberg, at https://github.com/scikit-learn/scikit-learn/pull/1381#commitcomment-2607318, raised a concern regarding arrays of scores corresponding to multiple targets.
I thought it deserved an Issue of its own to solidify the argument and its solution.
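For concreteness, this is the kind of per-fold diagnostic information a scalar scorer discards, computed here by hand (a minimal sketch; the module paths are those of current scikit-learn, not necessarily the version this issue was written against):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import precision_recall_fscore_support
from sklearn.model_selection import KFold

X, y = make_classification(n_samples=200, random_state=0)
clf = LogisticRegression()

# Re-implementing the CV loop by hand, just to keep precision and recall
# alongside the F1 score that cross_val_score would have reported alone.
for train, test in KFold(n_splits=5).split(X, y):
    clf.fit(X[train], y[train])
    y_pred = clf.predict(X[test])
    p, r, f, _ = precision_recall_fscore_support(
        y[test], y_pred, average="binary")
    print("precision=%.3f  recall=%.3f  f1=%.3f" % (p, r, f))
```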
Some design options:
1. Allow multiple scorers to be provided to `cross_val_score` or `*SearchCV` (henceforth `CVEvaluator`), with one specified as the objective. But since a `Scorer` generally calls `estimator.{predict,decision_function,predict_proba}`, each scorer would repeat this work.
2. Separate the objective and non-objective metrics as parameters to `CVEvaluator`: the `scoring` parameter remains as it is, and a `diagnostics` parameter provides a callable with similar (the same?) arguments as a `Scorer`, but returning a dict. The prediction work is repeated, but not necessarily as many times as there are metrics. This diagnostics callable is more flexible and could perhaps be passed the training data as well as the test data.
3. Continue to use the `scoring` parameter, but allow the `Scorer` to return a dict with a special key for the objective score. This would need to be handled by the caller. For backwards compatibility, no existing scorers would change their behaviour of returning a float. This ensures no repeated prediction work. (A sketch follows below.)
4. Add an additional method to the `Scorer` interface that generates a set of named outputs (as with `calc_names` proposed in #1837), again with a special key for the objective score. This allows users to continue using `scoring='f1'` but get back precision and recall for free.
Note that 3. and 4. potentially allow for any set of metrics to be composed into a scorer without redundant prediction work (and 1. allows composition with highly redundant prediction work).
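To make option 3 concrete, here is a minimal sketch of what a dict-returning scorer could look like. The callable signature follows the existing scorer protocol, but the dict return and the `"objective"` key are hypothetical, and today's `cross_val_score` / `*SearchCV` would not accept this without the changes discussed here:

```python
from sklearn.metrics import precision_recall_fscore_support

def f1_with_diagnostics(estimator, X, y):
    """Hypothetical dict-returning scorer (design option 3).

    The 'objective' entry would be used for model selection and for
    averaging across folds; the remaining entries are diagnostics
    computed from the same predictions, so no work is repeated.
    """
    y_pred = estimator.predict(X)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y, y_pred, average="binary")
    return {"objective": f1, "precision": precision, "recall": recall}
```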
Comments, critiques and suggestions are very welcome.
About this issue
- State: closed
- Created 11 years ago
- Reactions: 10
- Comments: 33 (30 by maintainers)
Did this go anywhere? It would be really nice to pass a list of metrics to `cross_val_score` and get back a list of scores in the same order, or a dict with metric names as keys.
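For reference, the interface being asked for here is roughly what multi-metric support eventually looked like: pass several scorer names in one call and get back a dict keyed by metric. A hedged sketch using the `cross_validate` helper added in later scikit-learn releases (result keys may differ across versions):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=200, random_state=0)

# One call, several metrics: the result is a dict with one array of
# per-fold scores per metric (plus fit/score times).
results = cross_validate(LogisticRegression(), X, y, cv=5,
                         scoring=["precision", "recall", "f1"])
print(results["test_precision"])
print(results["test_recall"])
print(results["test_f1"])
```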
@raghavrv this is still open, right? Is there a PR?
I think we’re close, and there’s a fair chance you’ll see this in 0.19. But we have no desire to rush into a design that then needs to be redesigned.
I’m interested in whether the current proposal (#7388), allowing multiple values for `scoring`, is better than a generic callback to extract diagnostic info from each fit, or whether we need both…
On 3 Mar 2017 1:02 am, “RokoMijic” notifications@github.com wrote:
Any progress on this front? I am busy putting an explanation in a docstring for some code, telling the reader why I am re-implementing cross-validation rather than using scikit-learn.
Are we going to send this issue to elementary school? It’s going to be 4 years old soon! 😉 Anyway, just to add: scikit-learn is awesome and I’m really grateful for the hard work that people put into it!