xgboost: Change default eval_metric for binary:logistic objective + add warning for missing eval_metric when early stopping is enabled
I stumbled over the default metric of the binary:logistic objective. It seems to be 1 - accuracy (the “error” metric), which is a rather unfortunate choice. Accuracy is not even a proper scoring rule, see e.g. Wikipedia.
In my view, it should be “logloss”, which is a strictly proper scoring rule for estimating the expectation under the binary objective.
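For reference, the two metrics being compared, written out with standard definitions (not copied from the XGBoost docs): for predicted probabilities $\hat{p}_i$ and labels $y_i \in \{0, 1\}$,

```latex
% "error" (the current default): 1 - accuracy at a 0.5 threshold
\mathrm{error}   = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\!\left[ (\hat{p}_i > 0.5) \neq y_i \right]

% "logloss" (the proposed default): a strictly proper scoring rule
\mathrm{logloss} = -\frac{1}{n} \sum_{i=1}^{n} \left[ y_i \log \hat{p}_i + (1 - y_i) \log (1 - \hat{p}_i) \right]
```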
– Edit
The problem occurs when early stopping is used without manually setting the `eval_metric`. The default evaluation metric should at least be a strictly consistent scoring rule.
I am using R with XGBoost version 1.1.1.1.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 11
- Comments: 17 (7 by maintainers)
I perfectly agree that changing this default is potentially “breaking”. Still, it might be worth considering it.
Here is a famous example: Titanic.
With `eval_metric = "logloss"`, there is some training and we stop after 25 rounds.
With the default, there is no training and the algorithm stops after the first round…
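The original snippet is not reproduced in this thread; below is a minimal sketch of the kind of comparison meant here. It assumes the Titanic passenger data is available as a data frame `titanic` with a 0/1 `Survived` column (for example `titanic_train` from the CRAN `titanic` package); the feature selection and split are purely illustrative.

```r
library(xgboost)

# Illustrative setup: `titanic` is assumed to have a 0/1 `Survived`
# column and numeric features such as Pclass, Age, Fare.
X <- as.matrix(titanic[, c("Pclass", "Age", "Fare")])
y <- titanic$Survived

set.seed(1)
idx    <- sample(nrow(X), floor(0.8 * nrow(X)))
dtrain <- xgb.DMatrix(X[idx, ],  label = y[idx])
dvalid <- xgb.DMatrix(X[-idx, ], label = y[-idx])

fit <- function(params) {
  xgb.train(params, dtrain, nrounds = 500,
            watchlist = list(valid = dvalid),
            early_stopping_rounds = 10, verbose = 0)
}

# Strictly proper scoring rule: the validation metric keeps improving
# for a while, so early stopping finds a useful number of rounds.
fit_logloss <- fit(list(objective = "binary:logistic",
                        eval_metric = "logloss"))

# Default metric ("error" = 1 - accuracy): the validation metric is
# almost flat, so early stopping fires after the first few rounds.
fit_default <- fit(list(objective = "binary:logistic"))

fit_logloss$best_iteration
fit_default$best_iteration
```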
Do you think the binary logistic case is the only one where the default metric is inconsistent with the objective?
To new contributors: If you’re reading this and interested in contributing this feature, please comment here. Feel free to ping me with questions.
@mayer79 Yes, let’s change the default for multiclass classification as well.
@mayer79 @lorentzenchr Thanks to the recent discussion, I changed my mind. Let us change the default metric, with clear documentation as well as a run-time warning.
@jameslamb Nice. Yes, let’s throw a warning for a missing `eval_metric` when early stopping is used. With the warning, the case I mentioned (reproducibility) is also covered, and we can change the default metric. I think you can use `missing()` to check if `eval_metric` was not passed, and do something like this:
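The snippet referred to above is not preserved here; the following is only a minimal sketch of the idea, using a hypothetical wrapper function rather than the actual xgboost R source.

```r
# Hypothetical training wrapper, NOT the real xgboost code: warn when
# early stopping is requested but no eval_metric was supplied.
my_train <- function(params, data, nrounds,
                     early_stopping_rounds = NULL, eval_metric) {
  if (!is.null(early_stopping_rounds) &&
      missing(eval_metric) && is.null(params$eval_metric)) {
    warning("No 'eval_metric' was set; early stopping will use the ",
            "objective's default metric. Set 'eval_metric' explicitly ",
            "to keep results reproducible across releases.")
  }
  # ... delegate to xgboost::xgb.train() here ...
}
```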
objective = "regression"and don’t provide ametric, L2 is used as objective and as the evaluation metric for early stopping.For example, if you do this with
@mayer79 How common do you think it is to use early stopping without explicitly specifying the evaluation metric?
I’ve been thinking through this. I think it is ok to change the default to `logloss` in the next minor release (1.3.x). I think it is ok for the same training code, given the same data, to produce a different model between minor releases (1.2.x to 1.3.x).
That won’t cause anyone’s code to raise an exception, won’t have any effect on loading previously-trained models from older versions, and any retraining code should be looking at the performance of a new model based on a validation set and a fixed metric anyway.
As long as the changelog in the release makes it clear that this default was changed and that it only affects the case where you are using early stopping, I don’t think it’ll cause problems.