scikit-learn: Incorrect predictions when fitting a LogisticRegression model on binary outcomes with `multi_class='multinomial'`.
Description
Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.
Steps/Code to Reproduce
from sklearn.linear_model import LogisticRegression
import sklearn.metrics
import numpy as np
# Set up a logistic regression object
lr = LogisticRegression(C=1000000, multi_class='multinomial',
                        solver='sag', tol=0.0001, warm_start=False,
                        verbose=0)
# Set independent variable values
Z = np.array([
    [ 0.        ,  0.        ],
    [ 1.33448632,  0.        ],
    [ 1.48790105, -0.33289528],
    [-0.47953866, -0.61499779],
    [ 1.55548163,  1.14414766],
    [-0.31476657, -1.29024053],
    [-1.40220786, -0.26316645],
    [ 2.227822  , -0.75403668],
    [-0.78170885, -1.66963585],
    [ 2.24057471, -0.74555021],
    [-1.74809665,  2.25340192],
    [-1.74958841,  2.2566389 ],
    [ 2.25984734, -1.75106702],
    [ 0.50598996, -0.77338402],
    [ 1.21968303,  0.57530831],
    [ 1.65370219, -0.36647173],
    [ 0.66569897,  1.77740068],
    [-0.37088553, -0.92379819],
    [-1.17757946, -0.25393047],
    [-1.624227  ,  0.71525192]])
# Set dependent variable values
Y = np.array([1, 0, 0, 1, 0, 0, 0, 0,
              0, 0, 1, 1, 1, 0, 0, 1,
              0, 0, 1, 1], dtype=np.int32)
lr.fit(Z, Y)
p = lr.predict_proba(Z)
print(sklearn.metrics.log_loss(Y, p))  # 0.61505641264 (see Actual Results)
print(lr.intercept_)
print(lr.coef_)
Expected Results
If we compare against R, or against multi_class='ovr', we expect a log loss of roughly 0.5922995. (The log loss is approximately proportional to the objective function here, since the regularisation is made negligible by the large choice of C.)
Actual Results
The actual log loss when using multi_class='multinomial' is 0.61505641264.
Further Information
See the stack exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.
The issue, it seems, is caused at https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the vectors of coefficients (as we can in ordinary binary logistic regression). This is essentially the difference between softmax regression (where the parameterisation is redundant) and binary logistic regression.
This can be fixed by commenting out lines 762 and 763. I am apprehensive, however, that this may cause other unknown issues, which is why I am posting this as a bug.
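To make the redundancy point concrete, here is a small numpy sketch (illustrative only, not scikit-learn code): with two classes, a softmax over the two linear scores equals a sigmoid of their difference, so the correct 1D reduction is to subtract one coefficient vector from the other, not to discard it.

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(5, 2)                   # 5 samples, 2 features
w0, w1 = rng.randn(2), rng.randn(2)   # one coefficient vector per class

# Softmax probability of class 1 using both score vectors.
z = np.c_[x @ w0, x @ w1]
p_softmax = np.exp(z[:, 1]) / np.exp(z).sum(axis=1)

# Equivalent sigmoid using the *difference* of the coefficient vectors ...
p_diff = 1 / (1 + np.exp(-(x @ (w1 - w0))))

# ... whereas discarding w0 (what lines 762-763 effectively do) only agrees
# when w0 == 0, which the multinomial fit does not generally produce.
p_dropped = 1 / (1 + np.exp(-(x @ w1)))

print(np.allclose(p_softmax, p_diff))     # True
print(np.allclose(p_softmax, p_dropped))  # False in general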
Versions
Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0
Commits related to this issue
- Incorrect multinomial logistic regression predict_proba test added (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
- Fixed incorrect multinomial logistic regression predict_proba (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
- Updated what's new for multinomial logistic regression predictions (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
- Updated doc string for coef_ and intercept_ (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
So to sum up, we need to:
- fix predict_proba as described in https://github.com/scikit-learn/scikit-learn/issues/9889#issuecomment-335129554
- update coef_'s docstring
- update whats_new
Do you want to do it @rwolst?
I think it would be excessive noise to issue such a warning upon fit. Why not just amend the coef_ description? Most users will not be manually making probabilistic interpretations of coef_ in any case, and we can't in general stop users misinterpreting things on the basis of assumption rather than reading the docs…
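A possible amendment to the coef_ description, sketched along the lines the linked commits describe (suggested wording, not a quote from the shipped docs):

# Sketch of an amended coef_ docstring entry:
#
#   coef_ : array, shape (1, n_features) or (n_classes, n_features)
#       Coefficient of the features in the decision function.
#       coef_ is of shape (1, n_features) when the given problem is binary.
#       In particular, when multi_class='multinomial', coef_ corresponds
#       to outcome 1 (True) and -coef_ corresponds to outcome 0 (False).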
We already break the generalisation elsewhere, as in decision_function and in the coef_ shape for other multiclass methods. I think maintaining internal consistency here might be more important than some abstract concern that "generalisation will be broken". I think we should choose modifying predict_proba. This also makes it clear that the multinomial case does not suddenly introduce more free parameters.
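For concreteness, a sketch of the kind of predict_proba modification being discussed (a standalone illustration, not the actual scikit-learn source; predict_proba_sketch is a hypothetical helper): keep the per-class sigmoid transform for 'ovr', and apply a softmax over the decision values for 'multinomial', padding the 1-D binary decision with its negation.

import numpy as np
from scipy.special import expit           # logistic sigmoid
from sklearn.utils.extmath import softmax

def predict_proba_sketch(decision, multi_class):
    """decision: output of decision_function; 1-D in the binary case."""
    if multi_class == 'ovr':
        # Per-class sigmoid, then normalisation (the existing behaviour).
        prob = expit(decision)
        if prob.ndim == 1:
            return np.c_[1 - prob, prob]
        return prob / prob.sum(axis=1, keepdims=True)
    # Multinomial: softmax, even with only two classes. A 1-D binary
    # decision d is read as the score pair (-d, d) for classes 0 and 1.
    if decision.ndim == 1:
        decision = np.c_[-decision, decision]
    return softmax(decision)

Under this scheme the stored binary coefficients are read as the class-1 scores and their negation as the class-0 scores, consistent with the coef_ note sketched above, and no extra free parameters appear in the binary multinomial case.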