scikit-learn: Incorrect predictions when fitting a LogisticRegression model on binary outcomes with `multi_class='multinomial'`.
Description
Incorrect predictions when fitting a LogisticRegression model on binary outcomes with multi_class='multinomial'.
Steps/Code to Reproduce
from sklearn.linear_model import LogisticRegression
import sklearn.metrics
import numpy as np
# Set up a logistic regression object
lr = LogisticRegression(C=1000000, multi_class='multinomial',
                        solver='sag', tol=0.0001, warm_start=False,
                        verbose=0)
# Set independent variable values
Z = np.array([
    [ 0.        ,  0.        ],
    [ 1.33448632,  0.        ],
    [ 1.48790105, -0.33289528],
    [-0.47953866, -0.61499779],
    [ 1.55548163,  1.14414766],
    [-0.31476657, -1.29024053],
    [-1.40220786, -0.26316645],
    [ 2.227822  , -0.75403668],
    [-0.78170885, -1.66963585],
    [ 2.24057471, -0.74555021],
    [-1.74809665,  2.25340192],
    [-1.74958841,  2.2566389 ],
    [ 2.25984734, -1.75106702],
    [ 0.50598996, -0.77338402],
    [ 1.21968303,  0.57530831],
    [ 1.65370219, -0.36647173],
    [ 0.66569897,  1.77740068],
    [-0.37088553, -0.92379819],
    [-1.17757946, -0.25393047],
    [-1.624227  ,  0.71525192]])
# Set dependent variable values
Y = np.array([1, 0, 0, 1, 0, 0, 0, 0,
              0, 0, 1, 1, 1, 0, 0, 1,
              0, 0, 1, 1], dtype=np.int32)
lr.fit(Z, Y)
p = lr.predict_proba(Z)
print(sklearn.metrics.log_loss(Y, p))  # 0.61505641264 (see Actual Results)
print(lr.intercept_)
print(lr.coef_)
Expected Results
If we compare against R, or against multi_class='ovr', we expect a log loss of roughly 0.5922995. (The log loss is approximately proportional to the objective function here, since the regularisation is made negligible by the large choice of C.)
Actual Results
The actual log loss when using multi_class='multinomial' is 0.61505641264.
Further Information
See the stack exchange question https://stats.stackexchange.com/questions/306886/confusing-behaviour-of-scikit-learn-logistic-regression-multinomial-optimisation?noredirect=1#comment583412_306886 for more information.
The issue, it seems, is caused at https://github.com/scikit-learn/scikit-learn/blob/ef5cb84a/sklearn/linear_model/logistic.py#L762. In the multinomial case, even if classes.size == 2, we cannot reduce to a 1D case by throwing away one of the vectors of coefficients (as we can in ordinary binary logistic regression). This is essentially the difference between softmax regression (where the parameterisation is redundant) and binary logistic regression.
This can be fixed by commenting out lines 762 and 763. I am apprehensive, however, that this may cause other unknown issues, which is why I am posting this as a bug.
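To make the redundancy point concrete, here is a small numpy sketch (illustrative only, not scikit-learn code): with two classes, a softmax over the two linear scores equals a sigmoid of their difference, so the correct 1D reduction is to subtract one coefficient vector from the other, not to discard it.

import numpy as np

rng = np.random.RandomState(0)
x = rng.randn(5, 2)                   # 5 samples, 2 features
w0, w1 = rng.randn(2), rng.randn(2)   # one coefficient vector per class

# Softmax probability of class 1 using both score vectors.
z = np.c_[x @ w0, x @ w1]
p_softmax = np.exp(z[:, 1]) / np.exp(z).sum(axis=1)

# Equivalent sigmoid using the *difference* of the coefficient vectors ...
p_diff = 1 / (1 + np.exp(-(x @ (w1 - w0))))

# ... whereas discarding w0 (what lines 762-763 effectively do) only agrees
# when w0 == 0, which the multinomial fit does not generally produce.
p_dropped = 1 / (1 + np.exp(-(x @ w1)))

print(np.allclose(p_softmax, p_diff))     # True
print(np.allclose(p_softmax, p_dropped))  # False in general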
Versions
Linux-4.10.0-33-generic-x86_64-with-Ubuntu-16.04-xenial
Python 3.5.2 (default, Nov 17 2016, 17:05:23)
NumPy 1.13.1
SciPy 0.19.1
Scikit-Learn 0.19.0
Commits related to this issue
- Incorrect multinomial logistic regression predict_proba test added (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
- Fixed incorrect multinomial logistic regression predict_proba (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
- Updated what's new for multinomial logistic regression predictions (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
- Updated doc string for coef_ and intercept_ (#9889) — committed to rwolst/scikit-learn by rwolst 7 years ago
So to sum up, we need to:
- fix predict_proba as described in https://github.com/scikit-learn/scikit-learn/issues/9889#issuecomment-335129554
- update coef_'s docstring
- update whats_new
Do you want to do it @rwolst?
I think it would be excessive noise to issue such a warning upon fit. Why not just amend the coef_ description? Most users will not be manually making probabilistic interpretations of coef_ in any case, and we can't in general stop users misinterpreting things on the basis of assumption rather than reading the docs…
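A possible amendment to the coef_ description, sketched along the lines the linked commits describe (suggested wording, not a quote from the shipped docs):

# Sketch of an amended coef_ docstring entry:
#
#   coef_ : array, shape (1, n_features) or (n_classes, n_features)
#       Coefficient of the features in the decision function.
#       coef_ is of shape (1, n_features) when the given problem is binary.
#       In particular, when multi_class='multinomial', coef_ corresponds
#       to outcome 1 (True) and -coef_ corresponds to outcome 0 (False).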
We already break the generalisation elsewhere, as in decision_function and in the coef_ shape for other multiclass methods. I think maintaining internal consistency here might be more important than some abstract concern that "generalisation will be broken". I think we should choose modifying predict_proba. This also makes it clear that the multinomial case does not suddenly introduce more free parameters.
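For concreteness, a sketch of the kind of predict_proba modification being discussed (a standalone illustration, not the actual scikit-learn source; predict_proba_sketch is a hypothetical helper): keep the per-class sigmoid transform for 'ovr', and apply a softmax over the decision values for 'multinomial', padding the 1-D binary decision with its negation.

import numpy as np
from scipy.special import expit           # logistic sigmoid
from sklearn.utils.extmath import softmax

def predict_proba_sketch(decision, multi_class):
    """decision: output of decision_function; 1-D in the binary case."""
    if multi_class == 'ovr':
        # Per-class sigmoid, then normalisation (the existing behaviour).
        prob = expit(decision)
        if prob.ndim == 1:
            return np.c_[1 - prob, prob]
        return prob / prob.sum(axis=1, keepdims=True)
    # Multinomial: softmax, even with only two classes. A 1-D binary
    # decision d is read as the score pair (-d, d) for classes 0 and 1.
    if decision.ndim == 1:
        decision = np.c_[-decision, decision]
    return softmax(decision)

Under this scheme the stored binary coefficients are read as the class-1 scores and their negation as the class-0 scores, consistent with the coef_ note sketched above, and no extra free parameters appear in the binary multinomial case.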