scikit-learn: .predict_proba() for SVC produces incorrect results for binary classification

Description

svm.predict_proba() produces reversed results for binary classification.

This seems to be specific to binary classification. For example, it works fine for 3-way classification, which is covered by this test: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/tests/test_svm.py#L307
I think this is related to #394, but in this case we are using the same svm object, so I think predict and predict_proba should agree.
Acknowledgement: @amennen noticed this error 👍

Steps/Code to Reproduce

Here's the code on Colab: https://colab.research.google.com/github/qihongl/random/blob/master/sklearn-svm-predict-proba-bug.ipynb

import numpy as np
from sklearn.svm import SVC

# train an SVM on two linearly separable points
X = np.array([[-1, -1], [1, 1]])
y = np.array([0, 1])
svm = SVC(probability=True)
svm.fit(X, y)

# the SVM makes reasonable predictions on the learned examples...
print(svm.predict(X))

# but it makes the reversed probability estimates...
print(svm.predict_proba(X))

Expected Results

[0 1]
[[0.66383953 0.33616047]
 [0.33916469 0.66083531]]

# i.e. when the prediction is class 0, prob(class0) should be bigger than prob(class1)

Actual Results

[0 1]
[[0.33616047 0.66383953]
 [0.66083531 0.33916469]]

# i.e. when the prediction is class 0, prob(class0) < prob(class1)
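
The mismatch above can also be checked programmatically. The following is only a minimal sketch: the helper name `proba_disagreements` is hypothetical (it is not part of scikit-learn), and it assumes that column j of the probability matrix corresponds to `classes[j]`, which is how scikit-learn orders columns relative to `classes_`. The arrays are the values reported in this issue, copied in by hand so the snippet runs without fitting a model.

```python
def proba_disagreements(predicted, classes, proba):
    """Return indices of samples whose most probable class differs from the predicted label.

    Hypothetical helper for illustration; works on plain lists or NumPy arrays.
    """
    mismatches = []
    for i, (label, row) in enumerate(zip(predicted, proba)):
        # Assumption: column j of `proba` is the probability of classes[j].
        best = max(range(len(row)), key=lambda j: row[j])
        if classes[best] != label:
            mismatches.append(i)
    return mismatches


# Values copied from the "Actual Results" and "Expected Results" sections.
predicted = [0, 1]
classes = [0, 1]
actual = [[0.33616047, 0.66383953], [0.66083531, 0.33916469]]
expected = [[0.66383953, 0.33616047], [0.33916469, 0.66083531]]

print(proba_disagreements(predicted, classes, actual))    # [0, 1]
print(proba_disagreements(predicted, classes, expected))  # []
```

With the fitted object from the reproduction code, the same check would be called as `proba_disagreements(svm.predict(X), svm.classes_, svm.predict_proba(X))`.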

Versions

0.20.3

About this issue

State: closed
Created 5 years ago
Comments: 22 (12 by maintainers)

Most upvoted comments

I came to the same conclusion in my analysis, and could also reproduce the same behavior with CalibratedClassifierCV when not using stratification. This is the evil of leave-one-out (LOO) cross-validation on small noisy datasets.

So I agree with closing this as a separate bug.

amueller on Apr 1, 2020