scikit-learn: .predict_proba() for SVC produces incorrect results for binary classification

Description

SVC.predict_proba() produces reversed results for binary classification.

Steps/Code to Reproduce

Here's the code on Colab: https://colab.research.google.com/github/qihongl/random/blob/master/sklearn-svm-predict-proba-bug.ipynb

import numpy as np
from sklearn.svm import SVC

# train an SVM
X = np.array([[-1, -1], [1, 1]])
y = np.array([0, 1])
svm = SVC(probability=True)
svm.fit(X, y)

# the SVM makes reasonable predictions on the training examples...
print(svm.predict(X))

# but its probability estimates are reversed...
print(svm.predict_proba(X))

Expected Results

[0 1]
[[0.66383953 0.33616047]
 [0.33916469 0.66083531]]

# i.e. when the prediction is class 0, prob(class0) should be bigger than prob(class1)

Actual Results

[0 1]
[[0.33616047 0.66383953]
 [0.66083531 0.33916469]]

# i.e. when the prediction is class 0, prob(class0) < prob(class1)
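
A possible way to cross-check the mismatch is to compare the labels implied by decision_function with the labels implied by predict_proba. The sketch below reuses the two-point example from this report. The variable names (clf, labels_from_scores, labels_from_proba) and random_state=0 are assumptions added for illustration. The exact probabilities may differ between scikit-learn versions, so none are shown.

```python
import numpy as np
from sklearn.svm import SVC

# Same two-point dataset as in the report.
X = np.array([[-1, -1], [1, 1]])
y = np.array([0, 1])

# random_state is an illustrative choice; it seeds the internal
# cross-validation used for the probability estimates.
clf = SVC(probability=True, random_state=0)
clf.fit(X, y)

# decision_function returns a signed score per sample; for a binary
# problem a positive score corresponds to clf.classes_[1].
scores = clf.decision_function(X)
labels_from_scores = clf.classes_[(scores > 0).astype(int)]

# argmax over the calibrated probabilities, mapped back to class labels.
proba = clf.predict_proba(X)
labels_from_proba = clf.classes_[np.argmax(proba, axis=1)]

print("predict:            ", clf.predict(X))
print("from decision score:", labels_from_scores)
print("from predict_proba: ", labels_from_proba)
```

If the last two lines disagree, the hard predictions still follow the decision scores, and only the calibrated probabilities are off.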

Versions

0.20.3

About this issue

  • State: closed
  • Created 5 years ago
  • Comments: 22 (12 by maintainers)

Most upvoted comments

I came to the same conclusion in my analysis, and could also reproduce the same behavior with CalibratedClassifierCV when not using stratification. This is the evil of leave-one-out (LOO) cross-validation in small noisy datasets.

So I agree with closing this as a separate bug.
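
To make the leave-one-out remark concrete, here is a minimal sketch of one well-known way held-out scores can end up anti-correlated with the labels on a tiny dataset. It uses a toy "mean of the other labels" predictor. This is an assumption chosen for illustration, not the actual calibration code in libsvm or CalibratedClassifierCV.

```python
import numpy as np

# A tiny balanced dataset of binary labels.
y = np.array([0.0, 1.0, 0.0, 1.0, 0.0, 1.0])

# Toy leave-one-out score for sample i: the mean label of all the
# *other* samples. Holding out a 0 leaves more 1s behind, and vice versa.
oof = np.array([np.delete(y, i).mean() for i in range(len(y))])

# Correlation between the true labels and the held-out scores.
corr = np.corrcoef(y, oof)[0, 1]

print(oof)
print(corr)
```

Because removing a sample shifts the class balance of the remaining data away from that sample's own label, the held-out scores point the wrong way. A calibrator fitted on such scores can learn a reversed mapping.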