scikit-learn: .predict_proba() for SVC produces incorrect results for binary classification
Description
svm.predict_proba() produces reversed results for binary classification
- this seems to be specific to binary classification. For example, it works fine for 3 way classification, which is in the test: https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/svm/tests/test_svm.py#L307
- I think this is related to #394, but in this case we are using the same svm object, so I think predict and predict_proba should agree
- Acknowledgement: @amennen noticed this error 👍
Steps/Code to Reproduce
Here’s the code on Colab: https://colab.research.google.com/github/qihongl/random/blob/master/sklearn-svm-predict-proba-bug.ipynb
import numpy as np
from sklearn.svm import SVC

# train an SVM
X = np.array([[-1, -1], [1, 1]])
y = np.array([0, 1])
svm = SVC(probability=True)
svm.fit(X, y)
# SVM makes reasonable prediction on the learned examples...
print(svm.predict(X))
# but it makes the reversed probability estimates...
print(svm.predict_proba(X))
Expected Results
[0 1]
[[0.66383953 0.33616047]
[0.33916469 0.66083531]]
# i.e. when the prediction is class 0, prob(class0) should be bigger than prob(class1)
Actual Results
[0 1]
[[0.33616047 0.66383953]
[0.66083531 0.33916469]]
# i.e. when the prediction is class 0, prob(class0) < prob(class1)
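The mismatch can also be checked programmatically rather than by eye. The following is a minimal sketch, not part of the original report: the helper name `proba_agrees_with_predict` is hypothetical, and `random_state=0` is an added assumption to make the internal probability calibration repeatable. It compares the class with the highest estimated probability against the label returned by predict.

```python
import numpy as np
from sklearn.svm import SVC


def proba_agrees_with_predict(clf, X):
    """Return a boolean array: True where the argmax of predict_proba
    maps to the same class label that predict returns."""
    proba_labels = clf.classes_[np.argmax(clf.predict_proba(X), axis=1)]
    return proba_labels == clf.predict(X)


# Same toy data as in the report: one sample per class.
X = np.array([[-1, -1], [1, 1]])
y = np.array([0, 1])

# random_state is an assumption added here for repeatability.
clf = SVC(probability=True, random_state=0).fit(X, y)

agree = proba_agrees_with_predict(clf, X)
print(agree)  # an all-False array would correspond to the reversed output reported here
```

The result may depend on the scikit-learn version, so no particular output is claimed here.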
Versions
0.20.3
About this issue
- State: closed
- Created 5 years ago
- Comments: 22 (12 by maintainers)
I came to the same conclusion in my analysis, and could also reproduce the same behavior with CalibratedClassifierCV when not using stratification. This is the evil of leave-one-out (LOO) cross-validation in small noisy datasets.
So I agree with closing this as a separate bug.
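To make the leave-one-out remark concrete, here is a toy sketch. It is not scikit-learn's or libsvm's actual calibration code. It assumes a hypothetical scorer (`constant_scorer`) that, when trained on the single remaining sample, can only output a score favouring that sample's class. Under that assumption, each held-out score points at the opposite class from the held-out label, so a calibration fitted on those pairs would learn a reversed mapping.

```python
# Two-sample dataset mirroring the report: one sample per class.
labels = [0, 1]


def constant_scorer(train_labels):
    """Toy stand-in (an assumption) for a model trained on one class only:
    returns +1.0 if it saw only class 1, otherwise -1.0."""
    return 1.0 if all(label == 1 for label in train_labels) else -1.0


# Leave-one-out: score each sample with a model trained on the other one.
held_out_scores = []
for i in range(len(labels)):
    train = labels[:i] + labels[i + 1:]
    held_out_scores.append(constant_scorer(train))

# Pair each held-out label with the score it received.
print(list(zip(labels, held_out_scores)))  # [(0, 1.0), (1, -1.0)]
```

With stratified folds, each training split would contain both classes, which is consistent with the comment that the effect appears when stratification is not used.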