scikit-learn: Ensemble models (and maybe others?) don't check for negative sample_weight

When sample weights are negative, the probabilities can come out negative as well:

>>> import numpy as np
>>> from sklearn.ensemble import RandomForestClassifier
>>> rng = np.random.RandomState(10)
>>> X = rng.randn(10, 4)
>>> y = rng.randint(0, 2, 10)
>>> sample_weight = rng.randn(10)
>>> clf = RandomForestClassifier().fit(X, y, sample_weight)
>>> clf.predict_proba(X)
array([[ 0.56133774,  0.43866226],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 0.98071868,  0.01928132],
       [ 0.56133774,  0.43866226],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924]])
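For contrast, a minimal validation step would catch this input before fitting. The helper below is a hypothetical sketch (not part of scikit-learn's API) of what "checking for negative sample_weight" could look like:

```python
import numpy as np

def check_sample_weight(sample_weight):
    # Hypothetical helper, not a scikit-learn function: reject any
    # negative entries instead of silently producing negative probabilities.
    sample_weight = np.asarray(sample_weight, dtype=float)
    if np.any(sample_weight < 0):
        raise ValueError("sample_weight contains negative entries")
    return sample_weight

rng = np.random.RandomState(10)
sample_weight = rng.randn(10)  # standard normal draws, so some are negative
try:
    check_sample_weight(sample_weight)
except ValueError as exc:
    print(exc)
```

Whether such a check *should* exist is exactly what the discussion below is about.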

About this issue

  • State: open
  • Created 10 years ago
  • Comments: 40 (35 by maintainers)

Most upvoted comments

@arjoly my attempt at describing a very common scenario is above. The short answer is yes.

I just don’t see the point in discussing this endlessly; we keep repeating the same points. scikit-learn should just not care. As long as I can specify weights as I wish without an exception being needlessly thrown when a weight is negative, I am happy (and I know many others in my field would be too). This isn’t about supporting a niche use case; it’s just being indifferent.