scikit-learn: Ensemble models (and maybe others?) don't check for negative sample_weight

When sample weights are negative, the probabilities can come out negative as well:

>>> import numpy as np
>>> from sklearn.ensemble import RandomForestClassifier
>>> rng = np.random.RandomState(10)
>>> X = rng.randn(10, 4)
>>> y = rng.randint(0, 2, 10)
>>> sample_weight = rng.randn(10)
>>> clf = RandomForestClassifier().fit(X, y, sample_weight)
>>> clf.predict_proba(X)
array([[ 0.56133774,  0.43866226],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 0.98071868,  0.01928132],
       [ 0.56133774,  0.43866226],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924]])
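For contrast, a minimal validation step would catch this input before fitting. The helper below is a hypothetical sketch (not part of scikit-learn's API) of what "checking for negative sample_weight" could look like:

```python
import numpy as np

def check_sample_weight(sample_weight):
    # Hypothetical helper, not a scikit-learn function: reject any
    # negative entries instead of silently producing negative probabilities.
    sample_weight = np.asarray(sample_weight, dtype=float)
    if np.any(sample_weight < 0):
        raise ValueError("sample_weight contains negative entries")
    return sample_weight

rng = np.random.RandomState(10)
sample_weight = rng.randn(10)  # standard normal draws, so some are negative
try:
    check_sample_weight(sample_weight)
except ValueError as exc:
    print(exc)
```

Whether such a check *should* exist is exactly what the discussion below is about.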

About this issue

  • State: open
  • Created 10 years ago
  • Comments: 40 (35 by maintainers)

Most upvoted comments

@arjoly my attempt at describing a very common scenario is above. The short answer is yes.

I just don’t see the point in discussing this endlessly; we keep repeating the same points. scikit-learn should just not care. As long as I can specify weights as I wish without an exception being needlessly thrown when a weight is negative, I am happy (and I know many others in my field would be too). This isn’t about supporting a niche use case; it’s just being indifferent.