scikit-learn: Ensemble models (and maybe others?) don't check for negative sample_weight
When sample weights are negative, the probabilities can come out negative as well:
>>> import numpy as np
>>> from sklearn.ensemble import RandomForestClassifier
>>> rng = np.random.RandomState(10)
>>> X = rng.randn(10, 4)
>>> y = rng.randint(0, 2, 10)
>>> sample_weight = rng.randn(10)
>>> clf = RandomForestClassifier().fit(X, y, sample_weight=sample_weight)
>>> clf.predict_proba(X)
array([[ 0.56133774,  0.43866226],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924],
       [ 0.98071868,  0.01928132],
       [ 0.56133774,  0.43866226],
       [ 1.03235924, -0.03235924],
       [ 1.03235924, -0.03235924]])
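Until such a check exists in scikit-learn, callers can guard against this themselves before fitting. Below is a minimal user-side sketch; the helper name check_nonnegative_weights is ours and not part of scikit-learn:

import numpy as np

def check_nonnegative_weights(sample_weight):
    # Fail fast instead of letting tree-based ensembles silently
    # produce out-of-range probabilities from negative weights.
    sample_weight = np.asarray(sample_weight, dtype=float)
    if (sample_weight < 0).any():
        raise ValueError("sample_weight contains negative values")
    return sample_weight

For the sample_weight drawn above (which contains negative entries), check_nonnegative_weights(sample_weight) would raise rather than let fit proceed.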
@arjoly my attempt at describing a very common scenario is above. The short answer is yes.
I just don’t see the point in discussing this endlessly; we keep repeating the same points. scikit-learn should simply not care: as long as I can specify weights as I wish, without an exception being needlessly thrown when a weight is negative, I am happy (and I know many others in my field would be too). This isn’t about supporting a niche use case, just about being indifferent.
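For concreteness, a middle-ground policy could make the check opt-in, so negative weights pass through untouched by default. This is purely an illustrative sketch; neither the helper nor the allow_negative flag exists in scikit-learn:

import numpy as np

def validate_sample_weight(sample_weight, allow_negative=True):
    # Hypothetical policy: indifferent by default (the position argued
    # above), with strict checking available for estimators or users
    # that want it.
    sample_weight = np.asarray(sample_weight, dtype=float)
    if not allow_negative and (sample_weight < 0).any():
        raise ValueError("negative values in sample_weight")
    return sample_weight

With the default, the helper is a pass-through, so workflows that depend on negative weights are unaffected; with allow_negative=False, users who consider negative weights a bug get an immediate error.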