scikit-learn: LocalOutlierFactor Doesn't Flag Obvious Outliers
Describe the bug
LocalOutlierFactor fails to flag outliers in an array filled with 1s and -1s, where the -1s are clearly the outliers.
What happens is
- It gives every sample a 1 label (inlier)
- The corresponding LOF scores are as expected, i.e. 1s for the inliers and some very large values for the outliers
Note that everything works fine for many other sample sizes and contaminations.
Steps/Code to Reproduce
import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from collections import Counter
# total samples to generate
SAMPLES = 50
# we want this many outliers
OUTLIERS = 13
# data
X = np.hstack((np.ones(SAMPLES - OUTLIERS), -np.ones(OUTLIERS))).reshape(-1, 1)
lof = LocalOutlierFactor(contamination=0.20, n_neighbors=13)
flags = lof.fit_predict(X)
lof_scores = -lof.negative_outlier_factor_
print(f"outliers in data: {OUTLIERS}; contamination: {OUTLIERS/SAMPLES:.4f}")
print(f"outlier flags issued by LocalOutlierFactor: {Counter(flags).get(-1,0)}")
print(
f'LOF scores assigned by LocalOutlierFactor: {", ".join([f"{f} - {c} times" for f, c in Counter(lof_scores).items()])}'
)
Expected Results
LocalOutlierFactor assigns exactly 13 outlier flags, i.e. -1s
Actual Results
outliers in data: 13; contamination: 0.2600
outlier flags issued by LocalOutlierFactor: 0
LOF scores assigned by LocalOutlierFactor: 1.0 - 37 times, 1538461539.4615386 - 13 times
Versions
System:
python: 3.9.6 (v3.9.6:db3ff76da1, Jun 28 2021, 11:49:53) [Clang 6.0 (clang-600.0.57)]
executable: /Library/Frameworks/Python.framework/Versions/3.9/bin/python3
machine: macOS-10.16-x86_64-i386-64bit
Python dependencies:
pip: 21.2.4
setuptools: 57.0.0
sklearn: 0.24.2
numpy: 1.21.0
scipy: 1.6.3
Cython: 0.29.21
pandas: 1.3.2
matplotlib: 3.4.2
joblib: 1.0.0
threadpoolctl: 2.1.0
Built with OpenMP: True
About this issue
- State: closed
- Created 3 years ago
- Reactions: 1
- Comments: 16 (7 by maintainers)
Yes, that’s a good one 😃 Looks like the fix would not be as simple as replacing < with <=.
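A minimal sketch of why a strict comparison can yield zero flags for this data, using only NumPy. It assumes the cut-off is taken as the `contamination` percentile of the negated LOF scores, which is an assumption about the internals and not something this report confirms. The score values are copied from the Actual Results above, and the variable names are illustrative.

```python
import numpy as np

# Negated LOF scores as reported above: 37 inliers at -1.0 and
# 13 outliers tied at a very large negative value.
neg_scores = np.concatenate((
    np.full(13, -1538461539.4615386),
    np.full(37, -1.0),
))

# Assumption: the cut-off is the `contamination` percentile of the scores.
contamination = 0.20
threshold = np.percentile(neg_scores, 100.0 * contamination)

# The 13 tied outliers cover 26% of the data, so the 20th percentile
# falls inside the tie and equals the outlier score itself.
strict_count = int(np.sum(neg_scores < threshold))
nonstrict_count = int(np.sum(neg_scores <= threshold))

print(f"threshold: {threshold}")
print(f"flagged with <:  {strict_count}")
print(f"flagged with <=: {nonstrict_count}")
```

Under this assumption, `<` flags nothing because no score lies strictly below the tied value, while `<=` flags all 13 tied samples. That is more than the 10 samples (20% of 50) that the contamination setting asks for, which may be part of why swapping the operator is not a complete fix.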