scikit-learn: LocalOutlierFactor Doesn't Flag Obvious Outliers

Describe the bug

LocalOutlierFactor fails to flag any outliers in an array of 1s and -1s in which the -1s are clearly the outliers.

What happens is

  • It gives every sample a 1 label (inlier)
  • The corresponding LOF scores are as expected, i.e. 1s for the inliers and some very large values for the outliers

Note that everything works fine for many other sample sizes and contaminations.

Steps/Code to Reproduce

import numpy as np
from sklearn.neighbors import LocalOutlierFactor
from collections import Counter

# total samples to generate
SAMPLES = 50
# we want this many outliers
OUTLIERS = 13

# data
X = np.hstack((np.ones(SAMPLES - OUTLIERS), -np.ones(OUTLIERS))).reshape(-1, 1)

lof = LocalOutlierFactor(contamination=0.20, n_neighbors=13)
flags = lof.fit_predict(X)
lof_scores = -lof.negative_outlier_factor_

print(f"outliers in data: {OUTLIERS}; contamination: {OUTLIERS/SAMPLES:.4f}")
print(f"outlier flags issued by LocalOutlierFactor: {Counter(flags).get(-1,0)}")
print(
    f'LOF scores assigned by LocalOutlierFactor: {", ".join([f"{f} - {c} times" for f, c in Counter(lof_scores).items()])}'
)

Expected Results

  • LocalOutlierFactor assigns exactly 13 outlier flags, i.e. -1s

Actual Results

outliers in data: 13; contamination: 0.2600
outlier flags issued by LocalOutlierFactor: 0
LOF scores assigned by LocalOutlierFactor: 1.0 - 37 times, 1538461539.4615386 - 13 times

Versions

System:
    python: 3.9.6 (v3.9.6:db3ff76da1, Jun 28 2021, 11:49:53)  [Clang 6.0 (clang-600.0.57)]
executable: /Library/Frameworks/Python.framework/Versions/3.9/bin/python3
   machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
          pip: 21.2.4
   setuptools: 57.0.0
      sklearn: 0.24.2
        numpy: 1.21.0
        scipy: 1.6.3
       Cython: 0.29.21
       pandas: 1.3.2
   matplotlib: 3.4.2
       joblib: 1.0.0
threadpoolctl: 2.1.0

Built with OpenMP: True

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (7 by maintainers)

Most upvoted comments

Yes, that’s a good one 😃 Looks like the fix would not be as simple as replacing < with <=.
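To make the `<` versus `<=` point concrete, here is a minimal NumPy-only sketch. It assumes the labels come from comparing the negated LOF scores against their `contamination` percentile; the helper `count_flags` is hypothetical and is not scikit-learn code. The scores are copied from the Actual Results above. The second case (contamination 0.30) is an illustration of one plausible reason a plain `<=` would not be enough; that reason is not spelled out in the thread.

```python
import numpy as np

# LOF scores copied from the Actual Results above, negated so that
# more negative means more abnormal.
neg_scores = np.concatenate([
    np.full(37, -1.0),                  # the 37 inliers
    np.full(13, -1538461539.4615386),   # the 13 outliers
])


def count_flags(neg_scores, contamination, strict=True):
    """Count samples flagged under a percentile threshold.

    Assumption (not taken from the thread): the threshold is the
    `contamination` percentile of the negated scores, and a sample is
    flagged when its score falls below that threshold.
    """
    threshold = np.percentile(neg_scores, 100.0 * contamination)
    if strict:
        flagged = neg_scores < threshold
    else:
        flagged = neg_scores <= threshold
    return int(flagged.sum())


# contamination=0.20: the percentile lands inside the block of tied
# outlier scores, so the threshold equals the outlier score itself.
strict_020 = count_flags(neg_scores, 0.20, strict=True)
loose_020 = count_flags(neg_scores, 0.20, strict=False)

# contamination=0.30: the percentile lands inside the block of tied
# inlier scores, so the threshold equals the inlier score.
strict_030 = count_flags(neg_scores, 0.30, strict=True)
loose_030 = count_flags(neg_scores, 0.30, strict=False)

print(f"contamination=0.20: '<' flags {strict_020}, '<=' flags {loose_020}")
print(f"contamination=0.30: '<' flags {strict_030}, '<=' flags {loose_030}")
```

Under these assumptions, the strict comparison flags nothing at 0.20 because no score is strictly below a threshold that sits on the tied outlier value, while the non-strict comparison flags every sample at 0.30 because the threshold sits on the tied inlier value. Either operator misbehaves whenever the percentile falls inside a run of identical scores.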