scikit-learn: fowlkes_mallows_score returns nan in binary classification

Description

fowlkes_mallows_score doesn’t work properly for large binary label vectors: it returns values outside the range [0, 1], or nan. More generally, the equation shown in the documentation doesn’t yield the same results as the function.
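
For reference, the documentation's equation counts pairs of samples, not individual samples. Writing n_ij for the contingency table between labels_true (rows, with sums a_i) and labels_pred (columns, with sums b_j), it can be expanded as below. This notation is introduced here for illustration and is not taken from the issue.

```latex
\mathrm{FMI} = \frac{TP}{\sqrt{(TP + FP)(TP + FN)}},
\qquad
TP = \sum_{i,j} \binom{n_{ij}}{2},
\qquad
(TP + FP)(TP + FN) = \sum_{i} \binom{a_i}{2} \cdot \sum_{j} \binom{b_j}{2}
```

TP counts pairs of samples that share a cluster in both labelings; FP and FN count pairs that share a cluster in only one of them. The formula is symmetric in FP and FN, so which is which does not change the score. Because every term counts pairs, the quantities grow roughly with the square of the number of samples.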

Steps/Code to Reproduce

Edited by @jnothman: this reference implementation is incorrect. See comment below.

import numpy as np
from sklearn.metrics import confusion_matrix, fowlkes_mallows_score

def get_FMI(true, predicted):
    # Reporter's reference: TP, FP and FN are counted over individual samples,
    # read off the 2x2 confusion matrix (rows = true label, columns = predicted).
    c = confusion_matrix(true, predicted)
    TP = c[1][1]
    FP = c[0][1]
    FN = c[1][0]
    FMI = TP / np.sqrt((TP + FP) * (TP + FN))

    print('Should be', FMI)
    print('Is', fowlkes_mallows_score(true, predicted))

# large vector
get_FMI(np.random.choice([0, 1], 1362), np.random.choice([0, 1], 1362))
# small vector
get_FMI(np.random.choice([0, 1], 100), np.random.choice([0, 1], 100))
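
As the editor's note says, the reference above is not the Fowlkes-Mallows index: it counts TP, FP and FN over individual samples, whereas the index counts pairs of samples. A minimal pure-Python sketch of a pair-counting reference follows, assuming the standard pair-based definition. The names fmi_pairs, fmi_brute_force and n_pairs are hypothetical helpers, not scikit-learn API. Python integers do not overflow, so this sketch is also unaffected by input size.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def n_pairs(k):
    # Number of unordered pairs among k items, i.e. k choose 2.
    return k * (k - 1) // 2


def fmi_pairs(labels_true, labels_pred):
    """Fowlkes-Mallows index computed from pair counts (hypothetical helper).

    tp:    pairs in the same cluster in both labelings
    tp_fp: pairs in the same predicted cluster
    tp_fn: pairs in the same true cluster
    """
    tp = sum(n_pairs(k) for k in Counter(zip(labels_true, labels_pred)).values())
    tp_fp = sum(n_pairs(k) for k in Counter(labels_pred).values())
    tp_fn = sum(n_pairs(k) for k in Counter(labels_true).values())
    return tp / sqrt(tp_fp * tp_fn) if tp else 0.0


def fmi_brute_force(labels_true, labels_pred):
    # Same quantity, enumerating every pair explicitly (O(n^2), for checking only).
    tp = tp_fp = tp_fn = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        tp += same_true and same_pred
        tp_fp += same_pred
        tp_fn += same_true
    return tp / sqrt(tp_fp * tp_fn) if tp else 0.0


print(fmi_pairs([0, 0, 0, 1, 1, 1], [0, 0, 1, 1, 2, 2]))  # 2 / sqrt(3 * 6), about 0.471
```

If tk, pk and qk are computed as sums of squared contingency counts minus n_samples (as in the scikit-learn source of that era), each is exactly twice the corresponding pair count, so the factor of 2 cancels and the ratio is the same quantity.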

Expected Results

Should be 0.487888392921
Is 0.487888392921

Should be 0.548853049023
Is 0.548853049023

Actual Results

Should be 0.487888392921
Is 15.3260054113

Should be 0.548853049023
Is 0.501109879279

Versions

Windows-10-10.0.10586-SP0
Python 3.5.2 |Anaconda custom (64-bit)| (default, Jul 5 2016, 11:41:13) [MSC v.1900 64 bit (AMD64)]
NumPy 1.11.2
SciPy 0.18.1
Scikit-Learn 0.18.1

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 23 (19 by maintainers)

Most upvoted comments

tk, pk and qk in the implementation follow the same equations as the reference given in the documentation. However, the expression

    tk / np.sqrt(pk * qk) if tk != 0. else 0.

gave me an error, which I could fix by rewriting it as

    tk / np.sqrt(pk) / np.sqrt(qk) if tk != 0. else 0.
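
Why splitting the square root helps can be sketched with NumPy. Assuming the pair statistics are held in 32-bit integers (NumPy's default integer type on Windows before NumPy 2.0), the product pk * qk exceeds 2**31 - 1 for inputs of this size and wraps around. A wrapped negative product gives nan under the square root, and a wrapped but still positive product that is too small gives a "score" above 1. The values below are hypothetical, chosen to be roughly the size a random 1362-sample binary labelling produces; fmi_naive and fmi_safe are illustrative names, not scikit-learn functions.

```python
import numpy as np


def fmi_naive(tk, pk, qk):
    # Original form: forms the product pk * qk, which silently wraps
    # around when the arrays hold 32-bit integers.
    with np.errstate(invalid="ignore"):  # silence the sqrt-of-negative warning
        return tk / np.sqrt(pk * qk) if tk != 0 else 0.0


def fmi_safe(tk, pk, qk):
    # Workaround from the comment: take each square root separately, so no
    # intermediate value is larger than pk or qk themselves.
    return tk / np.sqrt(pk) / np.sqrt(qk) if tk != 0 else 0.0


# Hypothetical pair statistics, stored as int32 arrays to mimic the overflow.
tk = np.array([463000], dtype=np.int32)
pk = np.array([926000], dtype=np.int32)
qk = np.array([926000], dtype=np.int32)

print(pk * qk)                # wrapped int32 product, negative for these values
print(int(pk[0]) * int(qk[0]))  # exact product using Python integers
print(fmi_naive(tk, pk, qk))  # nan, from the square root of a negative number
print(fmi_safe(tk, pk, qk))   # about 0.5, i.e. tk / sqrt(pk * qk) computed safely
```

Computing in 64-bit integers or floats would also avoid the wraparound for inputs of this size; dividing by each square root separately, as in the comment, avoids forming the large product at all.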