scipy: BUG: scipy's pdist(X, metric='dice') VS scipy.spatial.distance.dice() produce different results

Describe your issue.

Please forgive me if this is just a lack of understanding on my part of the difference between the two functions, but I can't find why these would be different. The code should make it clear, but basically I am confused about why the last pair ([1, 0, 0], [2, 0, 0]) gives a different result for these two metrics, which I assumed should be the same.

Thanks!

Reproducing Code Example

#pdist

from scipy.spatial.distance import pdist

X = [[1, 0, 0], [0, 1, 0]]
tmp1 = pdist(X, metric='dice')
print(tmp1)
# [1.]
X = [[1, 0, 0], [1, 1, 0]]
tmp1 = pdist(X, metric='dice')
print(tmp1)
# [0.33333333]
X = [[1, 0, 0], [2, 0, 0]]
tmp1 = pdist(X, metric='dice')
print(tmp1)
# [0.]

#dice

from scipy.spatial import distance
tmp1 = distance.dice([1, 0, 0], [0, 1, 0])
print(tmp1)
# 1.0
tmp1 = distance.dice([1, 0, 0], [1, 1, 0])
print(tmp1)
# 0.3333333333333333
tmp1 = distance.dice([1, 0, 0], [2, 0, 0])
print(tmp1)
# -0.3333333333333333

Error message

None

SciPy/NumPy/Python version information

1.8.0 1.21.6 sys.version_info(major=3, minor=8, micro=4, releaselevel='final', serial=0)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

Hi @tsabbir96, I think @peterbell10 wanted to fix this along with #17538. Can you confirm, Peter?

I think handling this in the wrapper is good. Type promotion is okay, and AFAIK all the _distance_pybind functions promote to floating point. We really only want to fall back when the input types cannot safely be cast to the supported type, which is mainly boolean metrics.
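
For anyone trying to see where the two numbers in the report could come from, here is a minimal pure-Python sketch. It assumes, without confirmation from this thread, that the `pdist` path casts the input to boolean first, while `distance.dice` applies the same formula to the raw numbers and uses `1 - x` as the "not". The helper names `dice_bool` and `dice_arithmetic` are hypothetical and are not SciPy functions.

```python
def dice_bool(u, v):
    """Dice dissimilarity after casting every entry to bool (nonzero -> True)."""
    ub = [bool(x) for x in u]
    vb = [bool(x) for x in v]
    ntt = sum(a and b for a, b in zip(ub, vb))        # both True
    ntf = sum(a and not b for a, b in zip(ub, vb))    # True in u, False in v
    nft = sum((not a) and b for a, b in zip(ub, vb))  # False in u, True in v
    return (ntf + nft) / (2 * ntt + ntf + nft)


def dice_arithmetic(u, v):
    """Same formula on the raw numbers, with 'not x' taken as 1 - x.

    Assumption: this mimics what a non-boolean code path might compute.
    """
    ntt = sum(a * b for a, b in zip(u, v))
    ntf = sum(a * (1 - b) for a, b in zip(u, v))
    nft = sum((1 - a) * b for a, b in zip(u, v))
    return (ntf + nft) / (2 * ntt + ntf + nft)


pairs = [
    ([1, 0, 0], [0, 1, 0]),
    ([1, 0, 0], [1, 1, 0]),
    ([1, 0, 0], [2, 0, 0]),  # the pair from the report that contains a 2
]
for u, v in pairs:
    print(u, v, dice_bool(u, v), dice_arithmetic(u, v))
```

Under these assumptions the two helpers agree on the 0/1 inputs and differ only for the pair containing a 2, where the boolean version gives 0.0 and the arithmetic version gives -1/3, matching the outputs in the report. Until the behaviour is unified, passing genuinely boolean data to both functions should avoid the mismatch.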