scipy: BUG: hypergeom.cdf slower in 1.8.0 than 1.7.3

Describe your issue.

While running fisher_exact in a loop over some 2x2 tables, I noticed a significant slowdown between SciPy 1.7.3 and 1.8.0 (1.8.0 is about 20x slower). I narrowed it down to the call to hypergeom.cdf and found a specific set of arguments that reproduces the slowdown (see the code example below).

Reproducing Code Example

import time
import scipy
from scipy.stats import distributions

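# Time 10000 calls to hypergeom.cdf with the argument set that triggers the slowdown.
ts = time.perf_counter()
for _ in range(10000):
    distributions.hypergeom.cdf(0, 48127, 57, 35775)
te = time.perf_counter()
print(scipy.__version__, '%.2fs' % (te - ts))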

# Output for version info 1.7.3 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
# 1.7.3 1.84s

# Output for version info 1.8.0 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
# 1.8.0 41.11s
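
For anyone who wants a per-call figure that is easier to compare across installs, here is a minimal sketch of the same measurement using the standard-library timeit module. The helper name time_hypergeom_cdf and its default repeat counts are illustrative choices, not part of the original report. The arguments to hypergeom.cdf are the ones from the example above.

```python
import timeit

import scipy
from scipy.stats import hypergeom


def time_hypergeom_cdf(number=200, repeat=5):
    """Return the best observed per-call time, in seconds.

    `number` and `repeat` are arbitrary defaults chosen for this sketch.
    """
    timer = timeit.Timer(lambda: hypergeom.cdf(0, 48127, 57, 35775))
    # Timer.repeat returns one total (seconds for `number` calls) per repeat;
    # taking the minimum reduces noise from other processes.
    best_total = min(timer.repeat(repeat=repeat, number=number))
    return best_total / number


if __name__ == "__main__":
    per_call = time_hypergeom_cdf()
    print(scipy.__version__, "%.1f us per call" % (per_call * 1e6))
```

Running this under each SciPy version in otherwise identical environments should show the same kind of gap as the loop above, if the regression is present on that platform.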

Error message

-

SciPy/NumPy/Python version information

1.8.0 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 25 (20 by maintainers)

Most upvoted comments

This has been fixed by https://github.com/boostorg/math/pull/930. In C++14 there was a change in memory storage duration that caused a >10,000 element table to be rebuilt numerous times during the evaluation of the cdf.

Would you mind if we started to ping you on suspected Boost issues? So far, I’ve hesitated to bring an issue to your attention until we can confirm that the bug is in Boost itself, not our wrapper.

I have no issue with that. I’ve been meaning to email you and the scipy team on behalf of the Boost.Math team.

@cthoyt The issue I reported above was primarily on Ubuntu and we determined it to be Boost-related. I confirmed that updating the scipy version to 1.9.3 alone does not change it.