scipy: BUG: hypergeom.cdf slower in 1.8.0 than 1.7.3
Describe your issue.
While using the fisher_exact test in a loop on some 2x2 tables, I noticed a significant slowdown between SciPy versions 1.7.3 and 1.8.0 (1.8.0 is about 20x slower than 1.7.3). I narrowed it down to the call to hypergeom.cdf, and found a specific set of arguments with which the slowdown can be reproduced (see code example below).
Reproducing Code Example
import time
import scipy
from scipy.stats import distributions
ts = time.time()
for _ in range(10000):
    distributions.hypergeom.cdf(0, 48127, 57, 35775)
te = time.time()
print(scipy.__version__, '%.2fs' % (te-ts))
# Output for version info 1.7.3 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
# 1.7.3 1.84s
# Output for version info 1.8.0 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
# 1.8.0 41.11s
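For reference, the quantity evaluated in the reproducer can be cross-checked without SciPy. The sketch below assumes SciPy's parameterization of hypergeom (M total objects, n "success" objects, N draws) and uses the fact that the CDF at k = 0 equals P(X = 0), which reduces to a product of n ratios. The function name hypergeom_cdf_at_zero is hypothetical and is not part of SciPy; this only checks the value and says nothing about the slowdown itself.

```python
def hypergeom_cdf_at_zero(M, n, N):
    """P(X <= 0) = P(X = 0) for a hypergeometric variable.

    Assumed parameterization (matching scipy.stats.hypergeom):
    M = population size, n = number of success objects, N = number of draws.
    Uses C(M - n, N) / C(M, N) = prod_{i=0}^{n-1} (M - N - i) / (M - i).
    """
    if N > M - n:
        # More draws than non-success objects: at least one success is certain.
        return 0.0
    p = 1.0
    for i in range(n):
        p *= (M - N - i) / (M - i)
    return p


# Same arguments as the reproducer: cdf(0, 48127, 57, 35775)
p0 = hypergeom_cdf_at_zero(48127, 57, 35775)
print(p0)
```

Because the loop has only n = 57 terms for these arguments, it gives a cheap sanity check on the result returned by either SciPy version.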
Error message
-
SciPy/NumPy/Python version information
1.8.0 1.22.2 sys.version_info(major=3, minor=8, micro=10, releaselevel='final', serial=0)
About this issue
- State: closed
- Created 2 years ago
- Comments: 25 (20 by maintainers)
This has been fixed by https://github.com/boostorg/math/pull/930. In C++14 there was a change in memory storage duration that caused a >10,000 element table to be rebuilt numerous times during the evaluation of the cdf.
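The effect described above, a large table being rebuilt on every evaluation instead of being built once and reused, can be illustrated with a toy analogy. This is only a sketch in Python with hypothetical names (LogFactorialTable, evaluate_rebuilding, evaluate_cached); the actual Boost.Math code is C++ and is not reproduced here, and the table contents are an assumption chosen purely for illustration.

```python
import math


class LogFactorialTable:
    """Hypothetical stand-in for an expensive lookup table."""

    builds = 0  # class-level counter: how many times a table was constructed

    def __init__(self, size=10_000):
        LogFactorialTable.builds += 1
        # values[i] = log(i!), computed via lgamma(i + 1)
        self.values = [math.lgamma(i + 1) for i in range(size)]


def evaluate_rebuilding(k):
    # Slow pattern: a fresh table is constructed on every call.
    table = LogFactorialTable()
    return table.values[k]


_shared_table = None


def evaluate_cached(k):
    # Fast pattern: the table is constructed once and reused afterwards.
    global _shared_table
    if _shared_table is None:
        _shared_table = LogFactorialTable()
    return _shared_table.values[k]
```

Calling evaluate_rebuilding in a loop increments LogFactorialTable.builds once per call, while evaluate_cached increments it at most once in total. Both return the same values, so the difference shows up only as run time.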
I have no issue with that. I’ve been meaning to email you and the scipy team on behalf of the Boost.Math team.
@cthoyt The issue I reported above was primarily on Ubuntu and we determined it to be Boost-related. I confirmed that updating the SciPy version to 1.9.3 alone does not change it.