stdlib: Hypergeometric cdf function doesn't match R after many decimal places?

Take the following javascript code:

// https://stdlib.io/docs/api/v0.0.90/@stdlib/stats/base/dists/hypergeometric/cdf
import cdf from '@stdlib/stats/base/dists/hypergeometric/cdf';

// the stdlib cdf function calculates P(X <= k), but what we really want is
// P(X >= k). therefore, provide k-1 as X instead of k to get P(X < k), then
// subtract the result from 1 to get P(X >= k)
const hyperGeometricTest = (k, K, n, N) => 1 - cdf(k - 1, N, K, n);
console.log(hyperGeometricTest(1, 71, 86, 2443))

And the following R code:

# computed/run with https://rdrr.io/snippets/
k <- 1; K <- 71; n <- 86; N <- 2443
sum <- 0
for (i in k:K) {
  sum <- sum + dhyper(i, K, N - K, n)
}
print(sum, digits = 20)

which, according to my best understanding of statistics, should result in the same number. However, here are the two results, respectively:

0.9243998310893528
0.92439983108944501211

As you can see, they’re almost identical, agreeing to roughly the 12th decimal place. It seems like some kind of numerical error in the cdf function, although it’s possible I’m misusing the function and the two snippets shouldn’t produce exactly the same value.
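
For what it’s worth, here’s a quick sketch that sums the PMF terms directly in JavaScript, mirroring the R loop. I’m assuming the pmf function in the same stdlib namespace takes the same (x, N, K, n) argument order as cdf:

// sketch: sum P(X = i) directly for i = k..min(n, K), mirroring the R loop
// (assumes @stdlib/stats/base/dists/hypergeometric/pmf exists and takes
// arguments in the same (x, N, K, n) order as cdf)
import pmf from '@stdlib/stats/base/dists/hypergeometric/pmf';

const directSum = (k, K, n, N) => {
  let sum = 0;
  for (let i = k; i <= Math.min(n, K); i++) { // X can be at most min(n, K)
    sum += pmf(i, N, K, n);
  }
  return sum;
};
console.log(directSum(1, 71, 86, 2443))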

About this issue

  • State: closed
  • Created 4 years ago
  • Comments: 15 (10 by maintainers)

Most upvoted comments

@vincerubinetti Thanks for reaching out.

For your use case in ADAGE, I can see why you’d want to have the exact same result as R. I am guessing that the original ML work for the Greene Lab was written in R?

However, as @Planeshifter mentions, no one “right” answer exists. While our implementations are by no means infallible, we do test against reference implementations. For this particular function, we test against Julia. You can find the tests for evaluating the CDF here and for the PMF here. Included in the test code is the delta, which provides a sense of how well our implementations track a particular reference implementation.
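
For context, here is an illustrative sketch of the kind of tolerance check such tests perform; this is not the actual stdlib test code, and the expected value is meant to come from a fixture generated by the reference implementation:

// illustrative sketch only (not the actual stdlib test code): check whether
// an evaluated value tracks a reference value to within some multiple of
// double-precision machine epsilon
const EPS = 2.220446049250313e-16; // double-precision machine epsilon

function tracksReference(actual, expected, nEps) {
  const delta = Math.abs(actual - expected);
  const tol = nEps * EPS * Math.abs(expected); // allowed tracking error
  return { delta, tol, ok: delta <= tol };
}

// usage shape (the expected value would come from, e.g., a Julia-generated fixture):
// tracksReference(cdf(x, N, K, n), expectedFromJulia, 50);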

In this particular case, the PMF evaluates the natural log of the factorial function, which, in turn, evaluates the natural log of the gamma function. Our reference implementation tracking error can be found here.
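
To make that concrete, here is a minimal sketch (not stdlib’s actual source) of evaluating the hypergeometric PMF in log space via the log-factorial, which is where any error in the log-gamma approximation enters the result:

// minimal sketch (not stdlib's actual source): hypergeometric PMF evaluated
// in log space via the natural log of the factorial function
import factorialln from '@stdlib/math/base/special/factorialln';

// ln C(n, k) via log-factorials:
const lnChoose = (n, k) => factorialln(n) - factorialln(k) - factorialln(n - k);

// P(X = x) = C(K, x) * C(N-K, n-x) / C(N, n), exponentiated at the end, so
// approximation error in the underlying special function propagates:
const logSpacePMF = (x, N, K, n) =>
  Math.exp(lnChoose(K, x) + lnChoose(N - K, n - x) - lnChoose(N, n));

console.log(logSpacePMF(1, 2443, 71, 86));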

In short, not only can different approximation formulas differ (here, we use a recursive formula), but the underlying special functions can differ as well. In stdlib, the underlying special function tracks well with the reference implementation (here, Julia). Given that the CDF formula we use, while one among many, is valid, we are not particularly surprised to differ from R for this particular function. In fact, we expect to differ more often than not, given that errors (rounding and approximation) in underlying implementations propagate. We see this not only between stdlib and R, but also between SciPy and R, Julia and R, etc.
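
For illustration, here is a rough sketch (again, not stdlib’s actual code) of a recursive evaluation of the hypergeometric CDF, where each successive PMF term is derived from the previous one via a ratio; a term-by-term summation of independently computed PMF values is mathematically equivalent but can round differently:

// rough sketch (not stdlib's actual code): recursive evaluation of
// P(X <= x) for the hypergeometric distribution
import pmf from '@stdlib/stats/base/dists/hypergeometric/pmf';

function cdfRecursive(x, N, K, n) {
  const lo = Math.max(0, n - (N - K));    // smallest attainable value of X
  const hi = Math.min(x, Math.min(n, K)); // largest value we need to sum to
  let term = pmf(lo, N, K, n);            // first term, computed directly
  let sum = 0;
  for (let i = lo; i <= hi; i++) {
    sum += term;
    // ratio P(X = i+1) / P(X = i) for the hypergeometric distribution:
    term *= ((K - i) * (n - i)) / ((i + 1) * (N - K - n + i + 1));
  }
  return sum;
}

console.log(1 - cdfRecursive(0, 2443, 71, 86)); // the case from this issue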

My advice for your use case is to worry less about exactly equal numerical results and more about whether the overall conclusions derived from your application remain the same. If the conclusions of an analysis hinge on differences in the 12th decimal place, something is probably flawed in the analysis, as experiments are rarely so precise and experimental error is rarely so small. My hunch is that a significant-digits calculation based on experimental error would reveal that the numerical differences observed here are simply noise.

That said, should you have an alternative (non-GPL) algorithm which demonstrates “better” behavior, we’re more than willing to consider updating. But, for now, we don’t see the observed differences as cause for concern.

Thanks again for reaching out!

…but…if you feel the need to display a double, then using floating-point epsilons should be fine, as you have done.

Hi @vincerubinetti,

observing p-values of zero does not indicate that something is awry. It just means that the observed data is wholly inconsistent with the distribution stipulated under your null hypothesis. If you observe very small p-values, you can safely reject your null hypothesis (recall that in many published scientific papers p-values of < 0.05 or < 0.01 are considered sufficient for rejection). Ronald Fisher, the father of much of modern statistics, said the following with regard to the p-value in the context of a chi-squared independence test: “Provided the deviation is clearly significant, it is of no practical importance whether P is .01 or .000,001” (See: https://psychclassics.yorku.ca/Fisher/Methods/chap4.htm).

I would not be concerned with numerical deviations in the result of a statistical test. One should be far more worried about whether the specified probability model is appropriate, issues of multiple testing, etc.

However, if you believe there to be an error in the CDF implementation, could you please provide an example set of values for which a result of zero is returned in stdlib but not in R or Python and we will have a look?

In general, I would say it’s a good idea not to display an exact value but rather < [some epsilon value], as it de-emphasizes the actual value and does not project certainty when there is none to be had with hypothesis testing in the first place. The New England Journal of Medicine, for example, has the following style guideline:

In general, P values larger than 0.01 should be reported to two decimal places, and those between 0.01 and 0.001 to three decimal places; P values smaller than 0.001 should be reported as P<0.001. Notable exceptions to this policy include P values arising from tests associated with stopping rules in clinical trials or from genome-wide association studies. (https://www.nejm.org/author-center/new-manuscripts)
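
As a minimal sketch of what reporting along those lines could look like in JavaScript (a hypothetical helper, not part of stdlib):

// hypothetical helper (not part of stdlib) following the rounding described
// above: two decimals above 0.01, three decimals between 0.001 and 0.01,
// and "<0.001" below that
function formatPValue(p) {
  if (p < 0.001) {
    return 'P<0.001';
  }
  if (p < 0.01) {
    return 'P=' + p.toFixed(3);
  }
  return 'P=' + p.toFixed(2);
}

console.log(formatPValue(0.000004)); // => 'P<0.001'
console.log(formatPValue(0.0042));   // => 'P=0.004'
console.log(formatPValue(0.037));    // => 'P=0.04'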

There is no standard way to tell how many decimal places an implementation will be accurate to. There is also no basis to assume that R, Julia, or Python provide a ground truth to which the stdlib implementation should be compared in order to arrive at an answer to this question.

I see, that makes sense to me. I should note that I’m not a statistician; I’m a front-end developer who took one stats class in college and temporarily has to wear the stat hat :x

I thought these things would have rigid formulas, meaning there would only be so many ways to implement them. Or at least, that any implementation should tend towards the same exact solution, given enough floating-point precision.

But it seems like this stuff might be more like numerical analysis than exact formulas. I suppose you can close the issue, and my team will have to decide whether they’re okay with the “inaccuracy”.