math: Possible incorrect argument used in test of inc_beta_ddb

I think an incorrect argument is being passed in the function calls in inc_beta_ddb_test:

https://github.com/stan-dev/math/blob/de404ccc3fd18a0ed3f12299b8698539098451bb/test/unit/math/prim/fun/inc_beta_ddb_test.cpp#L18-L21

Shouldn’t the fourth argument be digamma(small_b) instead of digamma(small_a) (and similarly in the other tests)?
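For reference, the standard identity for the derivative of the regularized incomplete beta function with respect to $b$ (this is textbook material, not taken from the Stan code) suggests why $\psi(b)$, not $\psi(a)$, belongs in that slot. Writing $I_z(a,b) = B_z(a,b)/B(a,b)$ and using $\partial B(a,b)/\partial b = B(a,b)\,(\psi(b) - \psi(a+b))$:

$$\frac{\partial}{\partial b} I_z(a,b) = \frac{1}{B(a,b)} \frac{\partial B_z(a,b)}{\partial b} - I_z(a,b)\,\bigl(\psi(b) - \psi(a+b)\bigr)$$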

Obviously this does not affect the function itself, but it makes it difficult to independently check what the expected value should be.
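For anyone who wants to do that independent check, a central finite difference on scipy’s regularized incomplete beta is a quick way to get a reference value (a sketch; the values below are placeholders, not the ones from the test file):

# Rough independent check: central finite difference of I_z(a, b) in b,
# using scipy.special.betainc (the regularized incomplete beta).
from scipy.special import betainc

def inc_beta_ddb_fd(a, b, z, h=1e-6):
    return (betainc(a, b + h, z) - betainc(a, b - h, z)) / (2 * h)

# Placeholder values, not the ones from the test file:
print(inc_beta_ddb_fd(1.5, 2.5, 0.4))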

Apologies if I misunderstood something 😃


Most upvoted comments

I think this issue is drifting so I wanted to address the original question: it does look like the wrong argument is used in the test.

That’s a good question. To answer it we would need to compare inc_beta_ddb to grad_inc_beta over the same domain, both in terms of accuracy and runtime. Is that something you are interested in trying? That seems to be where Ben is going with this as well. Either we get a clear answer about the need for a better inc_beta_ddb or we get to remove extra code. Ultimately it’s going to be a PR with a pretty significant testing burden…

Some history: I looked at this function (https://github.com/stan-dev/math/blob/9895be8ee0a27a9c7935cf19345d76fcbc85a69d/stan/math/prim/fun/inc_beta_ddb.hpp) and at grad_inc_beta. I think the original code was Michael Betancourt’s, and I modified some of the current code to improve accuracy in parts of the domain (that’s the if statements at the head of the function, which use standard identities). grad_inc_beta currently defers to grad_2F1, which avoids duplicating this kind of code. I wrote that version of the algorithm, and we (me and @bbbales2, I think) updated the convergence conditions there, so it might make sense to look at that version of the calculation if we want to improve the convergence checks.

So tl;dr: @ricardoV94 if you’re interested in improving this algorithm in a specific direction I think that would make the lib better and I’m happy to help with the implementation.

Changing the tests sounds fine, although @sakrejda seems to argue such “failing” tests may still be useful to keep around.

Since there’s almost always a range beyond which these functions fail, it seems like the standard is to document that range in the function doc (it’s a soft failure: the results get less accurate, there’s error in everything else, it often doesn’t matter much, and it can often be avoided by rescaling). I would modify the test range so the function passes and leave an issue stating that beyond a certain point the function loses accuracy. If somebody needs that region, we can add a PR to fix it.

We found by accident that the discussed test point (even when given the right arguments) fails to match other implementations (i.e., finite differences, the previous grad_reg_inc_beta, and the automatic differentiation obtained in PyMC3, all of which agree on the same value).

My suggestion is: 1) update the main post on the issue to document the reference values used (with code) and where you think the range should be extended; 2) make a branch whose tests document the failure; and 3) create a PR that fixes the code so those new tests pass. So far I see mentions of various reference values, and I’d be happy to help fix issues with the function, but I’m not currently up to documenting the issue because you guys seem to have a handle on that already.

I don’t have a good intuition. I can do some benchmarks to see if there is an obvious difference in computational cost for “usual” values. One possible advantage of the old function is that it computes the derivatives with respect to a and b simultaneously, which might be faster than calling the separate methods one at a time.

Yes, there’s usually some efficiency to be gained by computing both at the same time, and sometimes you can get better stability but you’d have to compare the code between the different functions (I can help with that once it gets there if you want).
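To make that concrete, here is a purely schematic sketch (not the actual Stan Math series): when the terms of the two derivative series share an expensive common factor, a fused loop evaluates that factor once per term instead of twice:

# Schematic only: fused accumulation of two series that share a common factor.
import math

def fused_derivative_sums(a, b, z, n_terms=100):
    sum_da, sum_db = 0.0, 0.0
    for k in range(n_terms):
        # Hypothetical expensive factor shared by both terms, computed once.
        common = math.exp(k * math.log(z) - math.lgamma(k + 1))
        sum_da += common / (a + k)  # placeholder a-specific piece
        sum_db += common / (b + k)  # placeholder b-specific piece
    return sum_da, sum_db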

I will have a look at this, together with point 1 above. Perhaps a simple numerical analysis measuring the largest error / disagreement observed for this function over a given a, b, z range would be useful to add to the inline documentation of the functions.
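Something like the following could be a starting point for that analysis (a sketch; impl and reference are hypothetical callables with signature f(a, b, z), e.g. the function under test and the finite-difference check sketched earlier in this thread):

# Sketch: scan an (a, b, z) grid and record the largest disagreement
# between an implementation and a reference.
import itertools

def max_disagreement(impl, reference, a_vals, b_vals, z_vals):
    worst_err, worst_point = 0.0, None
    for a, b, z in itertools.product(a_vals, b_vals, z_vals):
        err = abs(impl(a, b, z) - reference(a, b, z))
        if err > worst_err:
            worst_err, worst_point = err, (a, b, z)
    return worst_err, worst_point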

The algorithm may well be that someone took the power series used for the more general function and simplified the calculation where possible. I don’t know what the current math lib standards are, but from my point of view, getting closer to known error bounds for specific domains would help the Stan math library.

@sakrejda Could you expand on this?

My main point was that all the other implementations fail beyond some domain too. They just do it more or less gracefully and some of them cover a larger domain before failing.

Obviously we also need to check for the accuracy of the computations, but those speed benchmarks look promising.

I’ll do a more careful test soon, looking at whether/where the outputs agree or diverge, also against finite differences and autodiff (I don’t have Mathematica to compare against, but maybe someone here does?). I can also check that other implementation in that repo.
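In the absence of Mathematica, mpmath might do as a high-precision reference; a sketch (50-digit working precision, differentiating the regularized incomplete beta in b numerically):

# mpmath as a stand-in for Mathematica: arbitrary-precision reference value
# for the derivative of the regularized incomplete beta with respect to b.
from mpmath import mp, betainc, diff

mp.dps = 50  # 50 significant digits

def reference_ddb(a, b, z):
    return diff(lambda t: betainc(a, t, 0, z, regularized=True), b)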

I just did a very rough comparison (in Python), and inc_beta_dd* seems to be a couple of orders of magnitude faster than grad_reg_inc_beta (even when calling both _dda and _ddb separately):

import scipy.special as sp  # assumed import; `sp` was not defined in the original snippet

# grad_reg_inc_beta and inc_beta_ddab are Python ports of the corresponding
# Stan Math functions (their definitions are not shown in this comment).
a, b, z = 150, 1.25, 0.99
digamma_alpha = sp.digamma(a)
digamma_beta = sp.digamma(b)
digamma_sum = sp.digamma(a+b)

grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.002718021453572821, 0.3368083597855926)

inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.0027178389107593466, 0.3368034464501962)

%timeit grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 59.9 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 43.6 µs ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Of course, this may not be the case for other values, and there are still the accuracy doubts raised above.

It looks like inc_beta_dda is used in neg_binomial_cdf, neg_binomial_2_cdf, and beta_cdf, so it seems to be the function with the smaller reach. If grad_inc_beta covers that domain, then yeah, I think we can just use grad_inc_beta.