scipy: Dirichlet doesn't accept its own random variates as input to pdf


In [1]: import scipy.stats as ss

In [2]: d = ss.dirichlet([1,3,4])

In [3]: d.pdf(d.rvs())
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-a6f3762e83cf> in <module>()
----> 1 d.pdf(d.rvs())

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scipy/stats/_multivariate.py in pdf(self, x)
   1365
   1366     def pdf(self, x):
-> 1367         return self._dist.pdf(x, self.alpha)
   1368
   1369     def mean(self):

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scipy/stats/_multivariate.py in pdf(self, x, alpha)
   1260         """
   1261         alpha = _dirichlet_check_parameters(alpha)
-> 1262         x = _dirichlet_check_input(alpha, x)
   1263
   1264         out = np.exp(self._logpdf(x, alpha))

/opt/local/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/scipy/stats/_multivariate.py in _dirichlet_check_input(alpha, x)
   1079                          "of entries as, or one entry fewer than, "
   1080                          "parameter vector 'a', but alpha.shape = %s "
-> 1081                          "and x.shape = %s." % (alpha.shape, x.shape))
   1082
   1083     if x.shape[0] != alpha.shape[0]:

ValueError: Vector 'x' must have either the same number of entries as, or one entry fewer than, parameter vector 'a', but alpha.shape = (3,) and x.shape = (1, 3).

In [4]: n = ss.norm()

In [5]: n.pdf(n.rvs())
Out[5]: 0.16964393208168185
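
The traceback shows that `d.rvs()` returned an array of shape (1, 3), while `pdf` wants the components along the first axis. A minimal workaround sketch, assuming `rvs` still returns one draw per row (as in the session above), is to hand `pdf` one 1-D row at a time. The names `sample`, `draws`, and `densities` are illustrative only, and this sidesteps the mismatch rather than fixing it:

```python
import numpy as np
import scipy.stats as ss

d = ss.dirichlet([1, 3, 4])

# Assumption: rvs() returns a 2-D array with one draw per row,
# i.e. shape (1, 3) for the default size, as in the report above.
sample = d.rvs()

# A single 1-D row has as many entries as alpha, so pdf accepts it.
density = d.pdf(sample[0])

# For several draws, evaluate row by row so each call sees a 1-D vector.
draws = d.rvs(size=5)
densities = np.array([d.pdf(row) for row in draws])
```

Evaluating row by row is slower than a single vectorized call, but it does not depend on which axis `pdf` treats as the component axis.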

About this issue

  • State: closed
  • Created 8 years ago
  • Comments: 28 (28 by maintainers)

Most upvoted comments

Update: it’s actually been six years since gh-6005, and Friday would be the anniversary. It would be fun to fix it that day : )

Wouldn’t it make sense to repair this bug before introducing another distribution that would have to be deprecated?

I think gh-7689 is clearly a bug, so in theory deprecation should not be necessary. However, many methods of several multivariate distributions have behaved this way for a long time, so there may be genuine backward-compatibility concerns.

@mdhaber @tupui @Kai-Striega @chrisb83 Should we consider this a bug and fix it without maintaining back-compat, or should we first deprecate and then change the behavior?

Almost. The test case with size=1 fails, but that’s a whole other issue: https://github.com/scipy/scipy/issues/7689 (pdf uses np.squeeze, so every dimension of length 1 is dropped).
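
To illustrate the squeeze concern, here is a small NumPy-only sketch. It is not SciPy's actual code; the arrays `out` and `batch` are hypothetical stand-ins for a density result. It shows only that `np.squeeze` drops every axis of length 1, so a caller who asked for a batch of one gets a 0-d result back:

```python
import numpy as np

# Hypothetical density result for a batch containing one point.
out = np.array([0.17])          # shape (1,)
squeezed = np.squeeze(out)      # shape (), the batch axis is gone

# A length-1 axis inside a larger result is dropped as well.
batch = np.zeros((4, 1))
squeezed_batch = np.squeeze(batch)  # shape (4,)

print(out.shape, squeezed.shape)
print(batch.shape, squeezed_batch.shape)
```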

gh-7689 only mentions the squeeze use in rvs. I didn’t know about the pdf/logpdf squeeze : / We really should fix both. If we can’t, I’m afraid we should delay the release of multivariate_beta. We’re going through all this trouble to make the transpose of x change backward-compatible; I don’t think we should do a backward-incompatible change to fix the squeeze issues after that.

I agreed above that we should correct rvs at the same time, although that will complicate this further : /

Adding vectorization is something that has been requested for many multivariate distributions, and it is an enhancement that can be added in a backward-compatible way at any time, so it’s orthogonal to adding multivariate_beta. (Just to be clear: what do you mean by vectorization? Currently, pdf accepts 2D x. Do you mean accepting ND x?)

I had in mind to propose on the mailing list that we just fix it. The concern about resolving it in a backward-compatible way is likely why the bug has persisted despite being reported five years ago. It’s probably better to just go for it and ask users to apply np.squeeze to the output if they need the old behavior.

But it would probably be better if the size bug were fixed in this new distribution regardless, yes. Sounds like we have a little extra time before branching to work with.