scipy: `scipy.stats.qmc.LatinHypercube` cannot sample a single sample at a time

I am playing around with scipy.stats.qmc for hyperparameter optimization and found that scipy.stats.qmc.LatinHypercube cannot draw a single sample. Though sampling a single point is an impractical case, I expect that it should not just throw an error.

@tupui I suppose you know a lot about this…

Reproducing code example:

import scipy.stats.qmc

engine = scipy.stats.qmc.LatinHypercube(d=2)
engine.random()  # same as `engine.random(n=1)`

Error message:

Traceback (most recent call last):
  File "b.py", line 4, in <module>
    engine.random(n=1)
  File "/home/kstoneriv3/.local/lib/python3.8/site-packages/scipy-1.7.0.dev0+fc77ea1-py3.8-linux-x86_64.egg/scipy/stats/_qmc.py", line 956, in random
    q = self.rg_integers(low=1, high=n, size=(n, self.d))
  File "_generator.pyx", line 460, in numpy.random._generator.Generator.integers
  File "_bounded_integers.pyx", line 1254, in numpy.random._bounded_integers._rand_int64
ValueError: low >= high
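
For what it's worth, my reading of the traceback is that the internal call degenerates into an empty integer range when n=1; the NumPy snippet below reproduces the same error in isolation (just an illustration, not scipy code):

```python
import numpy as np

rng = np.random.default_rng()
# With n=1, the call in the traceback becomes integers(low=1, high=1, ...):
# the half-open range [1, 1) is empty, so numpy raises "ValueError: low >= high".
rng.integers(low=1, high=1, size=(1, 2))
```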

Scipy/Numpy/Python version information:

1.7.0.dev0+fc77ea1 1.19.5 sys.version_info(major=3, minor=8, micro=5, releaselevel='final', serial=0)

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 39 (39 by maintainers)

Most upvoted comments

I would go for option 1, as it is also hard for me to fully understand the algorithm for constructing OAs at the moment. Could you take a look at #13654 then? Its implementation is efficient, but its readability is not great, I guess.

In addition, I would like to allow n=0 or even d=0 (Edit: d=0 is already OK) in LHS, as discussed above.

Hi everybody, I got some emails from tupui about this.

-Art

For LHS you want each variable (each column of the resulting matrix) to have one value in the range [j/n,(j+1)/n) for all 0 <= j < n. You can permute integers 0 to n-1 into a random order and add a U(0,1) to each of them. Do that d times independently to get d columns and then divide the whole thing by n. [Many texts permute 1 to n and then subtract U(0,1). Same distribution.]

Some users will want "centered" LHS, which you get by adding 0.5 instead of U(0,1).
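
A minimal sketch of that recipe (my own illustration with made-up names, not scipy's implementation), covering both the jittered and the centered variants:

```python
import numpy as np

def lhs(n, d, centered=False, seed=None):
    """Minimal Latin hypercube sketch following the permutation recipe above."""
    rng = np.random.default_rng(seed)
    out = np.empty((n, d))
    for col in range(d):
        strata = rng.permutation(n)                   # the n strata of this column, in random order
        offset = 0.5 if centered else rng.random(n)   # cell midpoint or U(0, 1) jitter
        out[:, col] = (strata + offset) / n           # each value lands in its stratum [k/n, (k+1)/n)
    return out
```

Note that with n=1 this degenerates to a single jittered (or centered) point, which is why a single draw should be a valid, if trivial, LHS.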

Orthogonal LHS could mean different things to different people. The ones by Tang are good. It makes a big difference what OAs you use. It makes the most sense to use OAs of strength 2. Then you balance single and double marginal distributions. I like the ones that have n=s^2 where s is a prime number (or any prime power if you have an implementation of Galois fields) and d <= s+1 that I call the Bose construction in my papers. The OA book by Hedayat, Sloane and Stufken calls them something else. There are also arrays with n = 2s^2 for d <= 2s+1, once again with s a prime or prime power. (Warning: the modular arithmetic formulas are quite incorrect if s is not a prime.)
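
For concreteness, here is my sketch of the Bose-type construction as I understand it, valid for a prime s only (prime powers would need Galois-field arithmetic, as the warning above says); the helper name is my own:

```python
import numpy as np

def bose_oa(s):
    """Strength-2 OA with n = s**2 runs, d = s + 1 columns and s levels; s must be prime."""
    i, j = np.meshgrid(np.arange(s), np.arange(s), indexing="ij")
    i, j = i.ravel(), j.ravel()                             # one (i, j) pair per run
    cols = [i, j] + [(i + k * j) % s for k in range(1, s)]  # s + 1 columns in total
    return np.column_stack(cols)                            # any two columns contain each pair of levels exactly once
```

For example, bose_oa(3) gives a 9 x 4 array over the levels {0, 1, 2}.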

Using an OA of strength t=2, the Tang LHS balances first and second order margins nicely. A scrambled QMC rule will often have better balance because the OA only balances square subregions while the QMC balances many rectangular shapes in two dimensional margins and usually also balances some higher dimensional regions.
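
If it helps, this is how I understand the Tang-style OA-based LHS step (a rough sketch under my reading, not a vetted implementation): within each column, the repeated occurrences of every OA level are spread over distinct sub-strata, which turns the OA into a Latin hypercube while keeping the two-dimensional balance.

```python
import numpy as np

def oa_lhs(oa, seed=None):
    """Sketch of an OA-based LHS: oa is an (n, d) strength-2 array with levels 0..s-1."""
    rng = np.random.default_rng(seed)
    n, d = oa.shape
    s = oa.max() + 1
    lam = n // s                              # each level appears lam times in every column
    lhs = np.empty((n, d))
    for col in range(d):
        for level in range(s):
            rows = np.flatnonzero(oa[:, col] == level)
            # spread the lam copies of this level over lam distinct sub-strata
            lhs[rows, col] = level * lam + rng.permutation(lam)
    return (lhs + rng.random((n, d))) / n     # jitter within each of the n strata
```

With the hypothetical bose_oa helper from the previous sketch, oa_lhs(bose_oa(5)) would give 25 points in [0, 1)^6 with both one- and two-dimensional stratification.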

I have a paper where I generate LHS with very small correlations among columns that could also be called orthogonal LHS. Owen AB. Controlling correlations in Latin hypercube samples. Journal of the American Statistical Association. 1994 Dec 1;89(428):1517-22. Those along with optimized LHS are good for exploring a function. I’m not aware of studies on how they work for estimating integrals; they might have a subtle bias.

Randomizing an OA and embedding it in the unit cube [0,1]^d is also useful as inputs for visualizing a function. If you do the version adding 1/2 instead of U(0,1) then you get regular grids of points in low dimensional views. Owen AB. Orthogonal arrays for computer experiments, integration and visualization. Statistica Sinica. 1992 Jul 1:439-52.
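
As a concrete (again hypothetical) reading of that: randomize the OA by relabelling the levels within each column, then place every level at its cell midpoint, which gives the regular grids in low dimensional views, or jitter it with U(0,1):

```python
import numpy as np

def randomized_oa_points(oa, centered=True, seed=None):
    """Sketch: embed an OA with levels 0..s-1 into [0, 1)^d after randomly relabelling the levels per column."""
    rng = np.random.default_rng(seed)
    n, d = oa.shape
    s = oa.max() + 1
    levels = np.empty_like(oa)
    for col in range(d):
        levels[:, col] = rng.permutation(s)[oa[:, col]]   # random relabelling of the s levels
    offset = 0.5 if centered else rng.random((n, d))      # midpoints give the grid views
    return (levels + offset) / s
```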

Something that I think would be very cool for computational purposes is an implementation of randomized Hadamard matrices. Those are n x n binary arrays. The first column is all 1s, so discard that. Now you have n-1 mutually orthogonal binary columns, each half 1s and half -1s. The value of n can be almost any multiple of 4 that you like. The first Paley construction in the Hedayat et al. book is available for lots of values of n, and it can be computed for n in the millions or billions because you only need to store O(n) binary variables at a time while delivering n rows of n-1 values. I.e., it can be streamed, and you can even access any row you like without computing prior ones. A common strategy for fitting sparse models on n-1 variables is to take a random sample of m rows of the Hadamard matrix. So … Hadamard OAs are among the most important ones.
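
A small illustration of the Paley-I idea for a prime q with q ≡ 3 (mod 4) (prime powers again need Galois fields, and this is not the O(n)-memory streaming version described above; the name is mine):

```python
import numpy as np

def paley_hadamard(q):
    """Hadamard matrix of order q + 1 via Paley construction I; q must be a prime with q % 4 == 3."""
    residues = {(x * x) % q for x in range(1, q)}
    chi = np.array([0] + [1 if r in residues else -1 for r in range(1, q)])  # quadratic character
    a, b = np.meshgrid(np.arange(q), np.arange(q), indexing="ij")
    Q = chi[(b - a) % q]                        # Jacobsthal matrix Q[a, b] = chi(b - a)
    S = np.zeros((q + 1, q + 1), dtype=int)     # skew conference matrix
    S[0, 1:] = 1
    S[1:, 0] = -1
    S[1:, 1:] = Q
    H = np.eye(q + 1, dtype=int) + S
    return H * H[:, :1]                         # flip row signs so the first column is all 1s
```

Dropping the all-ones first column leaves q mutually orthogonal ±1 columns, each half 1s and half -1s; e.g. paley_hadamard(11) returns a 12 x 12 matrix.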

@tupui @kstoneriv3 I believe this was closed by gh-13654. Please re-open if I missed something.

No, I do not think so. From your quote, I can see that supporting orthogonal Latin hypercube sampling (of strength > 1) requires an orthogonal array (OA), which is not implemented in the current OrthogonalLatinHypercube.

OK, I will work on it!

I am really confused now 😅. It was quite some time ago, and I really thought I had the orthogonal version correct. I have to check. In any case, thanks for raising this!

Thank you for the detailed answer! As far as I can see, the iterative sampling in the Sobol sequence works perfectly even if I do not increase the sample size by a power of 2… When I looked at the code of the python package sobol_seq, iterative sampling of the Sobol sequence seemed safe. Are you 100% sure that the Sobol sequence cannot always be sampled iteratively? If you can point me to some references on Sobol sequences, I can check this point on my own…

The Sobol' sequence is a 2**n sequence, which means its properties are only guaranteed if you respect some rules.

So you cannot do burn-in without care, for instance. If you need 256 points, you must skip at least 256 points before. And you will not be able to resample afterwards without losing the properties; for that you would need to skip 512 points to get at most 512 points afterwards, for instance.

In this implementation, you can see that we emit some warnings if you try to do something you should not. But we did not want to clutter the API too much, and found the current balance.

Most implementations (if not all) do not warn about this and have other flaws, like missing the first point. We had lengthy discussions on the original PR (with some of the best specialists) and exchanged some emails about all this. We showed that the convergence rate is greatly affected if you just remove one point from the sequence…

So yes, you will be able to use the methods as you want, in any order. We tried to put enough warnings around to explain when doing so would be wrong.

You can check out Art’s paper which came out of these discussions: https://arxiv.org/abs/2008.08051
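
To make those rules concrete, here is how I read them in terms of the scipy.stats.qmc API (a sketch, not normative guidance):

```python
from scipy.stats import qmc

# Good: draw a full power of two at once, so the balance properties hold.
engine = qmc.Sobol(d=2, scramble=True)
sample = engine.random_base2(m=8)    # 2**8 = 256 points

# Good: skip (burn) a power of two at least as large as what you will draw next.
engine = qmc.Sobol(d=2, scramble=True)
engine.fast_forward(512)             # skip the first 512 points ...
sample = engine.random(256)          # ... then take at most 512 of the following ones

# Risky: 100 is not a power of two, so the guarantees discussed above no longer apply.
# This is the kind of usage the warnings in the implementation are about.
sample = qmc.Sobol(d=2).random(100)
```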

Hi! Thanks for reporting this. I will look into it. I agree that it should not fail: although not very useful, any single point is a valid LHS. Also note that currently there is no strategy to add samples and maintain the properties.

If you have an idea before me, feel free to do a PR 😄