SALib: saltelli.sample returns the exact same sample several times
I recently upgraded to SALib 1.4.0.2 and noticed behaviour that looks incorrect to me. When using saltelli.sample, most of the returned samples are identical, which would mean that the model is evaluated several times with exactly the same input values. Is this really how it should be?
Code from the SALib example:
```python
from SALib.sample import saltelli
from SALib.analyze import sobol

problem = {
    'num_vars': 3,
    'names': ['x1', 'x2', 'x3'],
    'bounds': [[-3.14159265359, 3.14159265359],
               [-3.14159265359, 3.14159265359],
               [-3.14159265359, 3.14159265359]]
}

# N = 2 base samples; with second-order indices (the default),
# this yields N * (2 * num_vars + 2) = 16 rows
x = saltelli.sample(problem, 2)
```
Output:
```
x = array([[-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [-3.14159265, -3.14159265, -3.14159265],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ],
           [ 0.        ,  0.        ,  0.        ]])
```
About this issue
- State: closed
- Created 3 years ago
- Comments: 18 (16 by maintainers)
You mean you’re unsure of what “skipping values” actually means?
The brief explanation I can offer is that:
To avoid the duplicate samples, you can skip a given number of points in the Sobol’ sequence (using the `skip_values` argument). Skipping the first point should avoid the duplicate samples. The caveat is simply that ideally:
- N (and `skip_values`) will be a power of 2
- `skip_values` can be set to (2^n)-1 (e.g., you pick n), and this value (2^n)-1 would be <= the desired number of samples

As I understand it, not following the above will still produce usable results (for some value of “usable”) but may take more samples than necessary (mentioned above).
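A minimal sketch of what skipping does, using SciPy's `scipy.stats.qmc.Sobol` (SciPy ≥ 1.7) directly instead of SALib. This is an assumption made so the example stands alone; in SALib itself you would pass `skip_values` to `saltelli.sample` as described above. Each base point has 2 × `num_vars` coordinates, split into an A half and a B half; `first_base_points` is a hypothetical helper. Without skipping, the first base point is the origin, so its two halves coincide; the printout shows which base points still have identical halves with and without skipping.

```python
import numpy as np
from scipy.stats import qmc


def first_base_points(n: int, num_vars: int, skip: int = 0) -> np.ndarray:
    """Hypothetical helper: n unscrambled Sobol' points of dimension
    2 * num_vars, after discarding the first `skip` points."""
    sampler = qmc.Sobol(d=2 * num_vars, scramble=False)
    if skip > 0:
        sampler.fast_forward(skip)  # discard the first `skip` points
    return sampler.random(n)


num_vars = 3
n = 8            # base sample size, a power of 2
skip_values = 8  # number of points to skip, also a power of 2

for skip in (0, skip_values):
    base = first_base_points(n, num_vars, skip)
    a_half, b_half = base[:, :num_vars], base[:, num_vars:]
    # If A and B coincide for a base point, every row built from it is identical.
    same = np.all(a_half == b_half, axis=1)
    print(f"skip={skip}: base points with A == B -> {np.flatnonzero(same)}")
```

Keeping both `n` and `skip_values` at powers of 2 also keeps the total number of generated points a power of 2, which avoids SciPy's balance-property warning.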
As for removing samples, this is not recommended. As noted above, the Sobol’ sequence is deterministic, so changing the samples destroys its structure, and the subsequent analysis would likely not be usable. I recommend skipping values to avoid the duplicates rather than filtering.
I think I understand you, but I also understand the confusion.
The values in Table 2 of [1] are not samples; they are points in the Sobol’ sequence.
The first row shows the first point of this sequence for a 10-dimensional problem, and its coordinates are identical across all dimensions (i.e., all set to 0.5).
As Campolongo et al. describe in [1]: “As in the first points of the Sobol’ sequence the values of the coordinates tend to repeat (i.e. for the first point they are all equal to 0.5, for the second they are alternates couples of 0.25 and 0.75 and so on …”
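The repeating coordinates described in [1] can be checked with SciPy's Sobol’ generator (`scipy.stats.qmc.Sobol`, SciPy ≥ 1.7). Using it here is an assumption for illustration; it is not necessarily the generator used in [1]. One difference worth noting: SciPy's unscrambled sequence starts at the origin, so the all-0.5 point is its second point, and the two points after it have every coordinate equal to 0.25 or 0.75.

```python
import numpy as np
from scipy.stats import qmc

# Unscrambled Sobol' sequence in 10 dimensions, matching the problem size of Table 2 in [1]
sampler = qmc.Sobol(d=10, scramble=False)
points = sampler.random(4)  # 4 is a power of 2, so SciPy issues no balance warning

print(points[0])  # the origin: all coordinates 0.0
print(points[1])  # all coordinates 0.5
print(points[2:])  # every coordinate is 0.25 or 0.75
```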
This repetition is what causes the initial samples to be identical:
“… in order to achieve different coordinates’ values for the points a and b, we need to generate a quasi-random matrix of Sobol’ numbers of size (R, 2k), with R > r, and discard the first few points for the auxiliary points …”
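To see how this repetition produces the output in the question, here is a sketch that rebuilds a Saltelli-style design by hand. The row layout (A, then each AB_i, then each BA_i for second-order indices, then B, giving N × (2D + 2) rows) is my understanding of what `saltelli.sample` does with second-order indices enabled, not SALib's actual code; `saltelli_like_rows` is a hypothetical helper, and SciPy's unscrambled Sobol’ generator stands in for SALib's. With N = 2 and no skipped points, the first base point is all zeros and the second is all 0.5, so every row built from them collapses to -π and 0 respectively after scaling to [-π, π].

```python
import numpy as np
from scipy.stats import qmc


def saltelli_like_rows(base: np.ndarray, num_vars: int, second_order: bool = True) -> np.ndarray:
    """Hypothetical re-creation of a Saltelli-style design (not SALib's code).

    Each base point of dimension 2 * num_vars is split into A (first half)
    and B (second half). Per base point we emit A, every AB_i (A with
    column i taken from B), every BA_i (B with column i taken from A, only
    for second-order indices), and finally B.
    """
    rows = []
    for point in base:
        a, b = point[:num_vars], point[num_vars:]
        rows.append(a.copy())
        for i in range(num_vars):
            ab = a.copy()
            ab[i] = b[i]
            rows.append(ab)
        if second_order:
            for i in range(num_vars):
                ba = b.copy()
                ba[i] = a[i]
                rows.append(ba)
        rows.append(b.copy())
    return np.array(rows)


num_vars, n = 3, 2  # same problem size and N as in the question
base = qmc.Sobol(d=2 * num_vars, scramble=False).random(n)  # no points skipped
unit_rows = saltelli_like_rows(base, num_vars)
x = qmc.scale(unit_rows, [-np.pi] * num_vars, [np.pi] * num_vars)

print(x.shape)  # (16, 3), i.e. N * (2 * num_vars + 2) rows
print(x)
```

If A and B are equal for a base point, swapping columns between them changes nothing, which is why all 2D + 2 rows of each block are identical.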
Hmm, to avoid overloading the docs and confusing users, I think I will simplify them to outline just one of the recommendations: that both `skip_values` and N be powers of 2, and that `skip_values` be >= N.
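The simplified recommendation can be expressed as a small check. This is a hypothetical helper for illustration, not part of SALib's API, and it encodes only the rule stated here (both values powers of 2, `skip_values` >= N), not the (2^n)-1 variant mentioned earlier in the thread.

```python
def follows_simplified_rule(n: int, skip_values: int) -> bool:
    """Hypothetical check: N and skip_values are both powers of 2,
    and skip_values >= N."""

    def is_power_of_two(k: int) -> bool:
        # A positive integer is a power of 2 iff it has exactly one bit set
        return k > 0 and (k & (k - 1)) == 0

    return is_power_of_two(n) and is_power_of_two(skip_values) and skip_values >= n


print(follows_simplified_rule(1024, 1024))  # True
print(follows_simplified_rule(1000, 1024))  # False: N is not a power of 2
print(follows_simplified_rule(8, 4))        # False: skip_values < N
```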
https://github.com/scipy/scipy/pull/10844#issuecomment-673029539
Sure, I am happy to give feedback also in the future. Thanks for the quick responses!
For a bit of context: I integrated SALib into our CLIMADA package to perform uncertainty and sensitivity analysis. CLIMADA can be used to model the impact and risk of natural catastrophes today and in the future.