Ax: [Sobol fallback needed] Repeated trials in experiment (and numerical errors they sometimes cause: `RuntimeError: cholesky_cpu: U(63,63) is zero, singular U.`)
I have a question regarding repeated trials. My code adds no new options compared to the Service API example in the Hartmann6 tutorial; I only changed the objective function and the parameters. Checking the results, I keep noticing that some runs produce a lot of repeated trials. Am I missing something? Did the optimization converge? If so, how do I stop the repetition? By breaking the loop when the trial is identical to the previous one? Thanks.
for i in range(20):
    parameters, trial_index = ax.get_next_trial()
    ax.complete_trial(trial_index=trial_index, raw_data=evaluate(parameters))
Results:
ax.get_trials_data_frame().sort_values('trial_index')
arm_name MAE trial_index x1 x2 x3 x4
0 0_0 0.354344 0 3 29 56 a
3 1_0 0.392026 1 21 26 34 b
12 2_0 0.366922 2 15 88 67 a
13 3_0 0.395405 3 40 83 24 b
14 4_0 0.360699 4 7 8 60 a
15 5_0 0.36654 5 1 27 66 b
16 6_0 0.360878 6 1 35 47 b
17 7_0 0.354756 7 4 21 54 b
18 8_0 0.352988 8 4 29 55 b
19 9_0 0.355494 9 7 37 56 b
1 10_0 0.35465 10 5 31 54 b
2 11_0 0.366325 11 6 38 66 b
4 12_0 0.359888 12 5 27 57 b
5 13_0 0.351413 13 1 43 55 b
6 13_0 0.351413 14 1 43 55 b
7 13_0 0.351413 15 1 43 55 b
8 13_0 0.351413 16 1 43 55 b
9 13_0 0.351413 17 1 43 55 b
10 13_0 0.351413 18 1 43 55 b
11 13_0 0.351413 19 1 43 55 b
Note: x4 is not used in the function.
About this issue
- State: open
- Created 5 years ago
- Reactions: 1
- Comments: 22 (9 by maintainers)
We have some improved methods in the works for better treatment of integer parameters (in the next couple weeks), which should resolve this issue. cc: @Balandat
Update: we had a team discussion around this, and here are the outcomes:
cc @Balandat, @eytan, @bletham, @ldworkin
@lena-kashtelyan Thank you for pointing that out! However, in the above code snippet, what I need is a conditional generation strategy that falls back to random when BO generates seemingly identical trials, so I think an init param like that may not solve the problem here.
@covrig, the answer to whether your stopping logic will work for your use case is, unfortunately, “it depends.” If 1) we treat the observations obtained from trial evaluation as noiseless, and 2) you are okay with the risk of stopping at some local and not necessarily global optimal parameter values, then we can use that stopping logic.
However, if the observations are noisy and you can afford to continue running the optimization for some more trials, then the right thing to do would be to continue running more. It seems that at the moment this creates a case where, once you end up with a lot of repeated trials, we start having numerical issues (the "NaNs encountered when trying to perform matrix-vector multiplication" error). We are looking into how we can better avoid that.

A better alternative might be not to stop when you get a repeated trial, but to continue getting new trials (without completing them) until you get a new one (with some limit, of course, at which point you can just stop the whole optimization).
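That retry idea can be sketched in plain Python. Note this is a generic illustration, not Ax API: `get_unique_trial` and `suggest` are made-up names, with `suggest` standing in for a call like `ax.get_next_trial()`.

```python
def get_unique_trial(suggest, seen, max_attempts=5):
    """Ask the generator for parameters until we get a configuration we
    have not seen yet; give up after max_attempts repeated suggestions."""
    for _ in range(max_attempts):
        params = suggest()
        key = tuple(sorted(params.items()))
        if key not in seen:
            seen.add(key)
            return params
    return None  # the generator keeps repeating itself: time to stop

# Toy generator standing in for the real suggestion step: after two fresh
# points it keeps proposing the same one, like a converged BO model.
proposals = iter([{"x1": 3}, {"x1": 7}] + [{"x1": 7}] * 5)
suggest = lambda: next(proposals)

seen = set()
print(get_unique_trial(suggest, seen))  # {'x1': 3}
print(get_unique_trial(suggest, seen))  # {'x1': 7}
print(get_unique_trial(suggest, seen))  # None: only repeats are left
```

When `None` comes back, the loop would abandon the run instead of completing another duplicate trial.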
P.S.: If you can afford to just exhaust your search space or get close to doing so (which in your case you probably cannot, since you would be running half a million trials), it would be reasonable to just use Sobol generation strategy instead of Bayesian optimization –– I described how to do so in https://github.com/facebook/Ax/issues/230.
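For a sense of scale, a quick back-of-the-envelope count of the grid size tells you whether exhausting the space is even thinkable (the ranges below are invented for illustration, not the ones from this thread):

```python
from itertools import product

# Hypothetical integer ranges; enumerating the full grid is only viable
# when the product of the range sizes stays small.
x1, x2, x3 = range(10), range(100), range(100)
grid = list(product(x1, x2, x3))
print(len(grid))  # 100000 configurations -- already a lot to run exhaustively
```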
Hi @blenderben2 ! I don’t think there’s anything new that’s been shipped yet, but we should be able to help you figure out a workaround based on your particular use case – can you provide more details about what kind of optimization you’re running, and when you’re seeing this error?
Without knowing more, the best solution is probably what @lena-kashtelyan suggested above:
In other words, if you’re hitting this error because we’re generating many repeated trials, it probably means the optimization is complete, and we’ve found the best point that we can.
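One way to turn "many repeated trials means we are done" into a concrete stopping rule is sketched below. This is a generic heuristic, not anything Ax ships: `should_stop` and `patience` are made-up names, and the rule only makes sense if evaluations are treated as noiseless.

```python
def should_stop(suggested, patience=3):
    """Return True when the last `patience` suggested configurations
    are all identical, i.e. the generator has settled on one point."""
    if len(suggested) < patience:
        return False
    tail = suggested[-patience:]
    return all(t == tail[0] for t in tail)

# Mirrors the trial log in the question: the model keeps re-suggesting
# the same (x1, x2, x3) point once it has converged.
history = [(3, 29, 56), (21, 26, 34), (1, 43, 55), (1, 43, 55), (1, 43, 55)]
print(should_stop(history))  # True
```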
@covrig, please open a separate issue for the large amount of Sobol steps question, since we do want these issues to be easily discoverable by others with similar questions!
Noted regarding the int-type documentation.
@covrig, re: keeping ints around for continuous ranges –– simply because in some cases folks need the parameters to take integer values only. We might reconsider if enough people find it confusing, though!
Re: Service API being faster –– that is unexpected. I would be curious to learn how you measured the runtimes there (since Loop API runs the evaluation function within the optimization and the Service API does not). If you’d like us to look into it, please open a separate issue with your code snippets. Thank you : )
Re: default API –– it really depends on the use case, there is no inherent reason to prefer one to another. Not sure if I can put it much better than our APIs doc.
@showgood163, gotcha! It seemed like you were doing something fancier than just forcing Sobol, but I just wanted to show the easier way of forcing it anyway, in case it comes in handy. Thank you for being a power user of Ax and providing us with helpful feedback!
@covrig, @winf-hsos: when defining a parameter like so, without providing value_type, the value type is inferred from the type of the bounds, which in your case is int, which makes your parameters de facto discrete. However, Bayesian optimization operates on continuous domains, and when the configurations it suggests are rounded to ints, different configurations end up the same, which @winf-hsos correctly explained.
I will be back shortly with a proposed solution to the issue!
P.S.: @showgood163, if you are just looking to use the Sobol generation strategy instead of Bayesian optimization, you can pass choose_generation_strategy_kwargs={"no_bayesian_optimization": True} to AxClient.create_experiment, which will force the generation strategy for your optimization to be quasi-random.

It looks similar to an issue I just reported. My guess is that the arms are in fact only identical because your parameters are discrete and are therefore rounded to the next integer. Behind the scenes, BO uses real values, and they are different for each arm.
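That rounding effect is easy to see in isolation. The continuous points below are invented for illustration, not actual model output:

```python
# Three distinct continuous suggestions, as a BO model might produce
# internally before they are mapped onto integer-valued parameters.
continuous_suggestions = [(1.2, 43.4), (0.8, 42.6), (1.4, 43.1)]

# Rounding to the integer grid collapses all three onto a single arm.
arms = [tuple(round(v) for v in point) for point in continuous_suggestions]
print(arms)  # [(1, 43), (1, 43), (1, 43)]
```

So the trials can look identical in the data frame even though the model is still proposing (slightly) different points.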
I am also interested in a solution to this problem.
Thanks Nicolas