Ax: [Sobol fallback needed] Repeated trials in experiment (and numerical errors they sometimes cause: `RuntimeError: cholesky_cpu: U(63,63) is zero, singular U.`)
I have a question regarding repeated trials. My code adds no new options compared to the Service API example in the Hartmann6 tutorial; I only changed the objective function and the parameters. Checking the results, I keep noticing that some runs produce a lot of repeated trials. Am I missing something? Did the optimization converge? If so, how do I stop the repetition? By breaking the loop when the trial is identical to the previous one? Thanks.
for i in range(20):
    parameters, trial_index = ax.get_next_trial()
    ax.complete_trial(trial_index=trial_index, raw_data=evaluate(parameters))
Results:
ax.get_trials_data_frame().sort_values('trial_index')
arm_name MAE trial_index x1 x2 x3 x4
0 0_0 0.354344 0 3 29 56 a
3 1_0 0.392026 1 21 26 34 b
12 2_0 0.366922 2 15 88 67 a
13 3_0 0.395405 3 40 83 24 b
14 4_0 0.360699 4 7 8 60 a
15 5_0 0.36654 5 1 27 66 b
16 6_0 0.360878 6 1 35 47 b
17 7_0 0.354756 7 4 21 54 b
18 8_0 0.352988 8 4 29 55 b
19 9_0 0.355494 9 7 37 56 b
1 10_0 0.35465 10 5 31 54 b
2 11_0 0.366325 11 6 38 66 b
4 12_0 0.359888 12 5 27 57 b
5 13_0 0.351413 13 1 43 55 b
6 13_0 0.351413 14 1 43 55 b
7 13_0 0.351413 15 1 43 55 b
8 13_0 0.351413 16 1 43 55 b
9 13_0 0.351413 17 1 43 55 b
10 13_0 0.351413 18 1 43 55 b
11 13_0 0.351413 19 1 43 55 b
Note: x4 is not used in the function.
About this issue
- State: open
- Created 5 years ago
- Reactions: 1
- Comments: 22 (9 by maintainers)
We have some improved methods in the works for better treatment of integer parameters (in the next couple weeks), which should resolve this issue. cc: @Balandat
Update: we had a team discussion around this, and here are the outcomes:
cc @Balandat, @eytan, @bletham, @ldworkin
@lena-kashtelyan Thank you for pointing that out! However, in the above code snippet, what I need is a conditional generation strategy that falls back to random when BO generates seemingly identical trials, so I think an init param like that may not solve the problem here.
@covrig, the answer to whether your stopping logic will work for your use case is, unfortunately, “it depends.” If 1) we treat the observations obtained from trial evaluation as noiseless, and 2) you are okay with the risk of stopping at some local and not necessarily global optimal parameter values, then we can use that stopping logic.
However, if the observations are noisy and you can afford to continue running the optimization for some more trials, then the right thing to do would be to continue running more. It seems that at the moment this creates a case where, once you end up with a lot of repeated trials, we start having numerical issues (the "NaNs encountered when trying to perform matrix-vector multiplication" error). We are looking into how we can better avoid that.

A better alternative might be not to stop when you get a repeated trial, but to continue getting new trials (without completing them) until you get a new one (with some limit, of course, at which point you can just stop the whole optimization).
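That retry idea can be sketched in plain Python. Note this is a generic illustration, not Ax API: `get_unique_trial` and `suggest` are made-up names, with `suggest` standing in for a call like `ax.get_next_trial()`.

```python
def get_unique_trial(suggest, seen, max_attempts=5):
    """Ask the generator for parameters until we get a configuration we
    have not seen yet; give up after max_attempts repeated suggestions."""
    for _ in range(max_attempts):
        params = suggest()
        key = tuple(sorted(params.items()))
        if key not in seen:
            seen.add(key)
            return params
    return None  # the generator keeps repeating itself: time to stop

# Toy generator standing in for the real suggestion step: after two fresh
# points it keeps proposing the same one, like a converged BO model.
proposals = iter([{"x1": 3}, {"x1": 7}] + [{"x1": 7}] * 5)
suggest = lambda: next(proposals)

seen = set()
print(get_unique_trial(suggest, seen))  # {'x1': 3}
print(get_unique_trial(suggest, seen))  # {'x1': 7}
print(get_unique_trial(suggest, seen))  # None: only repeats are left
```

When `None` comes back, the loop would abandon the run instead of completing another duplicate trial.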
P.S.: If you can afford to just exhaust your search space or get close to doing so (which in your case you probably cannot, since you would be running half a million trials), it would be reasonable to just use Sobol generation strategy instead of Bayesian optimization –– I described how to do so in https://github.com/facebook/Ax/issues/230.
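For a sense of scale, a quick back-of-the-envelope count of the grid size tells you whether exhausting the space is even thinkable (the ranges below are invented for illustration, not the ones from this thread):

```python
from itertools import product

# Hypothetical integer ranges; enumerating the full grid is only viable
# when the product of the range sizes stays small.
x1, x2, x3 = range(10), range(100), range(100)
grid = list(product(x1, x2, x3))
print(len(grid))  # 100000 configurations -- already a lot to run exhaustively
```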
Hi @blenderben2 ! I don’t think there’s anything new that’s been shipped yet, but we should be able to help you figure out a workaround based on your particular use case – can you provide more details about what kind of optimization you’re running, and when you’re seeing this error?
Without knowing more, the best solution is probably what @lena-kashtelyan suggested above:
In other words, if you’re hitting this error because we’re generating many repeated trials, it probably means the optimization is complete, and we’ve found the best point that we can.
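One way to turn "many repeated trials means we are done" into a concrete stopping rule is sketched below. This is a generic heuristic, not anything Ax ships: `should_stop` and `patience` are made-up names, and the rule only makes sense if evaluations are treated as noiseless.

```python
def should_stop(suggested, patience=3):
    """Return True when the last `patience` suggested configurations
    are all identical, i.e. the generator has settled on one point."""
    if len(suggested) < patience:
        return False
    tail = suggested[-patience:]
    return all(t == tail[0] for t in tail)

# Mirrors the trial log in the question: the model keeps re-suggesting
# the same (x1, x2, x3) point once it has converged.
history = [(3, 29, 56), (21, 26, 34), (1, 43, 55), (1, 43, 55), (1, 43, 55)]
print(should_stop(history))  # True
```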
@covrig, please open a separate issue for the large amount of Sobol steps question, since we do want these issues to be easily discoverable by others with similar questions!
Noted regarding the int-type documentation.
@covrig, re: keeping ints around for continuous ranges –– simply because in some cases folks need the parameters to take integer values only. We might reconsider if enough people find it confusing, though!
Re: Service API being faster –– that is unexpected. I would be curious to learn how you measured the runtimes there (since Loop API runs the evaluation function within the optimization and the Service API does not). If you’d like us to look into it, please open a separate issue with your code snippets. Thank you : )
Re: default API –– it really depends on the use case, there is no inherent reason to prefer one to another. Not sure if I can put it much better than our APIs doc.
@showgood163, gotcha! It seemed like you were doing something fancier than just forcing Sobol, but I just wanted to show the easier way of forcing it anyway, in case it comes in handy. Thank you for being a power user of Ax and providing us with helpful feedback!
@covrig, @winf-hsos: when defining a parameter like so, without providing value_type, the value type is inferred from the type of the bounds, which in your case is int, which makes your parameters de facto discrete. However, Bayesian optimization operates on continuous domains, and when the configurations it suggests are rounded to ints, different configurations end up the same, which @winf-hsos correctly explained.
I will be back shortly with a proposed solution to the issue!
P.S.: @showgood163, if you are just looking to use the Sobol generation strategy instead of Bayesian optimization, you can pass choose_generation_strategy_kwargs={"no_bayesian_optimization": True} to AxClient.create_experiment, which will force the generation strategy for your optimization to be quasi-random.

It looks similar to an issue I just reported. My guess is that the arms are in fact only identical because your parameters are discrete and are therefore rounded to the next integer. Behind the scenes, BO uses real values, and they are different for each arm.
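That rounding effect is easy to see in isolation. The continuous points below are invented for illustration, not actual model output:

```python
# Three distinct continuous suggestions, as a BO model might produce
# internally before they are mapped onto integer-valued parameters.
continuous_suggestions = [(1.2, 43.4), (0.8, 42.6), (1.4, 43.1)]

# Rounding to the integer grid collapses all three onto a single arm.
arms = [tuple(round(v) for v in point) for point in continuous_suggestions]
print(arms)  # [(1, 43), (1, 43), (1, 43)]
```

So the trials can look identical in the data frame even though the model is still proposing (slightly) different points.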
I am also interested in a solution to this problem.
Thanks Nicolas