pomdp-py: Moving a problem from SARSOP to POMCP
Hello! I am currently trying to use pomdp_py to explore POMDPs as a framework to help decision-making in Brain-Computer Interfaces (BCI). A while back I asked about the possibility of defining a model that included a time component (#21).
TL;DR - These are the problems/questions I have:
- Particles is not constructed correctly when passed a list of tuples generated from the Histogram belief representation
- I am not sure that the way I worked around the previous problem is correct
- Even if it is, when I try to print the current belief (which should be a bunch of 0s, and only a handful of values other than 0), I get what I believe is the list of particles.
- Technically not a question, but I also think this would be a useful piece of info to include in the Tiger problem’s documentation
- Also, thank you very much for this fantastic library. It is awesome and has made my life much easier in the last year.
After finishing a first work exploring BCIs with a simple POMDP that relied on SARSOP for solving, I am now trying to move to POMCP to solve a problem similar to what we discussed in #21.
I ran into a problem when calling pomcp.plan(), because I was initializing my belief as follows:
import pomdp_py
import itertools

n_targets = 12  # For example
n_steps = 8     # Number of time-steps before the true state changes

# Full state space
all_states = [TDState(state_id, time_step) for state_id, time_step
              in itertools.product(range(n_targets), range(n_steps))]

# States at time-step 0 only
all_init_states = [TDState(state_id, 0) for state_id in range(n_targets)]

# Initialize belief with a uniform probability over n_targets for all states at time-step 0
init_belief = pomdp_py.Histogram({state: 1 / n_targets if state in all_init_states
                                  else 0 for state in all_states})
I then got an error saying that the belief needs to be represented as particles in order to use POMCP. I noticed this is not mentioned in the Tiger example in the documentation. After looking around for a bit, I found the Particles module. However, I do not understand how to construct it. I tried the following (starting from the previous init_belief):
belief_dict = init_belief.get_histogram()
belief_tuples = [(state, prob) for state, prob in belief_dict.items()]
init_belief_particles = pomdp_py.Particles(belief_tuples)
But then I get the entire tuples (both the State and its probability) as values, and all weights are set to None. I ended up doing it like this instead:
init_belief_particles = pomdp_py.Particles([]).from_histogram(init_belief)
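For context on what this conversion amounts to: as I understand it, going from a histogram to particles means drawing N states with probability proportional to their histogram weight. The sketch below illustrates that idea with plain tuples instead of TDState and a hypothetical helper (histogram_to_particles is my own name, not pomdp_py's actual implementation):

```python
import random

# Hypothetical helper illustrating the histogram -> particles conversion:
# draw num_particles states, each with probability proportional to its weight.
def histogram_to_particles(histogram, num_particles=1000):
    states = list(histogram.keys())
    weights = [histogram[s] for s in states]
    return random.choices(states, weights=weights, k=num_particles)

n_targets = 12
# Sparse histogram: uniform over the (state_id, 0) states, zero elsewhere,
# mirroring the init_belief above (plain tuples stand in for TDState)
hist = {(i, 0): 1 / n_targets for i in range(n_targets)}
hist.update({(i, t): 0.0 for i in range(n_targets) for t in range(1, 8)})

particles = histogram_to_particles(hist)
# Every sampled particle comes from the support of the histogram,
# i.e. all particles have time_step == 0
assert all(p[1] == 0 for p in particles)
```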
With that workaround it looks like it works. However, when I start the simulation I have the following piece of code, where I print the current belief (still written as it was for the histogram representation of the belief):
for trial_n in range(n_trials):
    # Separate next trial and label
    next_trial = X_test[trial_n, :, :]
    next_label = int(y_test[trial_n])

    # Set belief to uniform, in case last trial ended without a decision
    vep_problem.agent.set_belief(init_belief_particles)

    # Set the true state as the env state (time step = 0)
    true_state = TDState(next_label, 0)
    vep_problem.env.apply_transition(true_state)

    print('')
    print(f'TRIAL {trial_n} (true state {true_state})')
    print('-' * 20)

    # For every time step...
    for step_n in range(n_steps):
        cur_belief = [vep_problem.agent.cur_belief[st] for st in vep_problem.agent.cur_belief]
        print('')
        print(f'    STEP {step_n}')
        print('    Current belief:')
        print(f'    {cur_belief}')
And then I get a very long list where none of the values is 0. At any time step, only n_targets states should be different from 0: the initial belief is specified as stated above, and the transition model (not shown) only allows each state to transition to the same state with an incremented time_step. That said, I don't know whether what I am getting is the belief or rather the list of particles (my understanding of how the particle representation works is limited, even after reading the POMCP paper, so it may well be that). If that is the case, I wonder whether the correct way to read the belief is something like problem.agent.cur_belief.get_histogram().
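If it helps to make the question concrete: my current understanding is that a particle belief is just a multiset of sampled states, and that a histogram-style summary is recovered by counting particle frequencies. A minimal sketch of that counting step (plain tuples stand in for TDState; this is my own illustration, not pomdp_py internals):

```python
from collections import Counter

# A toy particle set: 100 particles, only two distinct states
particles = [(3, 0)] * 70 + [(5, 0)] * 30

# Recover an approximate histogram by counting particle frequencies,
# which is what I assume a get_histogram()-style summary amounts to
counts = Counter(particles)
histogram = {state: n / len(particles) for state, n in counts.items()}

print(histogram)  # {(3, 0): 0.7, (5, 0): 0.3}
```

Printing the particles directly (as my loop above does) would instead show the raw, repeated samples, which might explain the long list with no zeros.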
Thank you very much for your time.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 16 (8 by maintainers)
To ensure independence between trials, I would re-create a problem instance at the start of each trial.
Well, I can’t guarantee that you won’t run into any problems, but you should be able to use Particles to represent the belief state in your case.
This may not give you an accurate summary of your distribution, because your belief is sparse and many states are probably not added as a particle. This function only produces an equivalent distribution that is more compact.
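The point about low-probability states being dropped can be seen in a small sketch (my own illustration, assuming particles are drawn by weighted sampling): with a finite particle budget, most rare states receive no particle at all, so a histogram rebuilt from the particles assigns them probability 0.

```python
import random
random.seed(0)

# A distribution with one dominant state and many rare ones
dist = {("rare", i): 0.001 for i in range(100)}
dist[("common", 0)] = 0.9

states = list(dist)
weights = [dist[s] for s in states]
particles = random.choices(states, weights=weights, k=50)

# With only 50 particles, most of the 100 rare states get no particle,
# so they vanish from any histogram rebuilt from the particle set
covered = set(particles)
missing_rare = [s for s in states if s[0] == "rare" and s not in covered]
print(len(missing_rare))
```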
I’d say to be accurate, you have to do:
Thanks. Your contribution is welcome. Please have a look at: https://github.com/h2r/pomdp-py/issues/25