ray: [rllib] MeanStdFilter value problem during compute action
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Ubuntu 16.04
- Ray installed from (source or binary): source
- Ray version: 0.5.0
- Python version: 3.6.1
- Exact command to reproduce:
Describe the problem
When I run a trained agent for evaluation, it performs much worse than the rewards I monitored during training.
I found that after restoring the agent, the filter (MeanStdFilter) contains strange values.
Has anyone seen this problem before?
Source code / logs
agent = cls(env="prosthetics", config=config)
agent.restore(checkpoint_path)
print(agent.local_evaluator.filters)
>>> {'default': MeanStdFilter((158,), True, True, None, (n=128033777, mean_mean=-1.0975956375310988e+181, mean_std=inf), (n=0, mean_mean=0.0, mean_std=0.0))}
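To see why those restored statistics break evaluation, here is a minimal sketch of a running mean/std filter in the spirit of MeanStdFilter (this is hypothetical illustration code, not RLlib's implementation): once the running mean or variance overflows to a huge value or inf, every normalized observation collapses toward zero or blows up, which matches the printout above.

```python
import numpy as np

class RunningMeanStd:
    """Minimal sketch of a running mean/std observation filter using
    Welford's algorithm. Hypothetical, not RLlib's MeanStdFilter."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.m2 = np.zeros(shape)  # running sum of squared deviations

    def update(self, x):
        # Welford's online update for mean and variance.
        x = np.asarray(x, dtype=np.float64)
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    @property
    def std(self):
        return np.sqrt(self.m2 / max(self.n - 1, 1))

    def __call__(self, x):
        # If std has overflowed to inf, normalized values collapse to ~0;
        # if mean has overflowed, they blow up -- the symptom reported above.
        return (np.asarray(x) - self.mean) / (self.std + 1e-8)
```

With healthy statistics the filter standardizes observations as expected; corrupt the internal state (as in the restored checkpoint) and the output becomes useless even though the policy weights are fine.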
About this issue
- Original URL
- State: closed
- Created 6 years ago
- Comments: 22 (3 by maintainers)
@whikwon @ericl Thanks for all the guidance. Here is what I’ve found. The environment is Pendulum-v0. In agent.compute_action, I logged the values of obs before and after the filter. As shown below, the filtered values are either extremely small or extremely large.
So it seems like the filter is not applied correctly in the code below:
You can probably add a print() to determine what update causes it to reach an inf value.
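One way to do that print-based check is a small helper called after each filter update; it reports the first step at which the running statistics go non-finite. This is a hypothetical sketch: the attribute names `filt.rs.mean` and `filt.rs.std` are assumptions about the filter's internals, so adapt them to whatever your filter object actually exposes.

```python
import numpy as np

def check_filter_stats(filt, obs, step):
    """Hypothetical debugging helper: returns True while the filter's
    running statistics are finite, and prints diagnostics at the first
    step they are not. Attribute names (`filt.rs.mean`, `filt.rs.std`)
    are assumptions about the filter object's layout."""
    mean = np.asarray(filt.rs.mean, dtype=np.float64)
    std = np.asarray(filt.rs.std, dtype=np.float64)
    finite = bool(np.isfinite(mean).all() and np.isfinite(std).all())
    if not finite:
        print(f"step {step}: non-finite filter stats for obs={obs!r}; "
              f"mean in [{mean.min()}, {mean.max()}], "
              f"std in [{std.min()}, {std.max()}]")
    return finite
```

Calling this inside the sampling loop narrows down exactly which observation (or which restore) pushes the statistics to inf.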