stable-baselines: [bug] PPO2 episode reward summaries are written incorrectly for VecEnvs
Episode reward summaries are all concentrated on a few steps, with jumps in between.
Zoomed out:

Zoomed in:

Every other summary looks fine:

To reproduce, run PPO2 on DummyVecEnv(["Pendulum-v0" for _ in range(8)]).
Hi, I also encountered some of the issues described in the comments above. A recap follows.
PPO2 tensorboard visualization issues
If you run ppo2 with a single process training for 256 timesteps (N=1, T=256) and try to visualize the episode reward and the optimization statistics:
- the episode reward is shifted forward by T timesteps (instead of being in [0, 256], it is plotted in [256, 512]) for the reason explained in https://github.com/hill-a/stable-baselines/issues/143#issuecomment-552952355
- the optimization statistics are misplaced because of the timestep calculations highlighted in https://github.com/hill-a/stable-baselines/issues/143#issuecomment-584530173
Moreover, if you try to plot data using multiple processes (for instance
N=4 workers with T=256 timesteps per worker):
- the episode rewards are all concentrated within T timesteps, followed by a jump of (N-1)*T timesteps in the plot
PPO2 tensorboard visualization proposed solution
I implemented the following solutions for the visualization issues:
- the optimization performs K epochs on N*T//M minibatches (M being the training timesteps related to a minibatch); therefore a fixed amount of data, namely K * N*T//M points, is collected during each optimization
- the K * N*T//M optimization data points are equally distributed over the batch size N*T
As a result, in the showcases above:
- the episode rewards of the N workers are plotted side by side
The modifications are just a few and straightforward. Regarding the side-by-side visualization of the rewards in the multiprocess case, do you believe that plotting the mean and variance of the collected data would instead be more appropriate?
If it is appreciated, I would open a PR with the implemented modifications, which I can update if the mean and variance solution is recommended.
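To make the indexing issue concrete, here is a small pure-Python sketch (hypothetical function names, not the actual PPO2 code) contrasting the reported behaviour — all N*T rewards of a batch squeezed into T timesteps, followed by a jump of (N-1)*T — with the side-by-side indexing described above:

```python
import numpy as np

N, T = 4, 256  # workers and timesteps per worker, as in the example above

def buggy_x(batch_idx):
    # Sketch of the reported behaviour: within a batch the global counter
    # only spans T timesteps, so the N rewards of each step share one x value
    # and each batch ends (N-1)*T timesteps short of where the next one starts.
    start = batch_idx * N * T
    return np.repeat(np.arange(start, start + T), N)

def side_by_side_x(batch_idx):
    # Proposed fix (sketch): each of the N*T rewards gets its own global
    # timestep, so the N workers appear side by side and the plot has no gaps.
    start = batch_idx * N * T
    return np.arange(start, start + N * T)

# Batch 0 under the buggy indexing ends at x = 255, while batch 1 starts
# at x = 1024: the jump of (N-1)*T = 768 timesteps seen in the plots.
print(buggy_x(0).max(), buggy_x(1).min())          # 255 1024
print(side_by_side_x(0).max(), side_by_side_x(1).min())  # 1023 1024
```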
@paolo-viceconte thanks, I’ll try to take a look at what you did this week (unless @Miffyli can do it before); we have too many issues related to that function (cf. all linked issues).
@balintkozma
Thanks for the quick reply!
I think that could also be fixed in the same PR, as these two are relate-…
Ninja’d by @araffin
Please do only one PR that solves this issue.