stable-baselines3-contrib: PPORecurrent mini batch size inconsistent

I am using PPORecurrent with the RecurrentDictRolloutBuffer. In https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/103 it’s mentioned that batch size is intended to be constant size. However this seems not to be the case.

I did some experiments where I print the action batch sizes coming from the _get_samples() function.

Exp 1: batch_size < sequence length

Initialize PPORecurrent with batch_size = 10. I am sampling only 2000 steps for debugging purposes. The mean sequence length is 32.3

It looks like whenever n_seq = 2 and it appends 2 different sequences the batch size is higher than the specified 10.

Exp 2: batch_size > sequence length

Initialize PPORecurrent with batch_size = 100 The mean sequence length this time is 31.7

The batch size is now always higher than the specified 100, and different every time.

Is this the intended behavior? It seems wrong to me since it is stated explicitly in the aforementioned issue that batch size is intended to be constant:

Actually no, the main reason is that you want to keep a mini batch size constant (otherwise you will need to adjust the learning rate for instance).

Also, now that we are at it, what is the reason for implementing mini batching, instead of feeding batches of whole sequences to the model?

System Info Describe the characteristic of your environment:

installed with pip
Stable-Baselines3 1.6.0
sb3-contrib 1.6.0
GPU models and configuration
Python 3.8.0
PyTorch 1.12.1+cu116
Gym 0.21.0

About this issue

Original URL
State: open
Created 2 years ago
Comments: 18 (9 by maintainers)

Most upvoted comments

I integrated the code better with the existing code and removed the duplicates. Now about 120 lines smaller. Enabling it is still optional through the whole_sequence flag.

Curious what you think regarding the flatten layer.

It also seems to me that we could simplify the _process_sequence function if we make whole sequences the default. I believe _process_sequence will only have to deal with single step sequences now that the forward passes are dealt with in evaluate_actions_whole_sequence.

b-vm on Mar 1, 2023

Of course! Here it is: #118 Let me know if you want to see any changes.

b-vm on Nov 28, 2022