stable-baselines3-contrib: PPORecurrent mini batch size inconsistent
I am using PPORecurrent with the RecurrentDictRolloutBuffer. In https://github.com/Stable-Baselines-Team/stable-baselines3-contrib/issues/103 it’s mentioned that batch size is intended to be constant size. However this seems not to be the case.
I did some experiments where I print the action batch sizes coming from the _get_samples() function.
Exp 1: batch_size < sequence length
Initialize PPORecurrent with batch_size = 10. I am sampling only 2000 steps for debugging purposes. The mean sequence length is 32.3

It looks like whenever n_seq = 2 and it appends 2 different sequences the batch size is higher than the specified 10.
Exp 2: batch_size > sequence length
Initialize PPORecurrent with batch_size = 100 The mean sequence length this time is 31.7

The batch size is now always higher than the specified 100, and different every time.
Is this the intended behavior? It seems wrong to me since it is stated explicitly in the aforementioned issue that batch size is intended to be constant:
Actually no, the main reason is that you want to keep a mini batch size constant (otherwise you will need to adjust the learning rate for instance).
Also, now that we are at it, what is the reason for implementing mini batching, instead of feeding batches of whole sequences to the model?
System Info Describe the characteristic of your environment:
- installed with pip
- Stable-Baselines3 1.6.0
- sb3-contrib 1.6.0
- GPU models and configuration
- Python 3.8.0
- PyTorch 1.12.1+cu116
- Gym 0.21.0
About this issue
- Original URL
- State: open
- Created 2 years ago
- Comments: 18 (9 by maintainers)
I integrated the code better with the existing code and removed the duplicates. Now about 120 lines smaller. Enabling it is still optional through the
whole_sequenceflag.Curious what you think regarding the flatten layer.
It also seems to me that we could simplify the
_process_sequencefunction if we make whole sequences the default. I believe_process_sequencewill only have to deal with single step sequences now that the forward passes are dealt with inevaluate_actions_whole_sequence.Of course! Here it is: #118 Let me know if you want to see any changes.