stable-baselines3: SubprocVecEnv does not call `reset()` when expected and `set_attr` behaves weirdly
🐛 Bug
I have a custom `gym.Env` (which passes all `check_env` checks). Having started this project months ago, I am running Gym 0.21.0 and SB3 1.8.0.
My custom env has an attribute (`device_freeze`) which is used in the `reset()` method to trigger a change in the environment dynamics. The goal is to have a callback which, every `n_timesteps`, interacts with my custom env and sets `device_freeze` to False.
In this way, during the following call of the `reset()` method, the environment's dynamics are changed. Please note that changing the dynamics at the end of an episode, and not during the episode, is more than a whim: changing the environment dynamics within an episode goes against the MDP formulation that I am considering at the moment (or at least that's what I believe; open to theoretical feedback on this).
Here's the source code of the custom env (inheriting from another custom env which inherits from a `BaseEnv` subclassing `gym.Env`; it has been a long project ahahaha):
```python
import numpy as np
from commons import BaseInterface
from .oscar import OscarEnv
from typing import Iterable, Text, Dict
from numpy.typing import NDArray


class MarcellaEnv(OscarEnv):
    def __init__(self,
                 [..., parent class args],
                 devices_and_bounds: Dict[Text, float] = {"device1": 50., "device2": 6., "device3": 7.}):
        """
        This argument stores the list of devices used for multi-tasking training and the respective performance bounds.
        """
        self.device_freeze = True  # here the problematic attribute
        self.devices_and_bounds = devices_and_bounds
        super().__init__(
            [...parent args]
        )

    @property
    def name(self):
        return "marcella"

    def change_device(self):
        """
        Change the target device via random selection if device freeze is not enabled.

        Returns:
            None

        Note:
            The method randomly selects a new device from the available devices and updates the target
            device and upper bound accordingly. This only happens if device freeze is not enabled.
        """
        if not self.device_freeze:
            new_device = np.random.choice(list(self.devices_and_bounds.keys()))
            self.target_device = new_device
            self.new_bound = self.devices_and_bounds[new_device]
            # entered the branch because device freeze was False; the switch sets it back to True
            self.device_freeze = True

    def reset(self) -> NDArray:
        """Resets custom env attributes."""
        self._observation = self.observation_space.sample()
        self.change_device()
        self.update_current_net()
        self.timestep_counter = 0
        return self._get_obs()
```
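The intended freeze/unfreeze round-trip can be exercised in isolation. The following is a stripped-down sketch (my own, not from the original project) that keeps only the device-switching logic and drops observations, bounds handling, and the parent classes:

```python
import numpy as np

# Stripped-down sketch of the reset()/change_device() interplay:
# the device may only change when device_freeze is False, and a single
# switch immediately re-freezes the env until the callback flips the flag again.
class DeviceSwitcher:
    def __init__(self, devices_and_bounds):
        self.devices_and_bounds = devices_and_bounds
        self.device_freeze = True
        self.target_device = next(iter(devices_and_bounds))

    def change_device(self):
        if not self.device_freeze:
            new_device = np.random.choice(list(self.devices_and_bounds.keys()))
            self.target_device = new_device
            self.new_bound = self.devices_and_bounds[new_device]
            self.device_freeze = True  # re-freeze after one switch

    def reset(self):
        self.change_device()


env = DeviceSwitcher({"device1": 50.0, "device2": 6.0, "device3": 7.0})
env.reset()                # frozen: nothing changes
env.device_freeze = False  # what the callback is supposed to do every n timesteps
env.reset()                # one switch may happen, then the flag re-freezes itself
```

Run standalone, this confirms the per-env logic is sound: the surprising behaviour only appears once the env is behind a `VecEnv`.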
And here is the callback code:
"""Custom callbacks to be used during training to record the learnign process."""
from stable_baselines3.common.callbacks import BaseCallback
import numpy as np
from stable_baselines3.common.vec_env import VecEnv
from typing import Text, Iterable
class MultiTask_Callback(BaseCallback):
"""Custom callback inheriting from `BaseCallback`.
:param verbose: (int) Verbosity level 0: not output 1: info 2: debug.
Performs various actions when triggered (intended to be a child of EventCallback):
1. Evaluates current policy (for n_eval_episodes)
2. Updates a current best_policy variable
3. Logs stuff on wandb. More details on what is logged in :meth:_on_step.
"""
def __init__(
self,
verbose:int=0):
"""Init function defines callback context."""
super().__init__(verbose)
self.devices_history = []
def _on_step(self) -> bool:
"""
This method will be called by the model after each call to `_env.step()`.
For child callback (of an `EventCallback`), this will be called
when the event is triggered.
:return: (bool) If the callback returns False, training is aborted early.
"""
# storing the current hardware used for training
current_device = self.model.env.get_attr("target_device")
# stores the target hardware the model has been currently training on
self.devices_history.append(current_device)
print(self.model.env.get_attr("device_freeze"), self.model.env.get_attr("target_device"))
# flips the switch that prevents different devices to be chosen at episode init
self.model.env.set_attr("device_freeze", False)
return True
def get_devices_history(self):
"""Returns the full history of hardware devices"""
return self.devices_history
When using either `DummyVecEnv` or `SubprocVecEnv`, I obtain the following stdout from the callback execution within a training script:
```
[True] ['device1']
[False] ['device1']
[False] ['device1']
[False] ['device1']
[False] ['device1']
[False] ['device1']
...
```
This is not plausible: the print is triggered every 30 timesteps and the maximal number of timesteps per episode is set to 50, so `reset()` runs at least once between consecutive prints and a device switch should have occurred. However, if I modify the callback code so that:

```python
self.model.env.set_attr("device_freeze", False)
```

becomes:

```python
for env_idx in range(self.model.env.num_envs):
    # manually changing, since set_attr appears to be useless here
    self.model.env.envs[env_idx].unwrapped.device_freeze = False
```
Then the output I obtain is:
```
[True] ['device1']
[False] ['device1']
[True] ['device2']
[True] ['device3']
[True] ['device3']
[False] ['device3']
[True] ['device1']
[True] ['device2']
[False] ['device2']
[True] ['device3']
[False] ['device3']
```
My issue is that this "forceful" fix clearly does not work for `SubprocVecEnv` (I've already tried), since `SubprocVecEnv` does not allow iterating across envs.
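One way around the missing `envs` attribute might be SB3's `VecEnv.env_method(name, *args)`, which forwards a method call to every worker process, where it runs on the actual env instance. The sketch below is an assumption, not the original code: `set_device_freeze` is a hypothetical helper that would have to be added to the env.

```python
# Hedged sketch: instead of touching `envs[i]` directly (unavailable on
# SubprocVecEnv), expose a setter on the env and dispatch it through the
# vectorized API. `MarcellaEnvSketch` stands in for the real env; only
# the flag logic is kept.
class MarcellaEnvSketch:
    def __init__(self):
        self.device_freeze = True

    def set_device_freeze(self, value: bool) -> None:
        # executed inside the worker process, on the unwrapped env
        self.device_freeze = value


# In the callback, the per-env loop would then become a single call:
#     self.model.env.env_method("set_device_freeze", False)
```

Whether this sidesteps the observed `set_attr` weirdness is exactly the open question, but it at least avoids the `AttributeError` on `SubprocVecEnv`.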
What is very weird here is the fact that `set_attr` changes the value of `device_freeze` only once. After that, it looks like it is ignored, even though the callback's `_on_step()` method is executed (one printed line per call).
Would really appreciate some help in figuring out why this happens!
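For what it's worth, one mechanism that could produce exactly this "works once, then ignored" pattern is attribute shadowing on a wrapper (such as `Monitor`) sitting between the vectorized env and the unwrapped env. This is an assumption, not a confirmed diagnosis; a minimal, gym-free sketch of the mechanism:

```python
# If setattr targets the wrapper rather than the inner env, it creates a
# shadow attribute on the wrapper: subsequent reads see the shadow copy,
# while the inner env (whose reset() actually checks the flag) never changes.

class InnerEnv:
    def __init__(self):
        self.device_freeze = True  # the "real" flag used by reset()


class Wrapper:
    def __init__(self, env):
        self.env = env

    def __getattr__(self, name):
        # only called when the attribute is NOT found on the wrapper itself,
        # so reads fall through to the inner env until a shadow copy exists
        return getattr(self.env, name)


wrapped = Wrapper(InnerEnv())
print(wrapped.device_freeze)              # True, read from the inner env

setattr(wrapped, "device_freeze", False)  # creates a NEW attribute on the wrapper
print(wrapped.device_freeze)              # False, read from the wrapper's shadow copy
print(wrapped.env.device_freeze)          # True: the inner env never changed
```

This would match the logs: `get_attr` keeps reporting `False` (the shadow copy) while the device never switches, and going through `.unwrapped` bypasses the wrapper entirely, which is why the manual loop works.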
Code example
Code in issue message.
Relevant log output / Error message
```
AttributeError: SubprocEnv does not have the envs attribute
```
System Info
Libraries installed via pip.
- OS: Linux-5.13.0-52-generic-x86_64-with-glibc2.31 # 59~20.04.1-Ubuntu SMP Thu Jun 16 21:21:28 UTC 2022
- Python: 3.10.8
- Stable-Baselines3: 1.8.0
- PyTorch: 2.0.1+cu117
- GPU Enabled: True
- Numpy: 1.24.3
- Gym: 0.21.0
Checklist
- I have checked that there is no similar issue in the repo
- I have read the documentation
- I have provided a minimal working example to reproduce the bug
- I have checked my env using the env checker
- I've used the markdown code blocks for both code and stack traces.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 20 (12 by maintainers)
If you want to submit this improvement to the documentation, I encourage you to open a PR