sheeprl: dreamerv3 trouble resuming? freeze?
Hello, sorry, I don’t actually know much about the mathematical formulas and whatnot behind RL and the algorithms; I’ve previously just been training stuff with SB3 PPO. Anyway, I installed sheeprl, implemented a wrapper for Stable Retro, and got it started, and I could see the envs running in parallel and the agent doing stuff. However, once it hits the point set by “learning_starts” it stops logging anything, although it’s still using my CPU and RAM. It was basically sitting there for over 30 minutes with no logs. No idea if it’s actually doing anything or not, although I suppose I could try again with the visual Retro window open so I could check.
Rank-0: policy_step=46848, reward_env_2=110.59998321533203
Rank-0: policy_step=50772, reward_env_3=278.8006286621094
Rank-0: policy_step=50904, reward_env_0=309.19964599609375
Rank-0: policy_step=53848, reward_env_1=126.399658203125
Rank-0: policy_step=55596, reward_env_0=268.4999694824219
Rank-0: policy_step=57212, reward_env_2=240.7994842529297
Rank-0: policy_step=59680, reward_env_3=195.09988403320312
Rank-0: policy_step=62880, reward_env_1=362.70013427734375
Rank-0: policy_step=64500, reward_env_0=532.301513671875
Any ideas as to what the issue could be?
*Edit: actually it finally updated, so I guess it’s just really slow after learning starts? Is there a way to run this on GPU?
Rank-0: policy_step=64500, reward_env_0=532.301513671875
Rank-0: policy_step=66776, reward_env_2=365.80255126953125
Rank-0: policy_step=66808, reward_env_3=68.799560546875
Cool, thanks for the answers.
Could you quantify how slow it is? Consider that even the smallest model (the S-sized one) used for training on Atari-100k takes around 9-10 hours on a single V100, which is consistent with the “< 1 day” reported by the DV3 authors. Another thing that can speed up the training is to set `algo.train_every=N` with `N > 1`, meaning that the agent will be trained `algo.per_rank_gradient_steps` times every `N` policy steps, where a policy step is a single forward of the agent to retrieve the action given an observation from the env: if you have `E` envs per process and `P` processes, then for a single environment interaction you will have `E * P` policy steps (see the sketch below).
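A rough sketch of such an override, assuming the `sheeprl` CLI entry point, the `dreamer_v3` experiment config, and the `env.num_envs` key (adapt the names to whatever you are already running):

```bash
# Sketch: with 4 envs in a single process, one environment interaction
# counts as E * P = 4 * 1 = 4 policy steps, so train_every=16 means the
# agent is trained (per_rank_gradient_steps times) once every 4 interactions.
sheeprl exp=dreamer_v3 \
    env.num_envs=4 \
    algo.train_every=16
```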
PS: I also suggest installing the latest version of sheeprl with `pip install sheeprl==0.5.2`, which is faster and better optimized.

This is intended, because when we resume from a checkpoint we assume that something has gone wrong, so we load the old config from the checkpoint being resumed and merge it with the one running now, meaning that everything you specify from the CLI will be discarded.
Moreover, when you resume from a checkpoint you must (as explained in our how-to) specify the entire path to the `.ckpt` file containing the checkpoint to be resumed.
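For example, a sketch of a resume command, assuming the `checkpoint.resume_from` config key (check the how-to for the exact name) and a placeholder path:

```bash
# Sketch: pass the full path to the .ckpt file when resuming.
sheeprl exp=dreamer_v3 \
    checkpoint.resume_from=/path/to/your/checkpoint.ckpt
```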
When the checkpoint is resumed, `start_step` will be set to the last update saved in the checkpoint file, so that training can safely resume from there; if you haven’t saved the buffer in the checkpoint, then the algorithm will collect random data to fill up the new buffer for `learning_starts` steps before the actual training starts.

If you have run it just like that, it’s possible that you’re running the experiment on CPU. To run it on GPU you can run a command like the following:
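This is only a sketch, assuming the Lightning-Fabric-style `fabric.accelerator` and `fabric.devices` overrides (the experiment name is again a placeholder):

```bash
# Sketch: move training onto a single CUDA GPU.
sheeprl exp=dreamer_v3 \
    fabric.accelerator=cuda \
    fabric.devices=1
```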
and to reduce the memory footprint you can also try to add `fabric.precision=bf16-mixed` to train the model with mixed precision.
Could you please try this out and tell us if it solves your problem?

I figured out some ways around this stuff; the only thing I’d really like to know is whether there is any way to reduce the memory usage, since it takes so much VRAM while it’s training. Otherwise I’ll just close this since it’s not really an issue.
Also, since Stable Retro can only start one emulator per process, the training works thanks to the async_vector_env, but when I run sheeprl_eval the env fails because it tries to open two emulators within the same process. Any way around this?