sheeprl: dreamerv3 trouble resuming? freeze?
Hello, sorry, I don’t actually know much about the mathematical formulas and whatnot behind RL and the algorithms; I’ve previously just been training stuff with SB3 PPO. Anyway, I installed sheeprl, implemented a wrapper for Stable Retro, and got it started, and I could see the envs running in parallel and the agent doing stuff. However, once it hits the point set by “learning_starts” it stops logging anything, although it’s still using my CPU and RAM. It was basically sitting there for over 30 minutes with no logs. No idea if it’s actually doing anything or not, although I suppose I could try again with the visual Retro window open so I could check.
Rank-0: policy_step=46848, reward_env_2=110.59998321533203
Rank-0: policy_step=50772, reward_env_3=278.8006286621094
Rank-0: policy_step=50904, reward_env_0=309.19964599609375
Rank-0: policy_step=53848, reward_env_1=126.399658203125
Rank-0: policy_step=55596, reward_env_0=268.4999694824219
Rank-0: policy_step=57212, reward_env_2=240.7994842529297
Rank-0: policy_step=59680, reward_env_3=195.09988403320312
Rank-0: policy_step=62880, reward_env_1=362.70013427734375
Rank-0: policy_step=64500, reward_env_0=532.301513671875
Any ideas as to what the issue could be?
*Edit: actually it finally updated, so I guess it’s just really slow after learning starts? Is there a way to run this on GPU?
Rank-0: policy_step=64500, reward_env_0=532.301513671875
Rank-0: policy_step=66776, reward_env_2=365.80255126953125
Rank-0: policy_step=66808, reward_env_3=68.799560546875
Cool, thanks for the answers.
Could you quantify how slow it is? Consider that even the smallest model (the S-sized one) used for training on Atari-100k takes around 9-10 hours on a single V100, which is consistent with the “< 1 day” reported by the DV3 authors. Another thing that can speed up the training is to set `algo.train_every=N` with `N > 1`, meaning that the agent will be trained `algo.per_rank_gradient_steps` times every `N` policy steps, where a policy step is a single forward of the agent to retrieve the action given an observation from the env: if you have `E` envs per process and `P` processes, then for a single environment interaction you will have `E * P` policy steps (see the sketch below).
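A rough sketch of such an override, assuming the `sheeprl` CLI entry point, the `dreamer_v3` experiment config, and the `env.num_envs` key (adapt the names to whatever you are already running):

```bash
# Sketch: with 4 envs in a single process, one environment interaction
# counts as E * P = 4 * 1 = 4 policy steps, so train_every=16 means the
# agent is trained (per_rank_gradient_steps times) once every 4 interactions.
sheeprl exp=dreamer_v3 \
    env.num_envs=4 \
    algo.train_every=16
```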
PS: I also suggest installing the latest version of sheeprl with `pip install sheeprl==0.5.2`, which is faster and better optimized.

This is intended, because when we resume from a checkpoint we assume that something has gone wrong, so we load the old config from the checkpoint being resumed and merge it with the one running now, meaning that everything you specify from the CLI will be discarded.
Moreover, when you resume from a checkpoint you must (as explained in our how-to) specify the entire path to the `.ckpt` file containing the checkpoint to be resumed.
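For example, a sketch of a resume command, assuming the `checkpoint.resume_from` config key (check the how-to for the exact name) and a placeholder path:

```bash
# Sketch: pass the full path to the .ckpt file when resuming.
sheeprl exp=dreamer_v3 \
    checkpoint.resume_from=/path/to/your/checkpoint.ckpt
```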
When the checkpoint is resumed, `start_step` will be set to the last update saved in the checkpoint file, so that training can safely resume from there; if you haven’t saved the buffer in the checkpoint, then the algorithm will collect random data to fill up the new buffer for `learning_starts` steps before the actual training starts.

If you have run it just like that, it’s possible that you’re running the experiment on CPU. To run it on GPU you can run a command like the following:
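This is only a sketch, assuming the Lightning-Fabric-style `fabric.accelerator` and `fabric.devices` overrides (the experiment name is again a placeholder):

```bash
# Sketch: move training onto a single CUDA GPU.
sheeprl exp=dreamer_v3 \
    fabric.accelerator=cuda \
    fabric.devices=1
```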
and to reduce the memory footprint you can also try to add `fabric.precision=bf16-mixed` to train the model with mixed precision.
Could you please try this out and tell us if it solves your problem?

I figured out some ways around this stuff; the only thing I’d really like to know is whether there is any way to reduce the memory usage, since it takes so much VRAM while it’s training. Otherwise I’ll just close this since it’s not really an issue.
Also, since Stable Retro can only start one emulator per process, the training works thanks to the async_vector_env, but when I run sheeprl_eval the env fails because it tries to open two emulators within the same process. Any way around this?