trl: PPO example not working with DeepSpeed Stage 3 or FSDP

I’ve been trying to get a PPO trainer to work with fully sharded training using either DeepSpeed ZeRO stage 3 or FSDP, but no matter which configuration options I try, I cannot get even the example from the documentation to run. The problem seems to be the call to trainer.generate() when sampling a rollout. With FSDP it usually crashes, with the exact error message depending on the accelerate config (e.g. https://github.com/pytorch/pytorch/issues/82461). With DeepSpeed the script simply hangs and eventually times out, without any error message.
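
For reference, the rollout loop is essentially the PPO quickstart from the TRL documentation; the sketch below marks the call that fails under sharded training. The model name ("gpt2"), the generation kwargs, and the dummy reward are illustrative assumptions rather than my exact script.

```python
# Minimal sketch of the failing rollout step, adapted from the TRL PPO quickstart.
# "gpt2", the generation kwargs, and the constant reward are placeholders.
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token

config = PPOConfig(batch_size=1, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

query_tensor = tokenizer.encode("This morning I went to the", return_tensors="pt")

generation_kwargs = {
    "do_sample": True,
    "top_k": 0,
    "top_p": 1.0,
    "max_new_tokens": 20,
    "pad_token_id": tokenizer.eos_token_id,
}

# Under DeepSpeed ZeRO-3 this call hangs until the NCCL timeout;
# under FSDP it crashes with an error that depends on the accelerate config.
response_tensor = ppo_trainer.generate(list(query_tensor), return_prompt=False, **generation_kwargs)

# Dummy reward just to complete one PPO optimization step.
reward = [torch.tensor(1.0)]
stats = ppo_trainer.step([query_tensor[0]], [response_tensor[0]], reward)
```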

Is this known behavior, and is there a working example or documentation of PPO + DeepSpeed/FSDP anywhere?

To reproduce, from inside the examples directory, run the PPO script, or even a trivial helloworld.py:

```
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/ppo.py
accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml helloworld.py
```

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 4
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Hello, I am facing the same error with DeepSpeed ZeRO-3: an AssertionError (assert param.ds_status == ZeroParamStatus.AVAILABLE) raised from self.accelerator.backward(loss) in PPOTrainer.train_minibatch. I suspect it is related to the issue reported in https://github.com/microsoft/DeepSpeed/issues/4194, where downgrading to transformers==4.31.0 is reported to work. For me, however, downgrading transformers caused other problems, so I would really like to hear about other solutions if there are any.