trl: PPO example not working with DeepSpeed Stage 3 or FSDP
I’ve been trying to get a PPO trainer to work with fully sharded training using either DeepSpeed Stage 3 or FSDP. However, no matter which configuration options I try, I cannot get even the example from the documentation to work. The problem seems to be the call to trainer.generate() when sampling a rollout. With FSDP it usually crashes, and the exact error message depends on the accelerate config (e.g. https://github.com/pytorch/pytorch/issues/82461). With DeepSpeed the script just seems to hang and eventually time out, without any error message.
Is this known behavior, and is there a working example or documentation of PPO + DeepSpeed/FSDP anywhere?
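For reference, here is roughly where the failure shows up in the rollout loop. This is a minimal sketch based on the PPOTrainer quickstart from the trl docs, not the actual scripts/ppo.py; the model name, prompts, and rewards are placeholders.

```python
# Minimal sketch of the rollout loop where the hang/crash shows up, based on the
# PPOTrainer quickstart from the trl docs (not the actual scripts/ppo.py).
# The model name, prompts, and rewards below are placeholders.
import torch
from datasets import Dataset
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

config = PPOConfig(model_name="gpt2", batch_size=8, mini_batch_size=2)
tokenizer = AutoTokenizer.from_pretrained(config.model_name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(config.model_name)

# toy prompt dataset, tokenized the same way as the docs example
dataset = Dataset.from_dict({"query": ["Hello there"] * 64})
dataset = dataset.map(lambda x: {"input_ids": tokenizer.encode(x["query"])})
dataset.set_format(type="torch")

def collator(data):
    # keep queries as a list of 1-D tensors instead of stacking them
    return {key: [d[key] for d in data] for key in data[0]}

ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer, dataset=dataset, data_collator=collator)

generation_kwargs = {
    "max_new_tokens": 20,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
}

for batch in ppo_trainer.dataloader:
    query_tensors = batch["input_ids"]
    response_tensors = []
    for query in query_tensors:
        # This is the call that hangs under DeepSpeed ZeRO-3 and crashes under FSDP.
        response = ppo_trainer.generate(query, **generation_kwargs)
        response_tensors.append(response.squeeze()[-generation_kwargs["max_new_tokens"]:])
    # placeholder reward: one scalar tensor per sample
    rewards = [torch.tensor(1.0) for _ in response_tensors]
    stats = ppo_trainer.step(query_tensors, response_tensors, rewards)
```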
To reproduce, from inside examples:

accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml scripts/ppo.py

or even:

accelerate launch --config_file accelerate_configs/deepspeed_zero3.yaml helloworld.py
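helloworld.py itself is not included here, so as a stand-in, a minimal script in the same spirit (bare generation under accelerate launch, no PPO at all) might look like the sketch below; the model name and prompt are arbitrary.

```python
# Hypothetical stand-in for helloworld.py (the real file is not shown in this issue):
# bare generation under `accelerate launch`, with no PPO involved, to check
# whether sharded generation alone already hangs or crashes.
import torch
from accelerate import Accelerator
from transformers import AutoModelForCausalLM, AutoTokenizer

accelerator = Accelerator()
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

# DeepSpeed via accelerate expects the model to be prepared together with an optimizer
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
model, optimizer = accelerator.prepare(model, optimizer)

inputs = tokenizer("Hello world", return_tensors="pt").to(accelerator.device)
with torch.no_grad():
    out = accelerator.unwrap_model(model).generate(**inputs, max_new_tokens=20)
print(accelerator.process_index, tokenizer.decode(out[0], skip_special_tokens=True))
```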
About this issue
- State: closed
- Created 7 months ago
- Reactions: 4
- Comments: 15 (6 by maintainers)
Hello, I am facing the same error with DeepSpeed ZeRO-3:
AssertionError: assert param.ds_status == ZeroParamStatus.AVAILABLE, raised from self.accelerator.backward(loss) in PPOTrainer.train_minibatch. I suspect the error is related to the issue reported in https://github.com/microsoft/DeepSpeed/issues/4194, where using transformers==4.31.0 is reported to work. But for me, downgrading transformers caused other issues, so I would really like to hear about other solutions, if any.
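Not an answer to the backward assertion above, but for the ZeRO-3 hang during rollouts reported in the original post: transformers' generate() has a synced_gpus option that keeps all ranks stepping through generation together under ZeRO-3, which may be worth trying. A small sketch, assuming PPOTrainer.generate forwards its keyword arguments to the underlying model.generate (ppo_trainer, tokenizer, and query refer to the rollout loop sketched earlier):

```python
# Sketch only: pass synced_gpus through the generation kwargs so that ranks
# which finish generating early keep participating in ZeRO-3's collective ops.
# `ppo_trainer`, `tokenizer`, and `query` are as in the rollout loop above.
generation_kwargs = {
    "max_new_tokens": 20,
    "do_sample": True,
    "pad_token_id": tokenizer.eos_token_id,
    "synced_gpus": True,
}
response = ppo_trainer.generate(query, **generation_kwargs)
```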