trlx: [PPO, CUDA OOM] when running trlx_gptj_text_summarization.py
🐛 Describe the bug
Hello.
I’m following the examples/summarize_rlhf experiment and got as far as training the reward model without any problem.
However, in the PPO training step, when I run
`accelerate launch --config_file configs/default_accelerate_config.yaml trlx_gptj_text_summarization.py`
it produces `OutOfMemoryError: CUDA out of memory`.
The error occurs at the `rw_model.load_state_dict(torch.load(REWARD_CHECKPOINT_PATH))` line.

I’ve tried this on 8 A100 GPUs, and at that line all 8 processes load the reward model onto GPU 0, which produces the OOM (a workaround sketch follows the attached config).
I’ve attached `default_accelerate_config.yaml` for reference. Thank you.
command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_hostfile: localhost slots=8
  deepspeed_config_file: configs/ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: 8855
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false
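For context on the failure: with `num_processes: 8`, `accelerate launch` starts eight copies of the script, and a bare `torch.load` restores tensors onto the device they were saved from (here GPU 0), so all eight processes materialize the full reward checkpoint on that one card. Below is a minimal sketch of the usual workaround, not the script's actual code: load the state dict onto CPU first, then move the model to the GPU assigned to each rank.

```python
import os
import torch

def load_reward_model(rw_model: torch.nn.Module, checkpoint_path: str) -> torch.nn.Module:
    """Sketch: load reward-model weights without piling every process onto GPU 0."""
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    device = torch.device(f"cuda:{local_rank}")
    # map_location="cpu" keeps the eight concurrent loads in host memory;
    # each process then moves its own copy to its assigned GPU.
    state_dict = torch.load(checkpoint_path, map_location="cpu")
    rw_model.load_state_dict(state_dict)
    return rw_model.to(device)
```

With this, each rank still holds one copy of the reward model on its own GPU; if even that is too much, serving the reward model from a separate process (as discussed later in the thread) keeps it out of the training processes entirely.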
Which trlX version are you using?
trlx==0.4.0
Additional system and package information
Python 3.8 (Ubuntu 18.04)
@PhungVanDuy Yes, I’ve tried the code below to load the 20B model, but it didn’t work out.
Thanks.
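The snippet referenced above is not preserved in the thread. As an illustration only, one common way to spread a ~20B model across several GPUs is transformers' `device_map` sharding; the model id below is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

# Placeholder model id; device_map="auto" shards the layers across all
# visible GPUs (requires the accelerate package) instead of loading the
# whole model onto a single device.
model = AutoModelForCausalLM.from_pretrained(
    "EleutherAI/gpt-neox-20b",
    torch_dtype=torch.float16,
    device_map="auto",
)
```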
Thanks for your quick reply, fully understood 😃
I will try to build an API for the reward model with Triton. May I ask when your next release will be?
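For reference, a minimal sketch of what querying a reward model served by Triton Inference Server could look like; the server address, model name, and tensor names here are assumptions, not an API that trlx provides:

```python
import numpy as np
import tritonclient.http as triton_http

# Assumed address and names; adjust to the actual Triton model repository.
client = triton_http.InferenceServerClient(url="localhost:8000")

def reward(input_ids: np.ndarray) -> np.ndarray:
    """Send tokenized samples to the served reward model and return scores."""
    inp = triton_http.InferInput("input_ids", list(input_ids.shape), "INT64")
    inp.set_data_from_numpy(input_ids.astype(np.int64))
    result = client.infer("reward_model", inputs=[inp])
    return result.as_numpy("rewards")
```

Serving the reward model once, on its own GPU, keeps it out of all eight training processes.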
Oh, I see!!
Thank you. I’ll keep working on running this on multiple GPUs going forward.
I’ll close this issue.
Thank you 😃