trlx: [PPO, CUDA OOM] when running the trlx_gptj_text_summarization.py

🐛 Describe the bug

Hello.

I’m following the examples/summarize_rlhf experiment and got through the reward-model training step without any problem.

However, in the PPO training step, when I run

accelerate launch --config_file configs/default_accelerate_config.yaml trlx_gptj_text_summarization.py

it produces OutOfMemoryError: CUDA out of memory.

The error occurs at the rw_model.load_state_dict(torch.load(REWARD_CHECKPOINT_PATH)) line.


I’ve tried with 8 A100 GPUs; at the failing line, the reward models from all 8 processes are loaded onto GPU:0, which produces the OOM error.
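One way to avoid every rank deserializing its checkpoint onto GPU:0 is to load the state dict onto CPU first and only then move the model to each process's own device. A minimal sketch of that idea, assuming the checkpoint fits in host RAM — the tiny nn.Linear here is a stand-in for the example's GPTRewardModel, and the LOCAL_RANK handling is an assumption about the launcher, not code from the example script:

```python
import os
import tempfile

import torch
import torch.nn as nn

# Stand-in for the reward model (GPTRewardModel in the example).
rw_model = nn.Linear(4, 1)

# Save a checkpoint the way REWARD_CHECKPOINT_PATH would be produced.
ckpt_path = os.path.join(tempfile.mkdtemp(), "pytorch_model.bin")
torch.save(rw_model.state_dict(), ckpt_path)

# Key point: map_location="cpu" deserializes into host RAM, so multiple
# ranks loading concurrently no longer pile their copies onto cuda:0
# (torch.load defaults to the device the tensors were saved from).
state_dict = torch.load(ckpt_path, map_location="cpu")
rw_model.load_state_dict(state_dict)

# Each rank then moves the model to its own GPU.
local_rank = int(os.environ.get("LOCAL_RANK", "0"))
device = f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu"
rw_model.to(device)
```

This keeps only one copy of the weights per rank on its own device instead of eight copies on GPU:0, at the cost of a transient CPU-RAM copy during loading.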

I attached default_accelerate_config.yaml for reference. Thank you.

command_file: null
commands: null
compute_environment: LOCAL_MACHINE
deepspeed_config:
  deepspeed_hostfile: localhost slots=8
  deepspeed_config_file: configs/ds_config_trlx_gptj_summarize.json
  zero3_init_flag: false
distributed_type: DEEPSPEED
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: null
machine_rank: 0
main_process_ip: null
main_process_port: 8855
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 1
num_processes: 8
rdzv_backend: static
same_network: true
tpu_name: null
tpu_zone: null
use_cpu: false

Which trlX version are you using?

trlx==0.4.0

Additional system and package information

Python 3.8 (Ubuntu 18.04)

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 17 (3 by maintainers)

Most upvoted comments

@PhungVanDuy Yes, I’ve tried with the below code to load over the 20B model.

self.base_model = transformers.AutoModelForCausalLM.from_pretrained(
    config,
    device_map="auto",
    offload_folder="offload",
    offload_state_dict=True,
)

But it didn’t work out.

Thanks.

Thanks for your quick reply, fully understood 😃

I will try to build an API for the reward model with triton. And may I ask when your next release will be?
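Serving the reward model behind a network endpoint means the PPO trainer no longer has to hold a copy on every GPU; it just sends candidate summaries out and gets scores back. The thread mentions Triton for this; as a shape-only illustration of the request/response contract (this is a stdlib sketch, not Triton's API), with a trivial placeholder standing in for the real reward model:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(text: str) -> float:
    """Placeholder reward: the real service would run the GPT-J reward
    model here. Returning the text length keeps the sketch testable."""
    return float(len(text))


class RewardHandler(BaseHTTPRequestHandler):
    """Accepts POST {"samples": [...]} and returns {"rewards": [...]}."""

    def do_POST(self):
        length = int(self.headers["Content-Length"])
        payload = json.loads(self.rfile.read(length))
        rewards = [score(s) for s in payload["samples"]]
        body = json.dumps({"rewards": rewards}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        # Silence per-request logging so training logs stay readable.
        pass


def make_server(port: int = 8500) -> HTTPServer:
    """Build the server; call .serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), RewardHandler)
```

The trainer's reward_fn then reduces to one HTTP POST per batch, and the reward model can live on a dedicated GPU (or machine) sized independently of the PPO workers.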

Oh, I see!!

Thank you. I’ll be working on running on multi-GPU in the future.

I’ll close this issue.

Thank you 😃