accelerate: Incorrect `num_warmup_steps` for `lr_scheduler` for multi-gpu training
System Info
- `Accelerate` version: 0.10.0
- Platform: Linux-3.10.0_3-0-0-12-x86_64-with-centos-6.3-Final
- Python version: 3.7.12
- Numpy version: 1.21.6
- PyTorch version (GPU?): 1.7.1 (True)
- `Accelerate` default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- main_process_ip: None
- main_process_port: None
- main_training_function: main
- deepspeed_config: {}
- fsdp_config: {}
Information
- The official example scripts
- My own modified scripts
Tasks
- One of the scripts in the `examples/` folder of Accelerate or an officially supported `no_trainer` script in the `examples` folder of the `transformers` repo (such as `run_no_trainer_glue.py`)
- My own task or dataset (give details below)
Reproduction
from transformers import get_scheduler

# define lr scheduler
lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    num_warmup_steps=args.warmup_steps,
    num_training_steps=args.max_train_steps,
)
...
if step % args.gradient_accumulation_steps == 0:
    optimizer.step()
    lr_scheduler.step()  # update lr scheduler every `gradient_accumulation_steps`
    optimizer.zero_grad()
Expected behavior
Does Accelerate take the number of processes into account for `num_warmup_steps`?
Suppose we set `args.warmup_steps=80` and train on a single 8-GPU machine: the linear learning rate peaks after only 10 optimizer updates (i.e., 80/8) instead of the expected 80.
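To make the reported behavior concrete, here is a small single-process simulation (my own sketch, not code from the issue): it advances an unprepared linear scheduler `num_processes` times per optimizer update, which is effectively what happens when the scheduler is prepared for multi-GPU training, and the learning rate peaks after 10 updates instead of 80. The toy optimizer and step values are placeholders.

```python
# Simulation of the reported behavior: advancing the scheduler num_processes
# times per optimizer update exhausts an 80-step warmup after 80 / 8 = 10 updates.
import torch
from transformers import get_scheduler

num_processes = 8
optimizer = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1.0)
lr_scheduler = get_scheduler(
    name="linear", optimizer=optimizer, num_warmup_steps=80, num_training_steps=1000
)

for update in range(1, 21):
    optimizer.step()
    for _ in range(num_processes):  # mimic one scheduler step per process
        lr_scheduler.step()
    print(update, lr_scheduler.get_last_lr()[0])  # reaches the peak lr at update 10
```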
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Comments: 19
Hello @cyk1337, the link you have provided achieves `args.max_train_steps // num_gpus` because it is stepping for `num_processes` per iteration, i.e., num_gpus times per iteration. I didn't understand what the query was in the case of not preparing the `lr_scheduler`. As per the original question, it is logical for the warmup steps to be reduced in a multi-device scenario.
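For completeness, a minimal sketch of the workaround this thread points to (my own illustration, assuming the scheduler is passed through `accelerator.prepare()` and is therefore stepped once per process per optimizer update): scale the warmup and total step counts by `accelerator.num_processes` so the warmup still spans the intended number of optimizer updates. The toy model, optimizer, and step values are placeholders.

```python
# Sketch of the workaround (assumptions noted above): scale the scheduler's step
# counts by accelerator.num_processes when the scheduler will be prepared by Accelerate.
import torch
from accelerate import Accelerator
from transformers import get_scheduler

accelerator = Accelerator()

model = torch.nn.Linear(10, 2)                    # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

warmup_steps = 80                                 # plays the role of args.warmup_steps
max_train_steps = 1000                            # plays the role of args.max_train_steps

lr_scheduler = get_scheduler(
    name="linear",
    optimizer=optimizer,
    # multiplied so the warmup still spans `warmup_steps` optimizer updates
    num_warmup_steps=warmup_steps * accelerator.num_processes,
    num_training_steps=max_train_steps * accelerator.num_processes,
)

model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)
```

Alternatively, leaving the scheduler out of `accelerator.prepare()` means it only advances when `lr_scheduler.step()` is called explicitly, so `num_warmup_steps` is then counted directly in optimizer updates.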