DeepSpeed: [BUG] AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'
Describe the bug
I checked the source code and found that `HfDeepSpeedConfig` is the parent class of `HfTrainerDeepSpeedConfig`; `trainer_config_finalize` is defined only in the subclass `HfTrainerDeepSpeedConfig`, so calling it on a plain `HfDeepSpeedConfig` raises the error below (see the short sketch after the traceback).
```
Traceback (most recent call last):
  File "/mnt/data/wxc/workspace/release/qlora_train.py", line 197, in <module>
    train()
  File "/mnt/data/wxc/workspace/release/qlora_train.py", line 191, in train
    trainer.train()
  File "/home/wxc/miniconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1661, in train
    return inner_training_loop(
  File "/home/wxc/miniconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
  File "/home/wxc/miniconda3/envs/llama/lib/python3.9/site-packages/transformers/deepspeed.py", line 343, in deepspeed_init
    hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'
```
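A minimal sketch of the mismatch, assuming both classes are importable from `transformers.deepspeed` as in the traceback; the config dict here is a placeholder, not the one from the repro:

```python
# Sketch only: shows that trainer_config_finalize lives on the subclass, so a plain
# HfDeepSpeedConfig instance reaching deepspeed_init triggers the AttributeError above.
from transformers.deepspeed import HfDeepSpeedConfig, HfTrainerDeepSpeedConfig

ds_config = {"zero_optimization": {"stage": 2}}  # placeholder config

plain_cfg = HfDeepSpeedConfig(ds_config)
trainer_cfg = HfTrainerDeepSpeedConfig(ds_config)

print(hasattr(plain_cfg, "trainer_config_finalize"))    # False -> AttributeError
print(hasattr(trainer_cfg, "trainer_config_finalize"))  # True
```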
To Reproduce
Steps to reproduce the behavior:
1. `CUDA_VISIBLE_DEVICES="0,5" accelerate launch --nnodes=1 --nproc_per_node=2 --master_port='29501' qlora_train.py --learning_rate=2e-5 --per_device_train_batch_size=46 --gradient_accumulation_steps=1 --deepspeed deepspeed_config_s2.json`
The deepspeed_config_s2.json:

```json
{
  "fp16": {
    "enabled": "auto",
    "loss_scale": 0,
    "loss_scale_window": 1000,
    "initial_scale_power": 16,
    "hysteresis": 2,
    "min_loss_scale": 1
  },
  "optimizer": {
    "type": "AdamW",
    "params": {
      "lr": "auto",
      "betas": "auto",
      "eps": "auto",
      "weight_decay": "auto"
    }
  },
  "scheduler": {
    "type": "WarmupLR",
    "params": {
      "warmup_min_lr": "auto",
      "warmup_max_lr": "auto",
      "warmup_num_steps": "auto"
    }
  },
  "zero_optimization": {
    "stage": 2,
    "offload_optimizer": {
      "device": "cpu",
      "pin_memory": true
    },
    "allgather_partitions": true,
    "allgather_bucket_size": 2e8,
    "overlap_comm": true,
    "reduce_scatter": true,
    "reduce_bucket_size": 2e8,
    "contiguous_gradients": true
  },
  "gradient_accumulation_steps": "auto",
  "gradient_clipping": "auto",
  "steps_per_print": 2000,
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto",
  "wall_clock_breakdown": false
}
```
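For context, the `"auto"` entries above are the values the Trainer fills in from `TrainingArguments` (and the model) when it finalizes the DeepSpeed config, which is exactly the step that fails here. A hedged sketch of the usual wiring; `output_dir` and `fp16` are assumptions, the rest mirrors the launch command:

```python
# Sketch only: how the posted JSON is normally handed to the Trainer so the "auto"
# fields can be resolved from the training arguments.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",                      # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=46,
    gradient_accumulation_steps=1,
    fp16=True,                             # assumed; resolves "fp16.enabled": "auto"
    deepspeed="deepspeed_config_s2.json",  # the config shown above
)
```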
Expected behavior
I was able to run multi-GPU DeepSpeed training successfully before, but after adding QLoRA I hit this error.
System info (please complete the following information):
Because I ran into a problem before that was fixed by a PR, the DeepSpeed version I used was installed from `git clone -b olruwase/ds_3678 https://github.com/microsoft/DeepSpeed.git`.
Launcher context
`CUDA_VISIBLE_DEVICES="0,5" accelerate launch --nnodes=1 --nproc_per_node=2 --master_port='29501' qlora_train.py --learning_rate=2e-5 --per_device_train_batch_size=46 --gradient_accumulation_steps=1 --deepspeed deepspeed_config_s2.json`
Docker context Are you using a specific docker image that you can share?
Additional context Add any other context about the problem here.
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (6 by maintainers)
Removing any initialization of `accelerate.Accelerator` before initializing the HF `Trainer` resolved this issue for me, for example if you have used it when loading the model; see the sketch below.
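A minimal sketch of that workaround, assuming a typical Trainer setup; the tiny model, tokenizer, and dummy dataset are placeholders, not the original qlora_train.py code:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments

# Before (reported to trigger the AttributeError): an Accelerator created by hand
# before the Trainer, e.g. while loading the model:
#   from accelerate import Accelerator
#   accelerator = Accelerator()
#   model = AutoModelForCausalLM.from_pretrained(name, device_map={"": accelerator.process_index})
#
# After: drop the manual Accelerator and let the Trainer, launched via
# `accelerate launch ... --deepspeed deepspeed_config_s2.json`, own the DeepSpeed setup.

model_name = "sshleifer/tiny-gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

ids = tokenizer("hello world").input_ids
train_dataset = [{"input_ids": ids, "labels": ids}]  # placeholder dataset

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,
    deepspeed="deepspeed_config_s2.json",  # the config posted above
)

Trainer(model=model, args=training_args, train_dataset=train_dataset).train()
```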