DeepSpeed: [BUG] AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'

Describe the bug I checked the source code and found that HfTrainerDeepSpeedConfig inherits from HfDeepSpeedConfig; trainer_config_finalize is defined only on the subclass HfTrainerDeepSpeedConfig, not on the base class.

```
Traceback (most recent call last):
  File "/mnt/data/wxc/workspace/release/qlora_train.py", line 197, in <module>
    train()
  File "/mnt/data/wxc/workspace/release/qlora_train.py", line 191, in train
    trainer.train()
  File "/home/wxc/miniconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1661, in train
    return inner_training_loop(
  File "/home/wxc/miniconda3/envs/llama/lib/python3.9/site-packages/transformers/trainer.py", line 1740, in _inner_training_loop
    self.optimizer, self.lr_scheduler = deepspeed_init(self, num_training_steps=max_steps)
  File "/home/wxc/miniconda3/envs/llama/lib/python3.9/site-packages/transformers/deepspeed.py", line 343, in deepspeed_init
    hf_deepspeed_config.trainer_config_finalize(args, model, num_training_steps)
AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'
```
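To illustrate the class relationship described above, here is a minimal sketch (simplified stand-ins, not the actual transformers source): trainer_config_finalize exists only on the subclass, so deepspeed_init fails as soon as it is handed a plain HfDeepSpeedConfig instance.

```python
# Simplified stand-ins for the real transformers classes, just to show why the
# AttributeError appears when the base class is used instead of the subclass.

class HfDeepSpeedConfig:
    """Base class: only parses and stores the DeepSpeed config dict."""
    def __init__(self, config):
        self.config = config


class HfTrainerDeepSpeedConfig(HfDeepSpeedConfig):
    """Trainer-specific subclass: knows how to resolve the "auto" values."""
    def trainer_config_finalize(self, args, model, num_training_steps):
        pass  # fills in "auto" entries from the Trainer arguments


cfg = HfDeepSpeedConfig({"zero_optimization": {"stage": 2}})
cfg.trainer_config_finalize(None, None, 0)
# AttributeError: 'HfDeepSpeedConfig' object has no attribute 'trainer_config_finalize'
```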

To Reproduce Steps to reproduce the behavior: 1. CUDA_VISIBLE_DEVICES="0,5" accelerate launch --nnodes=1 --nproc_per_node=2 --master_port='29501' qlora_train.py --learning_rate=2e-5 --per_device_train_batch_size=46 --gradient_accumulation_steps=1 --deepspeed deepspeed_config_s2.json

The deepspeed_config_s2.json:

```json
{
    "fp16": {
        "enabled": "auto",
        "loss_scale": 0,
        "loss_scale_window": 1000,
        "initial_scale_power": 16,
        "hysteresis": 2,
        "min_loss_scale": 1
    },

    "optimizer": {
        "type": "AdamW",
        "params": {
            "lr": "auto",
            "betas": "auto",
            "eps": "auto",
            "weight_decay": "auto"
        }
    },

    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },

    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {
            "device": "cpu",
            "pin_memory": true
        },
        "allgather_partitions": true,
        "allgather_bucket_size": 2e8,
        "overlap_comm": true,
        "reduce_scatter": true,
        "reduce_bucket_size": 2e8,
        "contiguous_gradients": true
    },

    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "steps_per_print": 2000,
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "wall_clock_breakdown": false
}
```

Expected behavior I was able to run DeepSpeed multi-GPU training successfully before, but after adding QLoRA, I ran into this error.

System info (please complete the following information):

Because I ran into a problem before that was fixed by a PR, the DeepSpeed version I used was installed from `git clone -b olruwase/ds_3678 https://github.com/microsoft/DeepSpeed.git`.

Launcher context CUDA_VISIBLE_DEVICES="0,5" accelerate launch --nnodes=1 --nproc_per_node=2 --master_port='29501' qlora_train.py --learning_rate=2e-5 --per_device_train_batch_size=46 --gradient_accumulation_steps=1 --deepspeed deepspeed_config_s2.json

Docker context Are you using a specific docker image that you can share?

Additional context Add any other context about the problem here.

About this issue

  • State: closed
  • Created a year ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

Removing any initialization of accelerate.Accelerator before initializing the HF trainer resolved this issue for me.

For example, if you had used it when loading the model:

```diff
 model = AutoModelForCausalLM.from_pretrained(
     model_name_or_path,
     torch_dtype=torch.float16,
-    device_map={"": Accelerator().process_index},
 )
```
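For reference, a minimal sketch of the resulting setup, assuming no accelerate.Accelerator() is created anywhere before the Trainer, so the Trainer builds its own HfTrainerDeepSpeedConfig from the config file passed via TrainingArguments. model_name_or_path and train_dataset are placeholders for your own checkpoint and tokenized dataset, and fp16=True is an assumption matching the "auto" fp16 block in the config above:

```python
import torch
from transformers import AutoModelForCausalLM, Trainer, TrainingArguments

model_name_or_path = "your-model"  # placeholder

# Model is loaded without any Accelerator()/device_map usage; placement is
# left to the Trainer's DeepSpeed (ZeRO-2) integration.
model = AutoModelForCausalLM.from_pretrained(
    model_name_or_path,
    torch_dtype=torch.float16,
)

training_args = TrainingArguments(
    output_dir="output",
    learning_rate=2e-5,
    per_device_train_batch_size=46,
    gradient_accumulation_steps=1,
    fp16=True,  # assumption: enables the "auto" fp16 block in the config
    deepspeed="deepspeed_config_s2.json",  # same ZeRO-2 config as above
)

# train_dataset: your tokenized dataset (not shown)
trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset)
trainer.train()
```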