FastChat: NotImplementedError: Cannot copy out of meta tensor; no data!

Hello, I used QLoRA to train, but I get an error: NotImplementedError: Cannot copy out of meta tensor; no data!

requirements.txt:

peft @ file:///root/peft
torch==1.13.1+cu116
torchaudio==0.13.1+cu116
torchvision==0.14.1+cu116
transformers==4.28.1
deepspeed==0.9.4
flash-attn==0.2.0

This is my training command:

CUDA_VISIBLE_DEVICES=0 deepspeed fastchat/train/train_lora.py \
    --model_name_or_path ../vicuna-7b  \
    --lora_r 8 \
    --lora_alpha 16 \
    --lora_dropout 0.05 \
    --data_path ./data/dummy_conversation.json  \
    --bf16 True \
    --output_dir ./checkpoints \
    --num_train_epochs 3 \
    --per_device_train_batch_size 4 \
    --per_device_eval_batch_size 4 \
    --gradient_accumulation_steps 1 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1200 \
    --save_total_limit 100 \
    --learning_rate 2e-5 \
    --weight_decay 0. \
    --warmup_ratio 0.03 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --tf32 True \
    --model_max_length 2048 \
    --q_lora True \
    --deepspeed playground/deepspeed_config_s2.json 

Can you help me, please? @merrymercy

Most upvoted comments

To fix the “found optimizer but no scheduler” issue, simply remove the optimizer block from the DeepSpeed config. This is a behavior change introduced with the newer transformers version (4.30.2). Check out this issue for more information on the supported optimizer/scheduler combinations.
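For reference, a minimal ZeRO stage-2 config along those lines (scheduler block kept, optimizer block removed) might look like the sketch below; this is only an illustration, not the actual playground/deepspeed_config_s2.json from the repo:

{
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "bf16": { "enabled": "auto" },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": { "device": "cpu" }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    }
}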

I believe the old transformers version (4.28.1) I was using also hit the OOM issues you are seeing. After installing 4.30.2, this was resolved, with LLaMA-7B taking about 5 GB of VRAM and LoRA fine-tuning taking about 10 GB of VRAM.
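If you want to try the same fix, the upgrade is a simple pip pin (using the version mentioned above):

pip install -U transformers==4.30.2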

@kanslor I added this to the DeepSpeed config file:

"scheduler": {
    "type": "WarmupLR",
    "params": {
        "warmup_min_lr": "auto",
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto"
    }
}

but I still faced the OOM error, so I gave up on the official fine-tuning method. I used this git repo https://github.com/hiyouga/LLaMA-Efficient-Tuning/tree/main#ppo-training-rlhf to successfully fine-tune Vicuna-7B. This is my LoRA weight. It only needs about 16 GB of GPU memory to fine-tune.

This is my command line to fine-tune using the new git repo.

CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
    --model_name_or_path your_model_path \
    --do_train \
    --dev_ratio 0.1 \
    --dataset dummy_conversation_for_LETM \
    --finetuning_type lora \
    --output_dir output_path_you_want \
    --overwrite_cache \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 1000 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --fp16

Web Demo

python src/cli_demo.py \
    --model_name_or_path vicuna_model_path \
    --checkpoint_dir fine_tuned_lora_path

python src/web_demo.py \
    --model_name_or_path vicuna_model_path \
    --checkpoint_dir fine_tuned_lora_path

Combine Vicuna-7B and LoRA weights / Export model

python src/export_model.py \
    --model_name_or_path vicuna_model_path \
    --checkpoint_dir fine_tuned_lora_path \
    --output_dir combine_model_path
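The directory written to combine_model_path should then load like a regular Hugging Face checkpoint. As a quick smoke test (just a suggestion, not part of the repo's instructions), you could point FastChat's own CLI at it:

python -m fastchat.serve.cli --model-path combine_model_path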

My DeepSpeed config and training script are the same as those listed here.

@BabyChouSr I am still getting the GPU OOM, even after removing both the optimizer and scheduler configs. My transformers version is 4.30.2 and peft is 0.4.0.dev0. Could you please share your DeepSpeed config file as well as your training script?

@limbo92 your suggestion worked for me as well. I am not sure what is wrong with FastChat that it gives OOM to everyone!

I got this error when I downgraded the transformers version from 4.30.2 to 4.29.2 to fix another issue (Found optimizer configured in the DeepSpeed config, but no scheduler). When I reverted transformers to 4.30.2 and used other methods to solve that issue, this error was also fixed.