FastChat: NotImplementedError: Cannot copy out of meta tensor; no data!
Hello, I used QLoRA to train, but I get an error: NotImplementedError: Cannot copy out of meta tensor; no data!
requirements.txt:
peft @ file:///root/peft
torch==1.13.1+cu116
torchaudio==0.13.1+cu116
torchvision==0.14.1+cu116
transformers==4.28.1
deepspeed==0.9.4
flash-attn==0.2.0
This is my training command:
CUDA_VISIBLE_DEVICES=0 deepspeed fastchat/train/train_lora.py \
--model_name_or_path ../vicuna-7b \
--lora_r 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--data_path ./data/dummy_conversation.json \
--bf16 True \
--output_dir ./checkpoints \
--num_train_epochs 3 \
--per_device_train_batch_size 4 \
--per_device_eval_batch_size 4 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1200 \
--save_total_limit 100 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--tf32 True \
--model_max_length 2048 \
--q_lora True \
--deepspeed playground/deepspeed_config_s2.json
Can you help me, please? @merrymercy
To fix the "found optimizer but no scheduler" issue, simply remove the optimizer from the DeepSpeed config. This was a new change with the new version of transformers (4.30.2). Check out this issue for more information on the supported combinations.
I believe the old transformers version (4.28.1) I was using also hit the OOM issues you are getting. After installing 4.30.2, this was resolved, with LLaMA-7B taking about 5 GB of VRAM and LoRA fine-tuning taking about 10 GB of VRAM.
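For reference, a minimal sketch of what such a DeepSpeed config could look like, with the optimizer block removed and the scheduler values left on "auto" so the Trainer fills them in; the keys and values below are illustrative assumptions, not the actual contents of playground/deepspeed_config_s2.json:

{
    "bf16": { "enabled": "auto" },
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": { "device": "cpu" }
    },
    "scheduler": {
        "type": "WarmupLR",
        "params": {
            "warmup_min_lr": "auto",
            "warmup_max_lr": "auto",
            "warmup_num_steps": "auto"
        }
    },
    "gradient_accumulation_steps": "auto",
    "train_micro_batch_size_per_gpu": "auto",
    "train_batch_size": "auto"
}

Note there is intentionally no "optimizer" block, so transformers creates the optimizer itself and the "found optimizer but no scheduler" check no longer triggers.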
@kanslor I added this to the DeepSpeed config file:
"scheduler": {
    "type": "WarmupLR",
    "params": {
        "warmup_min_lr": "auto",
        "warmup_max_lr": "auto",
        "warmup_num_steps": "auto"
    }
}
But I still faced the OOM error, so I gave up on the official fine-tuning method. I used this git repo https://github.com/hiyouga/LLaMA-Efficient-Tuning/tree/main#ppo-training-rlhf to successfully fine-tune Vicuna-7B. This is my LoRA weight. It only needs about 16 GB of GPU memory to fine-tune.
This is my command line to fine-tune using the new git repo:
CUDA_VISIBLE_DEVICES=0 python src/train_sft.py \
--model_name_or_path your_model_path \
--do_train \
--dev_ratio 0.1 \
--dataset dummy_conversation_for_LETM \
--finetuning_type lora \
--output_dir output_path_you_want \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 1000 \
--learning_rate 5e-5 \
--num_train_epochs 3.0 \
--plot_loss \
--fp16
Web Demo
python src/cli_demo.py \
--model_name_or_path vicuna_model_path \
--checkpoint_dir fine_tuned_lora_path
python src/web_demo.py \
--model_name_or_path vicuna_model_path \
--checkpoint_dir fine_tuned_lora_path
Combine Vicuna-7B and LoRA weights / Export model
python src/export_model.py \
--model_name_or_path vicuna_model_path \
--checkpoint_dir fine_tuned_lora_path \
--output_dir combine_model_path
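Once the LoRA weights are merged into a standalone model directory, it should be loadable like any other Vicuna checkpoint; for example, serving it with FastChat's CLI would look roughly like this (combine_model_path is simply the output directory from the export step above):

python3 -m fastchat.serve.cli --model-path combine_model_path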
My DeepSpeed config and training script are the same as listed here.
@BabyChouSr I am still getting the GPU OOM, even after removing both the optimizer and scheduler configs. The transformers version is 4.30.2 and peft is 0.4.0.dev0. Could you please share your DeepSpeed config file as well as the training script?
@limbo92 your suggestion worked for me as well. I am not sure what is wrong with FastChat that gives OOM to everyone!
I got this error when I downgraded the transformers version from 4.30.2 to 4.29.2 to fix another issue ("Found optimizer configured in the DeepSpeed config, but no scheduler"). When I reverted the transformers version to 4.30.2 and used other methods to solve that issue, this issue was also fixed.
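For anyone retracing these steps, the version combination that ended up working in this thread can be pinned with something like the following; installing peft from source is an assumption here, since 0.4.0.dev0 is a development build rather than a PyPI release:

pip install transformers==4.30.2
pip install git+https://github.com/huggingface/peft.git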