trl: Mixtral sft error

I am following the "Fine-tuning with 🤗 TRL" guide and running it with:

accelerate launch --config_file examples/accelerate_configs/multi_gpu.yaml --num_processes=4 \
    examples/scripts/sft.py \
    --model_name /nas/lili/models_hf/Mixtral-8x7B-Instruct-v0.1 \
    --dataset_name trl-lib/ultrachat_200k_chatml \
    --batch_size 2 \
    --gradient_accumulation_steps 1 \
    --learning_rate 2e-4 \
    --save_steps 200_000 \
    --use_peft \
    --peft_lora_r 16 --peft_lora_alpha 32 \
    --target_modules q_proj k_proj v_proj o_proj \
    --load_in_4bit \
    --output output \
    --use_auth_token false

It throws:

Traceback (most recent call last):
  File "/nas/lili/codes/pt/ft/trl/examples/scripts/sft.py", line 158, in <module>
    trainer.train()
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/trl/trainer/sft_trainer.py", line 315, in train
    output = super().train(*args, **kwargs)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/trainer.py", line 1537, in train
    return inner_training_loop(
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/trainer.py", line 1821, in _inner_training_loop
    for step, inputs in enumerate(epoch_iterator):
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/accelerate/data_loader.py", line 448, in __iter__
    current_batch = next(dataloader_iter)
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 630, in __next__
    data = self._next_data()
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 674, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "/home/ubuntu/miniconda3/envs/torchshare/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 54, in fetch
    return self.collate_fn(data)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/data/data_collator.py", line 45, in __call__  
    return self.torch_call(features)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/data/data_collator.py", line 732, in torch_call
    batch = self.tokenizer.pad(examples, return_tensors="pt", pad_to_multiple_of=self.pad_to_multiple_of)
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 3259, in pad
    padding_strategy, _, max_length, _ = self._get_padding_truncation_strategies(
  File "/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/transformers/tokenization_utils_base.py", line 2707, in _get_padding_truncation_strategies
    raise ValueError(
ValueError: Asking to pad but the tokenizer does not have a padding token. Please select a token to use as `pad_token` `(tokenizer.pad_token = tokenizer.eos_token e.g.)` or add a new pad token via `tokenizer.add_special_tokens({'pad_token': '[PAD]'})`.

Besides this error, there are warnings like:

/home/ubuntu/.local/share/virtualenvs/ft-Zgps2Kz_/lib/python3.9/site-packages/trl/trainer/sft_trainer.py:282: UserWarning: You passed a tokenizer with `padding_side` not equal to `right` to the SFTTrainer. This might lead to some unexpected behaviour due to overflow issues when training a model in half-precision. You might consider adding `tokenizer.padding_side = 'right'` to your code.
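
For context, both the ValueError and this warning come from the tokenizer configuration: Mixtral's tokenizer ships without a pad token, so the data collator cannot pad batches. A minimal sketch of a workaround (not the exact sft.py code) is to build the tokenizer yourself, give it a pad token, force right padding, and pass it to the trainer; the model path and trainer wiring below are placeholders from this report, not a confirmed fix from the maintainers:

from transformers import AutoTokenizer

model_name = "/nas/lili/models_hf/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# The missing pad token is what triggers the ValueError in the data collator.
# Reusing the EOS token avoids resizing the embedding matrix; adding a dedicated
# '[PAD]' token via add_special_tokens is the alternative.
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Addresses the SFTTrainer warning about overflow issues in half-precision.
tokenizer.padding_side = "right"

# trainer = SFTTrainer(model=model, tokenizer=tokenizer, ...)  # rest unchanged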

My environment:

torch                    2.1.2
transformers             4.36.2
trl                      0.7.8.dev0
accelerate               0.25.0
peft                     0.7.1
bitsandbytes             0.41.3.post2

About this issue

  • State: closed
  • Created 6 months ago
  • Comments: 18 (11 by maintainers)

Most upvoted comments

@fancyerii - this is expected, no? If you use 8 GPUs, then the training time gets correctly split across all GPUs, no?

Can you pass `{"use_reentrant": False}` into `gradient_checkpointing_kwargs` in `TrainingArguments`?
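
A rough sketch of that suggestion, assuming the script constructs TrainingArguments directly (the other values here are placeholders mirroring the launch command, not the reporter's exact configuration):

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",
    per_device_train_batch_size=2,
    gradient_accumulation_steps=1,
    learning_rate=2e-4,
    gradient_checkpointing=True,
    # Non-reentrant checkpointing generally plays better with PEFT and
    # distributed training on recent torch/transformers versions.
    gradient_checkpointing_kwargs={"use_reentrant": False},
)

Note that `gradient_checkpointing_kwargs` requires a reasonably recent transformers release; the 4.36.2 listed in the environment above should accept it.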