transformers: TrainingArguments does not support `mps` device (Mac M1 GPU)
System Info
- transformers version: 4.21.0.dev0
- Platform: macOS-12.4-arm64-arm-64bit
- Python version: 3.8.9
- Huggingface_hub version: 0.8.1
- PyTorch version (GPU?): 1.12.0 (False)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: yes
- Using distributed or parallel set-up in script?: no
Who can help?
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
export TASK_NAME=wnli
python run_glue.py \
--model_name_or_path bert-base-cased \
--task_name $TASK_NAME \
--do_train \
--do_eval \
--max_seq_length 128 \
--per_device_train_batch_size 32 \
--learning_rate 2e-5 \
--num_train_epochs 3 \
--output_dir /tmp/$TASK_NAME/
Expected behavior
When running `Trainer.train` on a machine with an MPS GPU, it still just uses the CPU. I expected it to use the MPS GPU. MPS is supported by PyTorch as of version 1.12.0, and we can check whether the MPS GPU is available using `torch.backends.mps.is_available()`.
It seems like the issue lies in the `TrainingArguments._setup_devices` method, which doesn't appear to allow for the case where `device = "mps"`.
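A minimal sketch of the device-selection logic the reporter expected `_setup_devices` to perform. `pick_device` is a hypothetical helper for illustration, not part of transformers:

```python
def pick_device(cuda_available: bool, mps_available: bool) -> str:
    """Prefer CUDA, then MPS, then fall back to CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# In a real script the flags would come from torch, e.g.:
#   import torch
#   device = torch.device(pick_device(torch.cuda.is_available(),
#                                     torch.backends.mps.is_available()))
```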
About this issue
- Original URL
- State: closed
- Created 2 years ago
- Reactions: 3
- Comments: 23 (8 by maintainers)
A simple hack fixed the issue, by simply overwriting the `device` attribute of `TrainingArguments`. This at least shows that it might just be the aforementioned `_setup_devices` that needs changing.

Hello @V-Sher, it is yet to be released. For the time being, you can install transformers from source to use this feature.
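The hack snippet referenced above was not preserved in this thread. A sketch of the pattern might look like the following, using a stub in place of the real `transformers.TrainingArguments` so the example is self-contained (`TrainingArgumentsWithMPSSupport` is the class name used later in the thread):

```python
# Stub standing in for transformers.TrainingArguments so this example is
# self-contained; in real code you would subclass the actual class instead.
class TrainingArguments:
    @property
    def device(self) -> str:
        return "cpu"

# The workaround: subclass and override the `device` property so the
# Trainer places the model and batches on "mps".
class TrainingArgumentsWithMPSSupport(TrainingArguments):
    @property
    def device(self) -> str:
        # With the real classes this would be something like:
        #   import torch
        #   if torch.backends.mps.is_available():
        #       return torch.device("mps")
        #   return super().device
        return "mps"

args = TrainingArgumentsWithMPSSupport()
print(args.device)  # -> mps
```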
Now that PyTorch 1.12.1 is out I think we should reopen this issue! cc @pacman100

This is not supported yet, as this has been introduced by PyTorch 1.12, which also breaks all speech models due to a regression there. We will look into support for Mac M1 GPUs once we officially support PyTorch 1.12 (probably won't be before they do a patch 1.12.1).
Another observation: some PyTorch operations have not been implemented for `mps` and will throw an error. One way to get around that is to set the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1`, which falls back to the CPU for these operations. It still throws a `UserWarning`, however.

bitsandbytes does not work with `mps`.
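The fallback variable mentioned above has to be set before `torch` is first imported, or the flag is ignored for that process. A minimal sketch:

```python
import os

# Must be set before the first `import torch`, otherwise PyTorch will not
# pick up the CPU fallback for unimplemented MPS operations.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# import torch  # import torch only after setting the variable
```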
After installing the `transformers` package from source as suggested by @pacman100, the `mps` device is used with the standard `TrainingArguments` class. It does not require the custom `TrainingArgumentsWithMPSSupport` class. Now the M1 Mac GPU is ~90% utilized.
Hi All: I am fine-tuning a BERT model with the HuggingFace Trainer API on macOS Ventura (Intel), Python 3.10 and Torch 2.0.0. It takes 14 min in a simple scenario on the CPU, with no problem. I changed to GPU with `mps`. Initially, the GPU was not used, but after redefining `TrainingArguments` it worked.
But the problem is that the improvement over the CPU is scarce (barely from 14 min to 10 min). The monitor shows GPU utilization peaking at only 15%.
Any idea why the improvement is so poor?
Thanks for any help Alberto
This is the full code
We’ve also observed a drop in metrics when training, see this issue.
I have no idea, since we haven’t tried and tested it out yet. And as I said our whole CI is constrained by PyTorch < 1.12 right now, so until that pin is dropped we can’t test the integration 😃. You can certainly try it on your own fork in the meantime!