transformers: Errors while training apple/mobilevit-xx-small on image-classification example with and without deepspeed
System Info
- transformers installed from source
- python 3.8
- ZeRO-Stage-1
Who can help?
@amyeroberts @NielsRogge @JingyaHuang
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
python -m torch.distributed.launch --nproc_per_node=8 ~/transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path apple/mobilevit-xx-small --dataset_name beans --overwrite_output_dir --output_dir ./outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --seed 1337 --fp16 True --report_to none --ignore_mismatched_sizes True
AttributeError: 'MobileViTImageProcessor' object has no attribute 'image_mean'
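For context, this failure appears to come from the example script building its torchvision transforms from image_processor.image_mean / image_processor.image_std, which the MobileViT image processor does not define. Below is a minimal sketch of one possible guard; the hasattr fallback and the hard-coded crop size are assumptions for illustration, not the script's current code.

```python
# Sketch of a possible workaround (an assumption, not the script's current code):
# skip normalization when the image processor has no mean/std.
from torchvision.transforms import Compose, Lambda, Normalize, RandomResizedCrop, ToTensor
from transformers import AutoImageProcessor

image_processor = AutoImageProcessor.from_pretrained("apple/mobilevit-xx-small")
size = 256  # assumed crop size; the example script derives this from the processor

if hasattr(image_processor, "image_mean") and hasattr(image_processor, "image_std"):
    normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
else:
    # MobileViT checkpoints ship an image processor without image_mean/image_std,
    # so fall back to a no-op instead of crashing.
    normalize = Lambda(lambda x: x)

train_transforms = Compose([RandomResizedCrop(size), ToTensor(), normalize])
```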
python -m torch.distributed.launch --nproc_per_node=8 ~/transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path apple/mobilevit-xx-small --dataset_name beans --overwrite_output_dir --output_dir ./outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --seed 1337 --fp16 True --report_to none --ignore_mismatched_sizes True --deepspeed ~/zero_stage_1.json
AttributeError: 'MobileViTConfig' object has no attribute 'hidden_size'
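This second failure seems to come from the Trainer's DeepSpeed integration resolving "auto" values in the ZeRO config by reading model.config.hidden_size, which MobileViTConfig does not expose (it has hidden_sizes, a list). A possible workaround, sketched below under that assumption, is to write the stage-1 config with explicit bucket sizes so that lookup never happens; the file name and the numeric values here are illustrative, not the config actually used in the report.

```python
# Hedged sketch: generate a ZeRO stage-1 config with explicit bucket sizes
# instead of "auto", so the Trainer does not need config.hidden_size.
# The numeric values are assumptions, not tuned settings.
import json

ds_config = {
    "fp16": {"enabled": "auto"},
    "optimizer": {"type": "AdamW", "params": {"lr": "auto", "weight_decay": "auto"}},
    "scheduler": {
        "type": "WarmupLR",
        "params": {"warmup_min_lr": "auto", "warmup_max_lr": "auto", "warmup_num_steps": "auto"},
    },
    "zero_optimization": {
        "stage": 1,
        "reduce_bucket_size": 5e8,       # explicit value instead of "auto"
        "allgather_bucket_size": 5e8,
    },
    "gradient_accumulation_steps": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

with open("zero_stage_1.json", "w") as f:
    json.dump(ds_config, f, indent=2)
```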
Expected behavior
I expect both commands to train successfully, with the DeepSpeed-enabled run completing faster than the baseline. Currently, both scenarios error out with the AttributeErrors shown above. Thank you in advance for the assistance.
About this issue
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 15 (13 by maintainers)
@NielsRogge, most Microsoft internal training pipelines, including AzureML, leverage DeepSpeed since it provides better training speed and a smaller memory footprint. When we evaluate any Hugging Face model, we always try to integrate both ORT and DeepSpeed to maximize training speed.