transformers: Errors while training apple/mobilevit-xx-small with the image-classification example, with and without DeepSpeed

System Info

  • transformers installed from source
  • Python 3.8
  • DeepSpeed ZeRO Stage 1 (for the second command below)

Who can help?

@amyeroberts @NielsRogge @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

python -m torch.distributed.launch --nproc_per_node=8 ~/transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path apple/mobilevit-xx-small --dataset_name beans --overwrite_output_dir --output_dir ./outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --seed 1337 --fp16 True --report_to none --ignore_mismatched_sizes True

AttributeError: 'MobileViTImageProcessor' object has no attribute 'image_mean'
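
This first error appears to come from the example script building its torchvision Normalize transform from image_processor.image_mean and image_processor.image_std, attributes that MobileViTImageProcessor does not define (MobileViT checkpoints are preprocessed without mean/std normalization). Below is a minimal sketch of a local workaround, assuming that is indeed where the transform is built; it simply guards the normalization step so the rest of the preprocessing pipeline runs unchanged:

    from transformers import AutoImageProcessor
    from torchvision.transforms import Lambda, Normalize

    image_processor = AutoImageProcessor.from_pretrained("apple/mobilevit-xx-small")

    # MobileViTImageProcessor has no image_mean / image_std, so only build the
    # Normalize transform when the processor actually provides those statistics.
    if hasattr(image_processor, "image_mean") and hasattr(image_processor, "image_std"):
        normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    else:
        # Fall back to a no-op transform for processors without mean/std stats.
        normalize = Lambda(lambda x: x)

The same guard would need to be applied wherever the script builds both the train and eval transforms.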

python -m torch.distributed.launch --nproc_per_node=8 ~/transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path apple/mobilevit-xx-small --dataset_name beans --overwrite_output_dir --output_dir ./outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --seed 1337 --fp16 True --report_to none --ignore_mismatched_sizes True --deepspeed ~/zero_stage_1.json

AttributeError: 'MobileViTConfig' object has no attribute 'hidden_size'
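
The DeepSpeed failure seems to come from the Trainer's DeepSpeed integration resolving hidden_size-dependent "auto" entries in the ZeRO config (e.g. zero_optimization.reduce_bucket_size) from model.config.hidden_size, which MobileViTConfig does not define (it only exposes lists of hidden sizes). Assuming my zero_stage_1.json uses such "auto" entries, one workaround sketch is to write a ZeRO Stage 1 config with explicit values instead; the bucket size below is an illustrative guess, not a tuned value:

    import json

    # ZeRO Stage 1 config that avoids "auto" values derived from config.hidden_size.
    zero_stage_1 = {
        "fp16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": 1,
            "reduce_bucket_size": 50_000_000,  # explicit number instead of "auto"
        },
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "train_batch_size": "auto",
    }

    with open("zero_stage_1.json", "w") as f:
        json.dump(zero_stage_1, f, indent=2)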

Expected behavior

I expect both commands to train successfully, with the DeepSpeed-enabled run completing faster than the baseline. Currently, both scenarios error out as shown above. Thank you in advance for the assistance.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (13 by maintainers)

Most upvoted comments

@NielsRogge, most Microsoft internal training pipelines, including AzureML, leverage DeepSpeed since it provides better training speed and a smaller memory footprint. When we evaluate Hugging Face models, we always try to integrate both ORT and DeepSpeed to maximize training speed.