transformers: Errors while training apple/mobilevit-xx-small with the image-classification example, with and without DeepSpeed

System Info

  • transformers installed from source
  • Python 3.8
  • DeepSpeed ZeRO Stage 1 (for the second command below)

Who can help?

@amyeroberts @NielsRogge @JingyaHuang

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

python -m torch.distributed.launch --nproc_per_node=8 ~/transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path apple/mobilevit-xx-small --dataset_name beans --overwrite_output_dir --output_dir ./outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --seed 1337 --fp16 True --report_to none --ignore_mismatched_sizes True

AttributeError: 'MobileViTImageProcessor' object has no attribute 'image_mean'
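
This first error appears to come from the example script building its torchvision Normalize transform from image_processor.image_mean and image_processor.image_std, attributes that MobileViTImageProcessor does not define (MobileViT checkpoints are preprocessed without mean/std normalization). Below is a minimal sketch of a local workaround, assuming that is indeed where the transform is built; it simply guards the normalization step so the rest of the preprocessing pipeline runs unchanged:

    from transformers import AutoImageProcessor
    from torchvision.transforms import Lambda, Normalize

    image_processor = AutoImageProcessor.from_pretrained("apple/mobilevit-xx-small")

    # MobileViTImageProcessor has no image_mean / image_std, so only build the
    # Normalize transform when the processor actually provides those statistics.
    if hasattr(image_processor, "image_mean") and hasattr(image_processor, "image_std"):
        normalize = Normalize(mean=image_processor.image_mean, std=image_processor.image_std)
    else:
        # Fall back to a no-op transform for processors without mean/std stats.
        normalize = Lambda(lambda x: x)

The same guard would need to be applied wherever the script builds both the train and eval transforms.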

python -m torch.distributed.launch --nproc_per_node=8 ~/transformers/examples/pytorch/image-classification/run_image_classification.py --model_name_or_path apple/mobilevit-xx-small --dataset_name beans --overwrite_output_dir --output_dir ./outputs/ --remove_unused_columns False --do_train --do_eval --learning_rate 2e-5 --num_train_epochs 50 --per_device_train_batch_size 8 --per_device_eval_batch_size 8 --logging_strategy steps --logging_steps 10 --evaluation_strategy epoch --seed 1337 --fp16 True --report_to none --ignore_mismatched_sizes True --deepspeed ~/zero_stage_1.json

AttributeError: 'MobileViTConfig' object has no attribute 'hidden_size'
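
The DeepSpeed failure seems to come from the Trainer's DeepSpeed integration resolving hidden_size-dependent "auto" entries in the ZeRO config (e.g. zero_optimization.reduce_bucket_size) from model.config.hidden_size, which MobileViTConfig does not define (it only exposes lists of hidden sizes). Assuming my zero_stage_1.json uses such "auto" entries, one workaround sketch is to write a ZeRO Stage 1 config with explicit values instead; the bucket size below is an illustrative guess, not a tuned value:

    import json

    # ZeRO Stage 1 config that avoids "auto" values derived from config.hidden_size.
    zero_stage_1 = {
        "fp16": {"enabled": "auto"},
        "zero_optimization": {
            "stage": 1,
            "reduce_bucket_size": 50_000_000,  # explicit number instead of "auto"
        },
        "gradient_accumulation_steps": "auto",
        "train_micro_batch_size_per_gpu": "auto",
        "train_batch_size": "auto",
    }

    with open("zero_stage_1.json", "w") as f:
        json.dump(zero_stage_1, f, indent=2)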

Expected behavior

I expect both commands to train successfully, with the DeepSpeed-enabled run completing faster than the baseline. Currently, both scenarios error out as shown above. Thank you in advance for the assistance.

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (13 by maintainers)

Most upvoted comments

@NielsRogge, most Microsoft internal training pipelines, including AzureML, leverage DeepSpeed since it provides better training speed and a smaller memory footprint. When we evaluate Hugging Face models, we always try to integrate both ORT and DeepSpeed to maximize training speed.