transformers: Fine tuning TensorFlow DeBERTa fails on TPU

System Info

Latest version of transformers, Colab TPU, tensorflow 2.

  • Colab TPU
  • transformers: 4.21.0
  • tensorflow: 2.8.2 / 2.6.2
  • Python 3.7

Who can help?

@LysandreJik, @Rocketknight1, @san

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

I am facing some issues while trying to fine-tune a TensorFlow DeBERTa model microsoft/deberta-v3-base on TPU.

I have created some Colab notebooks showing the errors. Note that the second and third notebooks already include workarounds for the errors encountered in the previous ones.

I have seen similar issues when using microsoft/deberta-base.
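For context, the setup in the notebooks is roughly the sketch below. The sequence-classification head, hyperparameters, and dummy data are illustrative placeholders (the notebooks use a real tokenized dataset); the failure appears at the .fit() call on TPU.

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Connect to the Colab TPU and build a distribution strategy
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=2
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# Placeholder data; the notebooks tokenize a real dataset
enc = tokenizer(["example sentence"] * 64, padding="max_length",
                max_length=128, return_tensors="np")
labels = tf.zeros((64,), dtype=tf.int32)

# This is where the error shows up on TPU
model.fit(dict(enc), labels, epochs=1)
```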

I believe the following issues are related:

Thanks!

Expected behavior

Fine-tuning works on TPU, just as it does on GPU.

About this issue

  • Original URL
  • State: closed
  • Created 2 years ago
  • Comments: 17 (12 by maintainers)

Most upvoted comments

@gante

Great, setting the batch_size works 🥳. I only had to make sure that it is divisible by strategy.num_replicas_in_sync (see FineTuning_TF_DeBERTa_Working_Fix_TPU). Thanks a lot, I will now test the procedure on my real use case.
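For reference, the fix amounts to choosing a global batch size that is a multiple of the number of replicas and passing it to .fit() explicitly. Continuing the illustrative sketch from the reproduction section above (strategy, model, enc, labels as defined there; the per-replica value is just an example):

```python
# The global batch size must be divisible by the number of TPU replicas.
num_replicas = strategy.num_replicas_in_sync   # 8 on a Colab TPU
per_replica_batch_size = 16                    # illustrative value
global_batch_size = per_replica_batch_size * num_replicas

model.fit(
    dict(enc),
    labels,
    batch_size=global_batch_size,   # pass the batch size explicitly
    epochs=1,
)
```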

Weird!

During my TPU and GPU tests, I was using a custom training loop instead of Keras's .fit(), though I'm not sure whether that actually matters.

In my custom training code, I got DeBERTa to train ELECTRA-style with XLA enabled (jit_compile=True), with none of the issues mentioned above.

I will share my code as soon as I finish the pretraining and validate the results. It is based on NVIDIA's BERT and ELECTRA TF2 training code: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/
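Not the actual pretraining code (that follows the NVIDIA repository above), but a minimal sketch of what such a custom loop with an XLA-compiled train step looks like. Here model, optimizer, and train_dataset are assumed to be built elsewhere, and distribution details (strategy.run) are omitted:

```python
import tensorflow as tf

@tf.function(jit_compile=True)  # compile the whole training step with XLA
def train_step(batch):
    with tf.GradientTape() as tape:
        outputs = model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
            training=True,
        )
        loss = outputs.loss
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss

# model, optimizer, and train_dataset are assumed to be defined elsewhere
for step, batch in enumerate(train_dataset):
    loss = train_step(batch)
    if step % 100 == 0:
        print(f"step {step}: loss = {float(loss):.4f}")
```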