transformers: Fine-tuning TensorFlow DeBERTa fails on TPU
System Info
Latest version of transformers, Colab TPU, TensorFlow 2:
- Colab TPU
- transformers: 4.21.0
- tensorflow: 2.8.2 / 2.6.2
- Python 3.7
Who can help?
@LysandreJik, @Rocketknight1, @san
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I am facing some issues while trying to fine-tune a TensorFlow DeBERTa model (`microsoft/deberta-v3-base`) on TPU.
I have created some Colab notebooks showing the errors. Note that the second and third notebooks already include some measures to work around the earlier errors.
- ValueError with partially known TensorShape with the latest `take_along_axis` change: FineTuning_TF_DeBERTa_TPU_1
- Output shape mismatch of branches with custom dropout: FineTuning_TF_DeBERTa_TPU_2
- XLA compilation error because of dynamic/computed tensor shapes: FineTuning_TF_DeBERTa_TPU_3
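For context, the setup common to all three notebooks is roughly the following minimal sketch (the placeholder dataset and hyperparameters are illustrative, not the notebooks' exact code):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Connect to the Colab TPU and create a distribution strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

# Placeholder data; the real notebooks use an actual fine-tuning dataset.
texts = ["a first example", "a second example"] * 64
labels = [0, 1] * 64
enc = tokenizer(texts, padding="max_length", truncation=True, max_length=128,
                return_tensors="tf")
ds = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(16, drop_remainder=True)

with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=2
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

model.fit(ds, epochs=1)  # fails on TPU with the errors listed above
```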
I have seen similar issues when using `microsoft/deberta-base`.
I believe the following issues are related:
- TF2 DeBERTaV2 runs super slow on TPUs #18239
- DeBERTaV2/DeBERTaV3 TPU: socket closed #18276. From this issue I used the fix on `take_along_axis` (see the sketch below).
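For reference, the idea behind that `take_along_axis` fix is to avoid gather-style indexing that XLA on TPU handles poorly; one way to express it is as a one-hot contraction over the last axis. A rough sketch of that approach (illustrative only, not the exact code in transformers):

```python
import tensorflow as tf

def take_along_axis(x, indices):
    # One-hot variant of np.take_along_axis(x, indices, axis=-1):
    # instead of a gather (which can yield dynamic shapes under XLA on TPU),
    # build a one-hot matrix over the last axis and contract it with x.
    # Assumes the last dimension of x is statically known.
    # x:       (..., dim)
    # indices: (..., k) integer indices into the last axis of x
    one_hot = tf.one_hot(indices, depth=x.shape[-1], dtype=x.dtype)  # (..., k, dim)
    return tf.einsum("...kd,...d->...k", one_hot, x)                 # (..., k)

# Tiny usage example
x = tf.random.normal([2, 4, 8])         # (batch, seq, dim)
idx = tf.constant([[[0, 3]] * 4] * 2)   # (batch, seq, 2)
print(take_along_axis(x, idx).shape)    # (2, 4, 2)
```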
Thanks!
Expected behavior
Fine-tuning completes successfully, just as it does when using a GPU.
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (12 by maintainers)
@gante
Great, setting the `batch_size` works 🥳. I only had to make sure that it is divisible by `strategy.num_replicas_in_sync`: FineTuning_TF_DeBERTa_Working_Fix_TPU. Thanks a lot, I will now test the procedure on my real use case.
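For anyone hitting the same constraint, a sketch of what that batch-size setup looks like in practice (variable names and the placeholder dataset are illustrative):

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# A Colab TPU v2 has 8 cores, so num_replicas_in_sync is typically 8.
per_replica_batch_size = 16
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Placeholder dataset standing in for the real tokenized inputs.
features = tf.zeros([1024, 128], dtype=tf.int32)
labels = tf.zeros([1024], dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Drop the remainder so every replica always sees a full, statically shaped
# batch; ragged final batches are a common source of TPU/XLA shape errors.
train_dataset = train_dataset.batch(global_batch_size, drop_remainder=True)
```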
Weird! During my TPU and GPU tests, I was using a custom training loop instead of Keras's `.fit()`, which I'm not sure actually matters. In my custom training code, I got DeBERTa to train ELECTRA-style with XLA enabled via `jit_compile=True`, with none of the issues mentioned above. I will share my code as soon as I finish the pretraining and validate the results. It is based on the NVIDIA BERT and ELECTRA TF2 training code: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/
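Until that code is shared, here is a rough sketch of what an XLA-compiled custom training step can look like (purely illustrative; not the commenter's NVIDIA-based code):

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2
)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function(jit_compile=True)  # compile the whole training step with XLA
def train_step(input_ids, attention_mask, labels):
    with tf.GradientTape() as tape:
        logits = model(
            input_ids=input_ids, attention_mask=attention_mask, training=True
        ).logits
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```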