transformers: Fine-tuning TensorFlow DeBERTa fails on TPU
System Info
Latest version of transformers, Colab TPU, TensorFlow 2:
- Colab TPU
- transformers: 4.21.0
- tensorflow: 2.8.2 / 2.6.2
- Python 3.7
Who can help?
@LysandreJik, @Rocketknight1, @san
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I am facing some issues while trying to fine-tune a TensorFlow DeBERTa model (`microsoft/deberta-v3-base`) on TPU.
I have created some Colab notebooks showing the errors. Note that the second and third notebooks already include some measures to work around the earlier errors.
- ValueError with partially known TensorShape with the latest `take_along_axis` change: FineTuning_TF_DeBERTa_TPU_1
- Output shape mismatch of branches with custom dropout: FineTuning_TF_DeBERTa_TPU_2
- XLA compilation error because of dynamic/computed tensor shapes: FineTuning_TF_DeBERTa_TPU_3
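For context, the setup common to all three notebooks is roughly the following minimal sketch (the placeholder dataset and hyperparameters are illustrative, not the notebooks' exact code):

```python
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModelForSequenceClassification

# Connect to the Colab TPU and create a distribution strategy.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

tokenizer = AutoTokenizer.from_pretrained("microsoft/deberta-v3-base")

# Placeholder data; the real notebooks use an actual fine-tuning dataset.
texts = ["a first example", "a second example"] * 64
labels = [0, 1] * 64
enc = tokenizer(texts, padding="max_length", truncation=True, max_length=128,
                return_tensors="tf")
ds = tf.data.Dataset.from_tensor_slices((dict(enc), labels)).batch(16, drop_remainder=True)

with strategy.scope():
    model = TFAutoModelForSequenceClassification.from_pretrained(
        "microsoft/deberta-v3-base", num_labels=2
    )
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=2e-5),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

model.fit(ds, epochs=1)  # fails on TPU with the errors listed above
```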
I have seen similar issues when using `microsoft/deberta-base`.
I believe the following issues are related:
- TF2 DeBERTaV2 runs super slow on TPUs #18239
- DeBERTaV2/DeBERTaV3 TPU: socket closed #18276. From this issue I used the fix on `take_along_axis` (see the sketch below).
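For reference, the idea behind that `take_along_axis` fix is to avoid gather-style indexing that XLA on TPU handles poorly; one way to express it is as a one-hot contraction over the last axis. A rough sketch of that approach (illustrative only, not the exact code in transformers):

```python
import tensorflow as tf

def take_along_axis(x, indices):
    # One-hot variant of np.take_along_axis(x, indices, axis=-1):
    # instead of a gather (which can yield dynamic shapes under XLA on TPU),
    # build a one-hot matrix over the last axis and contract it with x.
    # Assumes the last dimension of x is statically known.
    # x:       (..., dim)
    # indices: (..., k) integer indices into the last axis of x
    one_hot = tf.one_hot(indices, depth=x.shape[-1], dtype=x.dtype)  # (..., k, dim)
    return tf.einsum("...kd,...d->...k", one_hot, x)                 # (..., k)

# Tiny usage example
x = tf.random.normal([2, 4, 8])         # (batch, seq, dim)
idx = tf.constant([[[0, 3]] * 4] * 2)   # (batch, seq, 2)
print(take_along_axis(x, idx).shape)    # (2, 4, 2)
```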
Thanks!
Expected behavior
Fine-tuning completes successfully, just as it does when using a GPU.
About this issue
- State: closed
- Created 2 years ago
- Comments: 17 (12 by maintainers)
@gante
Great, setting the `batch_size` works 🥳. I only had to make sure that it is divisible by `strategy.num_replicas_in_sync`: FineTuning_TF_DeBERTa_Working_Fix_TPU. Thanks a lot, I will now test the procedure on my real use case.
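For anyone hitting the same constraint, a sketch of what that batch-size setup looks like in practice (variable names and the placeholder dataset are illustrative):

```python
import tensorflow as tf

resolver = tf.distribute.cluster_resolver.TPUClusterResolver()
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)
strategy = tf.distribute.TPUStrategy(resolver)

# A Colab TPU v2 has 8 cores, so num_replicas_in_sync is typically 8.
per_replica_batch_size = 16
global_batch_size = per_replica_batch_size * strategy.num_replicas_in_sync

# Placeholder dataset standing in for the real tokenized inputs.
features = tf.zeros([1024, 128], dtype=tf.int32)
labels = tf.zeros([1024], dtype=tf.int32)
train_dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Drop the remainder so every replica always sees a full, statically shaped
# batch; ragged final batches are a common source of TPU/XLA shape errors.
train_dataset = train_dataset.batch(global_batch_size, drop_remainder=True)
```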
Weird! During my TPU and GPU tests, I was using a custom training loop instead of Keras's `.fit()`, which I'm not sure actually matters. In my custom training code, I got DeBERTa to train ELECTRA-style with XLA enabled via `jit_compile=True`, with none of the issues mentioned above. I will share my code as soon as I finish the pretraining and validate the results. It is based on the NVIDIA BERT and ELECTRA TF2 training code: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow2/LanguageModeling/
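Until that code is shared, here is a rough sketch of what an XLA-compiled custom training step can look like (purely illustrative; not the commenter's NVIDIA-based code):

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

model = TFAutoModelForSequenceClassification.from_pretrained(
    "microsoft/deberta-v3-base", num_labels=2
)
optimizer = tf.keras.optimizers.Adam(learning_rate=2e-5)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

@tf.function(jit_compile=True)  # compile the whole training step with XLA
def train_step(input_ids, attention_mask, labels):
    with tf.GradientTape() as tape:
        logits = model(
            input_ids=input_ids, attention_mask=attention_mask, training=True
        ).logits
        loss = loss_fn(labels, logits)
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```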