transformers: [`Generate`] Fix `gradient_checkpointing` and `use_cache` bug for generate-compatible models
Feature request
When a model uses `gradient_checkpointing` and a user calls `generate` with `use_cache=True`, some models run into bugs such as the one described in https://github.com/huggingface/transformers/pull/21733.
The fix should be to slightly refactor those models, following the same procedure as in the aforementioned PR.
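For context, here is roughly the scenario that triggers the problem. This is a minimal sketch (not taken from the issue) using a GPT-2 checkpoint as an example; any affected generate-compatible model behaves similarly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Gradient checkpointing is typically enabled to save memory during training,
# and it is only active while the model is in training mode.
model.gradient_checkpointing_enable()
model.train()

# generate() defaults to use_cache=True, which conflicts with gradient
# checkpointing: affected models either error out or handle the conflict
# inside the decoder loop instead of cleanly disabling the cache up front.
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, use_cache=True)
```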
How to participate
- If it is your first time here, have a quick look at our contribution guidelines 🤗
- Pick a model from the list below. Check in the comments here if it hasn’t been claimed yet.
- Claim your models in the comments (e.g. “I want to work on GPT2”)
- Replicate the changes of the PR linked above to your model of choice. In other words, move the gradient checkpointing `if` block to the line above the `... if use_cache else None` line, in the same `.forward()` function (see the sketch right after this list). Please note that some models may have more than one instance of this block!
- Make sure you’ve run our automated code formatting tool (i.e. run `make fixup` in your shell – also run `make fix-copies` if it requests you to do so)
- Open a PR. Tag @younesbelkada or @gante (one of us is enough)
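To make the first step concrete, below is a rough GPT-2-style fragment of a model's `.forward()` showing the before and after. Attribute names such as `self.h` and `presents` differ between models, so treat it as an illustration of the pattern rather than an exact diff:

```python
# Before: the gradient checkpointing check sits inside the layer loop,
# *after* `presents` has already been built from the original `use_cache`.
presents = () if use_cache else None
for block in self.h:
    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. "
                "Setting `use_cache=False`..."
            )
            use_cache = False
    ...  # layer forward pass

# After: the same check is moved to the line above `... if use_cache else None`,
# so the cache is disabled once, before anything depends on it.
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. "
            "Setting `use_cache=False`..."
        )
        use_cache = False
presents = () if use_cache else None
for block in self.h:
    ...  # layer forward pass
```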
That’s it! With each change, you’ll be making `transformers` a little bit better for all of us 💛
Models to fix:
- Bart | https://github.com/huggingface/transformers/pull/21866
- Bert
- BigBird | https://github.com/huggingface/transformers/pull/21882
- BigBirdPegasus
- BioGPT | https://github.com/huggingface/transformers/pull/21844
- Blenderbot
- BlenderbotSmall
- BlipText
- Bloom
- CodeGen
- Esm
- Git | https://github.com/huggingface/transformers/pull/21818
- GPT2 | https://github.com/huggingface/transformers/pull/21772
- GptNeo | https://github.com/huggingface/transformers/pull/21733
- GptNeoX | https://github.com/huggingface/transformers/pull/21815
- GPT-J
- ImageGPT | https://github.com/huggingface/transformers/pull/21816
- LED | https://github.com/huggingface/transformers/pull/21840
- LongT5
- M2M100 | https://github.com/huggingface/transformers/pull/21841
- Marian | https://github.com/huggingface/transformers/pull/21842
- MBart | https://github.com/huggingface/transformers/pull/21918
- MegatronBert | https://github.com/huggingface/transformers/pull/21921
- MVP | https://github.com/huggingface/transformers/pull/21920
- OPT
- Pegasus
- PegasusX
- ProphetNet | https://github.com/huggingface/transformers/pull/21772
- RemBert
- RoFormer
- Speech2Text
- Speech2Text2
- SpeechT5
- SwitchTransformer
- T5
- TimeSeriesTransformer
- TrajectoryTransformer
- TrOCR
- Whisper
- XGLM
- XLMRobertaXL
- Xmod
About this issue
- State: closed
- Comments: 27 (26 by maintainers)
Yes @gante @annahung31, the PR is here: https://github.com/huggingface/transformers/pull/22272
Hi @mollerup23, of course yes! Please feel free to take it!
@younesbelkada Can I claim TimeSeriesTransformer?
Hello 👋, I would like to contribute and work on T5. Let me know, thanks! PR for the suggested changes.
Hi @gante, working on Whisper, XGLM, XLMRobertaXL
Hello, can I work on Bloom?
I am happy to pick up other models too. Can I work on Bart, Bert, BigBird?
Hey @saswatmeher Of course yes!! You can pick up a model that has not been taken yet, for example BioGpt, and do the following:
1. Fork this repository
2. Clone your fork locally and create a new branch: `git checkout -b fix-bio-gpt-issue`
3. Modify the file `src/transformers/models/biogpt/modeling_biogpt.py` the same way as all the contributors have modified their files in #21818 #21833 #21815 etc. (You can check `Files Changed` on the PR, at the top right of the Pull Request page)
4. Apply these changes and push them to your branch
5. Finally, open a Pull Request between `fix-bio-gpt-issue` and the `main` branch of `transformers` (+ tag us, myself + @gante) and we should be good to go!
Let us know if you have more questions!
Thanks a mile @KMFODA! 💯 Feel free to take those, and tag me or @gante whenever you feel ready!
Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer
I would like to work on Blenderbot
I want to work on GPT-J!
Would like to take GPT-2!
Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I’d give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in `forward()`, and we prioritize clear code (= avoid abstractions) in the modeling code itself. In case you’re curious about this position, we have a blog post about why we do it here 🤗