transformers: [`Generate`] Fix `gradient_checkpointing` and `use_cache` bug for generate-compatible models

Feature request

When a model uses `gradient_checkpointing` and a user calls `generate` with `use_cache`, some models run into bugs, such as the one described in https://github.com/huggingface/transformers/pull/21733

The fix is to slightly refactor the affected models, following the same procedure as in the aforementioned PR.
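For context, the sketch below shows the combination that triggers the problem. It is a hedged illustration rather than a repro from the issue: the model choice (`gpt2`), the prompt, and the generation settings are assumptions, and the exact failure mode varies per architecture.

```python
# Hedged illustration: gpt2 and the settings below are placeholder choices.
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

model.gradient_checkpointing_enable()
model.train()  # gradient checkpointing is only active in training mode

inputs = tokenizer("Hello world", return_tensors="pt")

# Before the fix, some models disabled use_cache only *after* initializing
# their cache container, leaving the two inconsistent during generation.
out = model.generate(**inputs, use_cache=True, max_new_tokens=5)
```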

How to participate

  1. If it is your first time here, have a quick look at our contribution guidelines 🤗
  2. Pick a model from the list below, and check the comments on this issue to make sure it hasn’t been claimed yet.
  3. Claim your models in the comments (e.g. “I want to work on GPT2”)
  4. Replicate the changes of this PR to your model of choice. In other words, move the `if` block to the line above the `... if use_cache else None` line, in the same `forward()` function (see the sketch after this list). Please note that some models may have more than one instance of this block!
  5. Make sure you’ve run our automated code formatting tool (i.e. run `make fixup` in your shell – also run `make fix-copies` if it requests you to do so)
  6. Open a PR. Tag @younesbelkada or @gante (one of us is enough)
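For step 4, the moved block looks roughly like this. This is a hedged sketch using GPT-2-style names (`presents` for the cache container, `logger.warning_once` for the message); mirror whatever names your model’s file already uses.

```python
# Moved block: the gradient checkpointing check now runs *before* the cache
# container is initialized, so use_cache is already corrected below it.
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning_once(
            "`use_cache=True` is incompatible with gradient checkpointing. "
            "Setting `use_cache=False`..."
        )
        use_cache = False

presents = () if use_cache else None  # now consistent with the check above
```

Placing the check first means every later `if use_cache:` branch, including the cache initialization, sees one consistent value.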

That’s it! With each change, you’ll be making transformers a little bit better for all of us 💛

Models to fix:


Most upvoted comments

Hi @mollerup23, of course, yes! Please feel free to take it!

@younesbelkada Can I claim TimeSeriesTransformer?

Hello 👋, I would like to contribute and work on T5. Let me know, thanks! PR for the suggested changes.

Hi @gante, working on Whisper, XGLM, XLMRobertaXL.

Hello, can I work on Bloom?

I am happy to pick up other models too. Can I work on Bart, Bert, BigBird?

Hey @saswatmeher, of course, yes!! You can pick up a model that has not been taken yet, for example BioGpt, and do the following:

1. Fork this repository.
2. Clone your fork locally and create a new branch: `git checkout -b fix-bio-gpt-issue`.
3. Modify the file `src/transformers/models/biogpt/modeling_biogpt.py` the same way as all the contributors have modified their files in #21818 #21833 #21815 etc. (you can check “Files changed” at the top right of each Pull Request page).
4. Apply these changes and push them to your branch.
5. Finally, open a Pull Request between `fix-bio-gpt-issue` and the `main` branch of transformers (+ tag us, myself + @gante) and we should be good to go!

Let us know if you have more questions!

Thanks a mile @KMFODA! 💯 Feel free to take those, and tag me or @gante whenever you feel ready!

Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer.

I would like to work on Blenderbot

I want to work on GPT-J!

Would like to take GPT-2!

Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I’d give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in `forward()`, and we prioritize clear code (= avoid abstractions) in the modeling code itself.

In case you’re curious about this position, we have a blog post about why we do it here 🤗