transformers: [`Generate`] Fix `gradient_checkpointing` and `use_cache` bug for generate-compatible models
Feature request
When a model uses `gradient_checkpointing` and a user calls `generate` with `use_cache=True`, some models run into bugs such as the one described in https://github.com/huggingface/transformers/pull/21733.
The fix should be to slightly refactor those models, following the same procedure as in the aforementioned PR.
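For context, here is roughly the scenario that triggers the problem. This is a minimal sketch (not taken from the issue) using a GPT-2 checkpoint as an example; any affected generate-compatible model behaves similarly:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

# Gradient checkpointing is typically enabled to save memory during training,
# and it is only active while the model is in training mode.
model.gradient_checkpointing_enable()
model.train()

# generate() defaults to use_cache=True, which conflicts with gradient
# checkpointing: affected models either error out or handle the conflict
# inside the decoder loop instead of cleanly disabling the cache up front.
inputs = tokenizer("Hello", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=5, use_cache=True)
```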
How to participate
- If it is your first time here, have a quick look at our contribution guidelines 🤗
- Pick a model from the list below. Check in the comments here if it hasn’t been claimed yet.
- Claim your models in the comments (e.g. “I want to work on GPT2”)
- Replicate the changes of the PR linked above to your model of choice. In other words, move the gradient checkpointing `if` block to the line above the `... if use_cache else None` line, in the same `.forward()` function (see the sketch right after this list). Please note that some models may have more than one instance of this block!
- Make sure you’ve run our automated code formatting tool (i.e. run `make fixup` in your shell – also run `make fix-copies` if it requests you to do so)
- Open a PR. Tag @younesbelkada or @gante (one of us is enough)
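To make the first step concrete, below is a rough GPT-2-style fragment of a model's `.forward()` showing the before and after. Attribute names such as `self.h` and `presents` differ between models, so treat it as an illustration of the pattern rather than an exact diff:

```python
# Before: the gradient checkpointing check sits inside the layer loop,
# *after* `presents` has already been built from the original `use_cache`.
presents = () if use_cache else None
for block in self.h:
    if self.gradient_checkpointing and self.training:
        if use_cache:
            logger.warning(
                "`use_cache=True` is incompatible with gradient checkpointing. "
                "Setting `use_cache=False`..."
            )
            use_cache = False
    ...  # layer forward pass

# After: the same check is moved to the line above `... if use_cache else None`,
# so the cache is disabled once, before anything depends on it.
if self.gradient_checkpointing and self.training:
    if use_cache:
        logger.warning(
            "`use_cache=True` is incompatible with gradient checkpointing. "
            "Setting `use_cache=False`..."
        )
        use_cache = False
presents = () if use_cache else None
for block in self.h:
    ...  # layer forward pass
```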
That’s it! With each change, you’ll be making `transformers` a little bit better for all of us 💛
Models to fix:
- Bart | https://github.com/huggingface/transformers/pull/21866
- Bert
- BigBird | https://github.com/huggingface/transformers/pull/21882
- BigBirdPegasus
- BioGPT | https://github.com/huggingface/transformers/pull/21844
- Blenderbot
- BlenderbotSmall
- BlipText
- Bloom
- CodeGen
- Esm
- Git | https://github.com/huggingface/transformers/pull/21818
- GPT2 | https://github.com/huggingface/transformers/pull/21772
- GptNeo | https://github.com/huggingface/transformers/pull/21733
- GptNeoX | https://github.com/huggingface/transformers/pull/21815
- GPT-J
- ImageGPT | https://github.com/huggingface/transformers/pull/21816
- LED | https://github.com/huggingface/transformers/pull/21840
- LongT5
- M2M100 | https://github.com/huggingface/transformers/pull/21841
- Marian | https://github.com/huggingface/transformers/pull/21842
- MBart | https://github.com/huggingface/transformers/pull/21918
- MegatronBert | https://github.com/huggingface/transformers/pull/21921
- MVP | https://github.com/huggingface/transformers/pull/21920
- OPT
- Pegasus
- PegasusX
- ProphetNet | https://github.com/huggingface/transformers/pull/21772
- RemBert
- RoFormer
- Speech2Text
- Speech2Text2
- SpeechT5
- SwitchTransformer
- T5
- TimeSeriesTransformer
- TrajectoryTransformer
- TrOCR
- Whisper
- XGLM
- XLMRobertaXL
- Xmod
About this issue
- State: closed
- Comments: 27 (26 by maintainers)
Yes @gante @annahung31, the PR is here: https://github.com/huggingface/transformers/pull/22272
Hi @mollerup23, of course yes! Please feel free to take it!
@younesbelkada Can I claim TimeSeriesTransformer?
Hello 👋, I would like to contribute and work on T5. Let me know, thanks! PR for the suggested changes.
Hi @gante, working on Whisper, XGLM, XLMRobertaXL
Hello, can I work on Bloom?
I am happy to pick up other models too. Can I work on Bart, Bert, BigBird?
Hey @saswatmeher Of course yes!! You can pick up a model that has not been taken yet, for example BioGpt, and do the following:
1. Fork this repository
2. Clone your fork locally and create a new branch: `git checkout -b fix-bio-gpt-issue`
3. Modify the file `src/transformers/models/biogpt/modeling_biogpt.py` the same way as all the contributors have modified their files in #21818 #21833 #21815 etc. (You can check `Files Changed` on the PR, at the top right of the Pull Request page)
4. Apply these changes and push them to your branch
5. Finally, open a Pull Request between `fix-bio-gpt-issue` and the `main` branch of `transformers` (+ tag us, myself + @gante) and we should be good to go!
Let us know if you have more questions!
Thanks a mile @KMFODA! 💯 Feel free to take those, and tag me or @gante whenever you feel ready!
Happy to take on Git, GptNeoX, ImageGPT, LED, LongT5, M2M100, Marian, MBart, MegatronBert, MVP, OPT, Pegasus, PegasusX, RemBert, RoFormer
I would like to work on Blenderbot
I want to work on GPT-J!
Would like to take GPT-2!
Hey @connor-henderson 👋 Thank you for the suggestion! Usually, I’d give the green light to configuration-related DRY approaches such as the one you suggested. However, this one would sit right in `forward()`, and we prioritize clear code (= avoid abstractions) in the modeling code itself. In case you’re curious about this position, we have a blog post about why we do it here 🤗