transformers: OOM when trying to fine-tune patrickvonplaten/led-large-16384-pubmed

I’m currently following this notebook, but I’m using patrickvonplaten/led-large-16384-pubmed


from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/led-large-16384-pubmed")

led = AutoModelForSeq2SeqLM.from_pretrained(
    "patrickvonplaten/led-large-16384-pubmed",
    gradient_checkpointing=True,
    use_cache=False,
)

instead of allenai/led-large-16384 as the base model and tokenizer. I’m also using my own train/test data. Apart from that, I kept everything else consistent with the notebook’s fine-tuning setup. However, I’m running into OOM errors:

RuntimeError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 15.78 GiB total capacity; 13.96 GiB already allocated; 20.00 MiB free; 14.56 GiB reserved in total by PyTorch)

  0%|          | 0/3 [00:10<?, ?it/s]

on a couple of Tesla V100-SXM2-16GB GPUs, and I’m not sure why. batch_size=2 seems pretty small, and I also set gradient_checkpointing=True. @patrickvonplaten and/or the surrounding community, I’d greatly appreciate any help with this.
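For context, the fine-tuning arguments in that notebook look roughly like this on my end (a sketch only; values such as gradient_accumulation_steps and fp16 are assumed and may not match the notebook exactly):

from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./led-large-pubmed-finetune",  # hypothetical output directory
    per_device_train_batch_size=2,             # the batch size that still OOMs for me
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=4,             # assumed, to keep the effective batch size up
    fp16=True,                                 # mixed precision to reduce activation memory
    predict_with_generate=True,
    logging_steps=10,
    save_total_limit=2,
)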

About this issue

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 29 (18 by maintainers)

Most upvoted comments

The model is actually quite big, so I would expect it to OOM. If you are doing multi-GPU training, you could try the fairscale/DeepSpeed integration to save memory and speed up training; check out this blog post: https://huggingface.co/blog/zero-deepspeed-fairscale
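If you go the DeepSpeed route, a minimal ZeRO stage 2 setup can be handed to the Trainer either as a path to a ds_config.json or as a dict. The sketch below is illustrative only; the specific values are assumptions, not taken from this thread:

from transformers import Seq2SeqTrainingArguments

# illustrative ZeRO stage 2 config; "auto" lets the Trainer fill in matching values
ds_config = {
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # optional: push optimizer state to CPU RAM
    },
    "fp16": {"enabled": "auto"},
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
}

training_args = Seq2SeqTrainingArguments(
    output_dir="./led-large-pubmed-finetune",  # hypothetical path
    per_device_train_batch_size=2,
    fp16=True,
    deepspeed=ds_config,  # a path to a ds_config.json also works here
)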

ok, figured it out - I had suggested that you disable gradient checkpointing back when you couldn’t use DeepSpeed, but I didn’t think to ask you to restore that setting afterwards…

So re-enable it: from_pretrained(MODEL_NAME, gradient_checkpointing=True, ...)

And voila, this config works just fine:

encoder_max_length = 2048
decoder_max_length = 256
batch_size = 4

You can go for an even larger length; it should have a very small impact. And I think your batch size can now be even larger, so you can remove gradient_accumulation_steps if you want, or reduce it.
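Putting it together, here is a minimal sketch of the working setup (the column names "article"/"abstract" and the preprocessing function are assumptions based on the pubmed-style notebook, not a verbatim copy):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

encoder_max_length = 2048
decoder_max_length = 256
batch_size = 4

tokenizer = AutoTokenizer.from_pretrained("patrickvonplaten/led-large-16384-pubmed")

led = AutoModelForSeq2SeqLM.from_pretrained(
    "patrickvonplaten/led-large-16384-pubmed",
    gradient_checkpointing=True,  # re-enabled: recompute activations to save memory
    use_cache=False,              # required while gradient checkpointing is on
)

def preprocess(batch):
    # assumed column names; truncate to the working lengths above
    inputs = tokenizer(batch["article"], truncation=True, max_length=encoder_max_length)
    outputs = tokenizer(batch["abstract"], truncation=True, max_length=decoder_max_length)
    batch["input_ids"] = inputs.input_ids
    batch["attention_mask"] = inputs.attention_mask
    batch["labels"] = outputs.input_ids
    return batch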

I updated the notebook, so you can see it working: https://colab.research.google.com/drive/1rEspdkR839xZzh561OwSYLtFnnKhQdEl?usp=sharing

Glad to hear you were able to make progress, @mmoya01

What command line did you use to launch this program? You have to launch it via the deepspeed launcher, as the docs instruct.

edit: actually, I just learned that this doesn’t have to be the case - I will update the docs shortly, but I still need to know how you started the program. Thank you.

I also noticed that, because of this import in DeepSpeed, I ended up pip-installing mpi4py in addition to deepspeed, and installing libopenmpi-dev in my CUDA image.

It’s odd that you had to do that manually; DeepSpeed’s pip installer should have installed all the dependencies automatically.

I will see if I can reproduce that.

I’m not sure if it’s because of checkpoint_tag_validation_fail. I’d greatly appreciate your feedback.

Have you tried without gradient checkpointing?

The failure is not in the transformers land so it’s a bit hard to guess what has happened.

I’d recommend filing an Issue with DeepSpeed: https://github.com/microsoft/DeepSpeed/issues