gisting: Running out of memory when trying to compress

Basically the title: my spec is 6 GB of VRAM on a 1070.

I used a gist model, specifically your flan-t5-gist model on HuggingFace, with bf16 precision as suggested in compress.py; however, I keep running into a CUDA out-of-memory error. Is there a minimum amount of VRAM a system needs before it can make use of gisting? (In another issue you pointed at 12 GB being enough to work, so I’m guessing my only option is to use Accelerate.)

About this issue

  • State: closed
  • Created a year ago
  • Comments: 25 (12 by maintainers)

Most upvoted comments

Did you try pip install -r requirements.txt? Does it throw an error?

Assuming you cloned my repository, you should alternatively be able to clone the huggingface transformers repository as well, check out the relevant commit, then run pip install -e . in the repo directory to install the package locally.

yep, at long last haha

Alright, so I followed almost the exact instructions (had to use decapoda-research/llama-7b-hf) and avoided the OOM error 😄, but now I run into this:


Compressing instruction

Traceback (most recent call last):
  File "/pkg/modal/_container_entrypoint.py", line 330, in handle_input_exception
    yield
  File "/pkg/modal/_container_entrypoint.py", line 403, in call_function_sync
    res = fun(*args, **kwargs)
  File "/root/pipe.py", line 27, in complete
    compress.main(model_name_or_path="jayelm/llama-7b-gist-1",base_llama_path="decapoda-research/llama-7b-hf",
  File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/gisting_test/src/compress.py", line 148, in main
    gist_activations = model.get_gist_activations(
  File "/usr/local/lib/python3.9/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/gisting_test/src/gist_llama.py", line 643, in get_gist_activations
    model_outputs = self.model(
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/gisting_test/src/gist_llama.py", line 583, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/gisting_test/src/gist_llama.py", line 315, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/usr/local/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/usr/local/lib/python3.9/site-packages/gisting_test/src/gist_llama.py", line 206, in forward
    query_states, key_states = apply_rotary_pos_emb(
TypeError: apply_rotary_pos_emb() got an unexpected keyword argument 'offset'

See #10; your transformers version is likely wrong.
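
If it helps, a quick way to check which transformers build is actually being imported (nothing repo-specific, just a standard check):

# Print the installed transformers version and where it is imported from;
# it should match the version pinned in this repo's requirements.txt.
import transformers

print(transformers.__version__)
print(transformers.__file__)  # points at the local clone if you did an editable install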

Hi, the FLAN-T5 gist model on Hugging Face has 11B parameters and needs around 20-30 GB of VRAM for bf16 inference. If you have less GPU VRAM, you have two options: (1) look into lower-precision inference, e.g. https://github.com/TimDettmers/bitsandbytes, or (2) train a smaller gist model from scratch (the training commands in the README support this, but unfortunately I don’t have checkpoints for smaller gist models).
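
For option (1), the usual bitsandbytes 8-bit loading pattern in transformers looks roughly like this (a sketch only: the checkpoint name below is a placeholder, and I haven’t verified that compress.py and the custom gist model classes work out of the box with an int8 base model):

# Rough sketch of 8-bit inference via bitsandbytes (pip install bitsandbytes accelerate).
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "path/to/flan-t5-gist-checkpoint"  # placeholder, substitute the actual gist checkpoint

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(
    model_name,
    load_in_8bit=True,   # requires bitsandbytes
    device_map="auto",   # requires accelerate; places layers across available devices
)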