transformers: AttributeError: 'tuple' object has no attribute 'to_legacy_cache'
System Info
transformers 4.36.1.
transformers/models/llama/modeling_llama.py", line 1093, in forward
next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'to_legacy_cache'
This error pops up when running inference with a Llama 2 model on the new transformers 4.36.1. I didn't test 4.36.0. It was running correctly with 4.35.x.
This seems to be related to the changes from #26681 and commit 633215b. Tagging @ArthurZucker and @younesbelkada per the suggestions under “Who can help?”.
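For reference, the new cache API ships a converter from the legacy tuple format. Below is a minimal sketch of using it on the caller side, assuming the stale tuple comes from your own code rather than from inside the model; the variable names (model, input_ids, past_key_values) are placeholders, not from my setup:

```python
# Sketch: convert a legacy tuple-of-tuples KV cache into the new Cache object
# before handing it back to the model (transformers >= 4.36).
from transformers.cache_utils import DynamicCache

if isinstance(past_key_values, tuple):
    past_key_values = DynamicCache.from_legacy_cache(past_key_values)

outputs = model(input_ids, past_key_values=past_key_values, use_cache=True)
```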
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
Sorry, I don't have an easy repro right now. Here is the relevant stack trace, followed by a rough sketch of the kind of call that triggers it:
File "###transformers/generation/utils.py", line 1764, in generate
return self.sample(
^^^^^^^^^^^^
File "###transformers/generation/utils.py", line 2861, in sample
outputs = self(
^^^^^
File "###torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "###transformers/models/llama/modeling_llama.py", line 1181, in forward
outputs = self.model(
^^^^^^^^^^^
File "###torch/nn/modules/module.py", line 1538, in _call_impl
result = forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "###transformers/models/llama/modeling_llama.py", line 1093, in forward
next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'to_legacy_cache'
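The sketch below shows roughly the kind of generate() call that goes down this path. The model name and generation arguments are placeholders, not my exact setup, and on its own this may not reproduce the crash without the extra wrapping in my environment:

```python
# Rough sketch of the failing call pattern; checkpoint and arguments are hypothetical.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
# do_sample=True routes generate() through sample(), matching the trace above.
output_ids = model.generate(**inputs, do_sample=True, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```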
Expected behavior
Generation should complete as it did with 4.35.x; instead it crashes with the stack trace provided above.
About this issue
- Original URL
- State: closed
- Created 7 months ago
- Reactions: 4
- Comments: 18 (2 by maintainers)
Apparently this issue was introduced by PR #26681 from @tomaarsen and @patrickvonplaten.
next_decoder_cache should be a Cache object, which means it is not being initialized as one here. Instead of a tuple, the new HF implementation passes a Cache object (which holds per-layer key/value lists):
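For context, the cache initialization at the top of the 4.36 LlamaModel.forward is roughly the following (paraphrased from memory of modeling_llama.py, not an exact quote):

```python
# Roughly what transformers 4.36 does: legacy tuple caches are wrapped into a
# DynamicCache on entry, and to_legacy_cache() is called on the running cache
# at the end of the forward pass.
from transformers.cache_utils import Cache, DynamicCache

if use_cache:
    use_legacy_cache = not isinstance(past_key_values, Cache)
    if use_legacy_cache:
        past_key_values = DynamicCache.from_legacy_cache(past_key_values)
```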
layer_idx is later used by past_key_value, and past_key_value has been replaced with a Cache object. Note that the diff introduces a cache class (for the attention KV cache) that implements to_legacy_cache. I guess the DeepSpeed path does not instantiate the Llama attention correctly, or we should change the code as @fxmarty suggests:
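For illustration only (I don't know the exact change @fxmarty proposed), a defensive guard along these lines around the failing line would avoid the crash when a replaced attention module keeps returning legacy tuples:

```python
# Hypothetical guard around the failing line in modeling_llama.py; not the upstream fix.
from transformers.cache_utils import Cache

next_cache = None
if use_cache:
    if isinstance(next_decoder_cache, Cache):
        next_cache = (
            next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
        )
    else:
        # e.g. a patched/injected attention returned the old tuple format
        next_cache = next_decoder_cache
```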
I have the same issue with transformers 4.36.1. I am using the DeepSpeed framework to generate responses and hit the same error; a rough sketch of my setup follows below.
For me, it works with transformers==4.34.1.
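Roughly, the setup looks like this (the checkpoint and settings are placeholders; I assume it is the kernel injection that replaces the Llama attention and produces the tuple-style cache):

```python
# Rough sketch of a DeepSpeed inference setup; checkpoint and settings are placeholders.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

# Kernel injection swaps in DeepSpeed's fused attention modules, which
# (as far as I can tell) still return the old tuple-style past_key_values.
ds_engine = deepspeed.init_inference(model, dtype=torch.float16, replace_with_kernel_inject=True)

inputs = tokenizer("Hello", return_tensors="pt").to("cuda")
output_ids = ds_engine.module.generate(**inputs, do_sample=True, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```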
Hi @wuxb45, can you share a fully reproducible snippet?