transformers: AttributeError: 'tuple' object has no attribute 'to_legacy_cache'

System Info

transformers 4.36.1.

transformers/models/llama/modeling_llama.py", line 1093, in forward
    next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'to_legacy_cache'

This error pops up when running inference with a Llama 2 model on the new transformers 4.36.1. I didn't test 4.36.0. It was running correctly with 4.35.x.

This seems to be related to the changes from #26681 and commit 633215b. Tagging @ArthurZucker and @younesbelkada per the suggestions in "Who can help?".

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Sorry that I don't have an easy repro right now. Here is the relevant stack trace (a sketch of the call path follows it):

  File "###transformers/generation/utils.py", line 1764, in generate
    return self.sample(
           ^^^^^^^^^^^^
  File "###transformers/generation/utils.py", line 2861, in sample
    outputs = self(
              ^^^^^
  File "###torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "###transformers/models/llama/modeling_llama.py", line 1181, in forward
    outputs = self.model(
              ^^^^^^^^^^^
  File "###torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "###transformers/models/llama/modeling_llama.py", line 1093, in forward
    next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'tuple' object has no attribute 'to_legacy_cache'
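
For orientation only, here is a sketch of the kind of call that walks the code path in the trace (generate → sample → LlamaModel.forward). The checkpoint name and generation arguments are placeholders, and a plain call like this on stock 4.36.1 may not crash by itself; the tuple only reaches to_legacy_cache when the attention layers return legacy tuple caches (for example via a patched or injected attention):

    # Hypothetical sketch of the call path in the trace; not a confirmed repro of the crash.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
    # do_sample=True routes generate() through sample(), the frame shown in the trace
    output = model.generate(**inputs, do_sample=True, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))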

Expected behavior

Crash with the provided stack trace.

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 4
  • Comments: 18 (2 by maintainers)

Most upvoted comments

Apparently this issue was introduced by PR #26681 from @tomaarsen and @patrickvonplaten.

next_decoder_cache should be a Cache, which means it is not being initialized as one here. Instead of a plain tuple, the new HF implementation passes a Cache object that is indexed per layer:

https://github.com/tomaarsen/transformers/blob/ee60b1cc13e2819ef31e69952c0b6f616bd724b8/src/transformers/models/llama/modeling_llama.py#L287C45-L287C76
layer_idx: Optional[int] = None

https://github.com/tomaarsen/transformers/blob/ee60b1cc13e2819ef31e69952c0b6f616bd724b8/src/transformers/models/llama/modeling_llama.py#L355
past_key_value: Optional[Cache] = None,

layer_idx is later used to index into past_key_value, and past_key_value is now typed as a Cache instead of a tuple.

Note that the diff introduces a Cache class for the attention KV cache which implements to_legacy_cache.
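
For reference, a small sketch of how the new cache type is meant to round-trip with the legacy tuple format, using DynamicCache from transformers.cache_utils in 4.36 (the tensor shapes are made up for illustration). The crash above happens precisely because next_decoder_cache is still a plain tuple, so the final to_legacy_cache() call has nothing to dispatch to:

    import torch
    from transformers.cache_utils import DynamicCache

    # Legacy format: one (key, value) tuple per layer.
    legacy = tuple(
        (torch.randn(1, 8, 4, 64), torch.randn(1, 8, 4, 64))  # (batch, heads, seq_len, head_dim), made-up shapes
        for _ in range(2)
    )

    cache = DynamicCache.from_legacy_cache(legacy)  # wrap the legacy tuples in a Cache object
    # Each attention layer appends its new key/value states under its own layer_idx:
    key, value = cache.update(torch.randn(1, 8, 1, 64), torch.randn(1, 8, 1, 64), layer_idx=0)
    print(cache.get_seq_length())           # 5: the 4 cached positions plus the 1 new one for layer 0

    legacy_again = cache.to_legacy_cache()  # the call that fails when the "cache" is a plain tuple
    assert isinstance(legacy_again, tuple)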

I guess the DeepSpeed version does not instantiate the Llama attention correctly, or we should change the code as @fxmarty suggests:

    if use_cache:
        use_legacy_cache = not isinstance(past_key_values, Cache) and past_key_values is not None
        if use_legacy_cache:
            past_key_values = DynamicCache.from_legacy_cache(past_key_values)
        elif past_key_values is None:
            past_key_values = DynamicCache()
        past_key_values_length = past_key_values.get_seq_length()
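
Even with that change, the crash can still happen if an attention implementation (for example a patched or injected one) keeps returning legacy (key, value) tuples from its forward, because next_decoder_cache then ends up as a tuple again. A more defensive variant of the failing line, purely as a sketch and not the upstream fix:

    # Sketch of a guard around the failing line in LlamaModel.forward
    # (modeling_llama.py line 1093 in 4.36.1); a workaround idea, not the upstream fix.
    next_cache = None
    if use_cache:
        if use_legacy_cache and isinstance(next_decoder_cache, Cache):
            next_cache = next_decoder_cache.to_legacy_cache()
        else:
            # Either legacy conversion was not requested, or next_decoder_cache is
            # already a plain tuple (e.g. returned by a patched attention module).
            next_cache = next_decoder_cache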

I have the same issue with transformers 4.36.1. I am using the DeepSpeed framework to generate a response and face the same error.
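
In case it helps narrow this down, here is a hypothetical sketch of a DeepSpeed inference setup of that kind. The replace_with_kernel_inject argument is a real deepspeed.init_inference option, but whether the injected attention (which returns legacy tuple caches) is the culprit here, and whether turning injection off avoids the crash, are assumptions, not something confirmed in this thread:

    # Hypothetical DeepSpeed inference setup (placeholders; not verified against this issue).
    import deepspeed
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16)

    # replace_with_kernel_inject=True swaps the HF attention blocks for DeepSpeed kernels;
    # if the injected attention still returns legacy (key, value) tuples, that tuple
    # propagates to next_decoder_cache and to_legacy_cache() fails as in the trace.
    # Setting it to False here is only a guessed workaround.
    engine = deepspeed.init_inference(model, dtype=torch.float16, replace_with_kernel_inject=False)

    inputs = tokenizer("Hello, my name is", return_tensors="pt").to(engine.module.device)
    output = engine.module.generate(**inputs, do_sample=True, max_new_tokens=20)
    print(tokenizer.decode(output[0], skip_special_tokens=True))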

For me, it works with transformers==4.34.1.

Hi @wuxb45, can you share a fully reproducible snippet?