transformers: `KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'`

System Info

  • transformers version: 4.36.0
  • Platform: Linux-5.15.0-70-generic-x86_64-with-glibc2.35
  • Python version: 3.11.4
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.3.3
  • Accelerate version: 0.25.0.dev0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.1.0+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Who can help?

@gante

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

long_text = "..."  # elided in the original report: a document long enough to exceed the model's sliding window

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)

model = AutoModelForCausalLM.from_pretrained(model_id, use_flash_attention_2=True, torch_dtype=torch.float16, device_map="auto")

streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)

messages = [
    {"role": "user", "content": f"Summarize the following:\n{long_text}"}
]

inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=8192, do_sample=True, streamer=streamer)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

Expected behavior

Expected generation to work, or at least to fail with a CUDA error.

About this issue

  • State: closed
  • Created 7 months ago
  • Reactions: 4
  • Comments: 21 (7 by maintainers)

Most upvoted comments

Hi! I'm having the same error. It only happens when the token length is greater than the sliding window size (I do not get the error with transformers 4.34.0, but when I upgrade to 4.36.0 I do).

Thanks! 😃
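For reference, a quick way to check whether a prompt actually crosses the sliding window — a minimal sketch that assumes the tokenizer, model, and inputs from the reproduction above (sliding_window may be None for checkpoints that disable it):

# Minimal sketch (assumes `model` and `inputs` from the reproduction above).
# The error reportedly only triggers once the prompt is longer than the model's
# sliding window, because only then does the FA2 cache-slicing branch run.
prompt_len = inputs.shape[-1]
sliding_window = getattr(model.config, "sliding_window", None)  # may be None if disabled
print(f"prompt length: {prompt_len}, sliding window: {sliding_window}")
if sliding_window is not None and prompt_len > sliding_window:
    print("Prompt exceeds the sliding window; the failing code path will be hit.")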

  • transformers version: 4.36.0
  • Platform: Linux 5.10.0-26-cloud-amd64
  • Python version: 3.10.13
  • Huggingface_hub version: 0.19.4
  • Safetensors version: 0.4.1
  • PyTorch version (GPU?): 2.1.1+cu121 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?: Yes
  • Using distributed or parallel set-up in script?: Yes

Traceback (most recent call last):
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/opt/conda/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/opt/conda/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 20, in __call__
    raise e
  File "/opt/conda/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/opt/conda/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 274, in app
    raw_response = await run_endpoint_function(
  File "/opt/conda/lib/python3.10/site-packages/fastapi/routing.py", line 191, in run_endpoint_function
    return await dependant.call(**values)
  File "/home/mmockus/dev/chatbot/rGPT/host.py", line 301, in benchmark_model
    response, time, previous_prompt = rgpt(
  File "/home/mmockus/dev/chatbot/rGPT/RGPT.py", line 384, in __call__
    output = self.__generate_text(final_prompt)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1140, in __call__
    return self.run_single(inputs, preprocess_params, forward_params, postprocess_params)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1147, in run_single
    model_outputs = self.forward(model_inputs, **forward_params)
  File "/opt/conda/lib/python3.10/site-packages/transformers/pipelines/base.py", line 1046, in forward
    model_outputs = self._forward(model_inputs, **forward_params)
  File "/home/mmockus/dev/chatbot/rGPT/instruct_pipeline.py", line 60, in _forward
    generated_sequence = self.model.generate(
  File "/opt/conda/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 1764, in generate
    return self.sample(
  File "/opt/conda/lib/python3.10/site-packages/transformers/generation/utils.py", line 2861, in sample
    outputs = self(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1212, in forward
    outputs = self.model(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 1080, in forward
    layer_outputs = decoder_layer(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 796, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
    return forward_call(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/accelerate/hooks.py", line 165, in new_forward
    output = module._old_forward(*args, **kwargs)
  File "/opt/conda/lib/python3.10/site-packages/transformers/models/mixtral/modeling_mixtral.py", line 441, in forward
    past_key = past_key_value[0]
  File "/opt/conda/lib/python3.10/site-packages/transformers/cache_utils.py", line 78, in __getitem__
    raise KeyError(f"Cache only has {len(self)} layers, attempted to access layer with index {layer_idx}")
KeyError: 'Cache only has 0 layers, attempted to access layer with index 0'

Opening a PR to fix it 😃

Thank you so much! I can confirm training works now 😃

The PR linked above fixes it 😃

This indeed seems like a caching issue. cc @gante. It seems this snippet was not updated to work with the new Cache class: https://github.com/huggingface/transformers/blob/2788f8d8d5f9cee2fe33a9292b0f3570bd566a6d/src/transformers/models/mistral/modeling_mistral.py#L388-L407
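For illustration, a minimal sketch of the mismatch (not the actual fix): the linked snippet still indexes the past key/values the old tuple-style way, but in 4.36 it receives a Cache object instead, where index 0 means "layer 0" rather than "key tensor". On the prefill step the cache is still empty, so the lookup fails with exactly the reported error:

from transformers.cache_utils import DynamicCache

# The pre-4.36 sliding-window branch does:
#     past_key = past_key_value[0]   # legacy meaning: element 0 is the key tensor
# With the new Cache API, [0] means "layer 0", and on the first forward pass
# the cache holds no layers yet, so the lookup raises:
cache = DynamicCache()
try:
    cache[0]
except KeyError as err:
    print(err)  # 'Cache only has 0 layers, attempted to access layer with index 0'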

In the meantime, I suspect that not using Flash Attention 2 may avoid the issue.
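For example (a sketch based on the reproduction above, not a verified fix), dropping the Flash Attention 2 flag loads the model with the default attention implementation, which, as far as I can tell, does not go through the failing sliding-window cache-slicing branch:

# Workaround sketch: same model_id/dtype/device_map as the reproduction above,
# just without use_flash_attention_2=True.
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)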

  • Tom Aarsen