transformers: ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
System Info
transformers 4.27.1
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I tested LLaMA in Colab. Here is my code and output:
```
!pip install git+https://github.com/huggingface/transformers
!pip install sentencepiece
```

```python
import torch
from transformers import pipeline, LlamaTokenizer, LlamaForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
generator("I can't believe you did such a ")
```
```
ValueError                                Traceback (most recent call last)
<ipython-input-3-c1d71e177e5a> in <module>
      7 # tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
      8 # model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
----> 9 generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
     10 generator("I can't believe you did such a ")

1 frames
/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    675
    676         if tokenizer_class is None:
--> 677             raise ValueError(
    678                 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    679             )

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```
Expected behavior
I expected the pipeline to load and generate text.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 50 (2 by maintainers)
Change the `LLaMATokenizer` in `tokenizer_config.json` into lowercase `LlamaTokenizer` and it works like a charm.
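For reference, a minimal sketch of applying that edit programmatically, assuming the checkpoint has been cloned to a local folder named `llama-7b-hf` (the folder name is illustrative):

```python
import json
from pathlib import Path

config_path = Path("llama-7b-hf/tokenizer_config.json")
config = json.loads(config_path.read_text())

# The broken checkpoints ship "LLaMATokenizer"; the class in the
# transformers library is spelled "LlamaTokenizer".
config["tokenizer_class"] = "LlamaTokenizer"
config_path.write_text(json.dumps(config, indent=2))
```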
Hi @candowu, thanks for raising this issue. This is arising because the tokenizer class in the config on the Hub points to `LLaMATokenizer`, while the tokenizer in the library is `LlamaTokenizer`. This is likely due to the configuration files being created before the final PR was merged in.
@yhifny Are you able to import the tokenizer directly using `from transformers import LlamaTokenizer`? If not, can you make sure that you are working from the development branch in your environment using `pip install git+https://github.com/huggingface/transformers`? More details here.
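As a quick sanity check, something like the following should succeed on a source install that includes the Llama PR (a sketch, not part of the original comment):

```python
import transformers

print(transformers.__version__)  # a source install shows a ".dev0" suffix

# On versions without Llama support this import raises ImportError.
from transformers import LlamaTokenizer
```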
For anybody interested: I was able to load an earlier saved model with the same issue using my fork with the old capitalization restored. That said, in the future it's probably better to find or save a new model with the new naming.
As the error message probably mentions, you need to install sentencepiece: `pip install sentencepiece`.

I cloned the repo and changed the tokenizer class in the config file to `LlamaTokenizer`, but I got `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
In my case, `transformers==4.30.0` fixed it.
I assume this is applied to the llama-7b repo cloned from the Hugging Face Hub, right? How can I instantiate the model and the tokenizer after doing that, please?
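A hedged sketch of one way to do that, assuming the clone with the fixed `tokenizer_config.json` sits in a local folder `./llama-7b-hf` (the path is illustrative):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Point from_pretrained at the local directory containing the fixed
# config instead of the Hub repo id, so the corrected file is used.
tokenizer = LlamaTokenizer.from_pretrained("./llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("./llama-7b-hf")
```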
You should just stop using that checkpoint. The maintainers of that repo have made it clear that they are not interested in being compatible with Transformers, ignoring the 62 PRs trying to fix their checkpoints. The huggyllama checkpoints are confirmed to work if you are looking for an alternative (but you should still request the weights from Meta via their official form).
There are now 903 checkpoints for llama on the Hub and only the 4 from decapoda-research do not work since they created them before the PR for Llama was merged into Transformers. We won’t break the code for the other 899 checkpoints.
I face the same issue
I can import the `LlamaTokenizer` class, but I'm getting an error that the `from_pretrained` method is `None`. Anyone else having this issue?

Install this library: `pip install -U transformers`
Working now. I swear I had sentencepiece, but probably forgot to reset the runtime 🤦 My bad!
Please, I'm facing the same issue. Can anyone help? I tried all the above methods.
Can you please enlighten me on how this could be achieved? I'm new to this.
Will this problem be fixed by updating to the newest version of transformers, or must we modify the config file manually each time?
Hey, try this repo: `pip install git+https://github.com/mbehm/transformers`; maybe it can work.

I had the same issue and it was solved by:

```
pip uninstall transformers
pip install transformers
```
Hi, I installed from source:

```
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```

`pip list` shows:

```
transformers  4.29.0.dev0  D:\myfolder\transformers
```

but I still get `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
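When an editable install still fails like this, one common cause is that Python resolves a different copy of the library than the clone; a quick check (a sketch, not from the thread):

```python
import transformers

# If this prints a site-packages path rather than your cloned folder,
# the editable install is being shadowed by an older release.
print(transformers.__version__)
print(transformers.__file__)
```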
You can find the documentation on the install page.
You can try this rather crazy way to find out what the right casing for the module is:
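The original snippet and its output were lost in this thread's formatting; the sketch below is one way to do such a lookup (my reconstruction, not necessarily the original poster's code):

```python
import transformers

# Scan the top-level transformers namespace for any name that looks
# like a Llama tokenizer, whatever its casing.
matches = [name for name in dir(transformers)
           if "llama" in name.lower() and "tokenizer" in name.lower()]
print(matches)
```

[out]: on a recent release this prints something like `['LlamaTokenizer', 'LlamaTokenizerFast', ...]`.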
Thank you so much for this! Works! That’s amazing!
Same error on model codellama/CodeLlama-13b-hf. Can anyone post a valid JSON config here?
@ndvbd you should be able to use `AutoTokenizer` with any tokenizer on the Hub. If you have an issue and want us to help you, we really need a small reproducer and the full traceback.

For anyone still getting the same error: check your installed version with `print(transformers.__version__)`. The official checkpoint (meta-llama/Llama-2-7b-hf) has the correct `LlamaTokenizer` class if you are using `AutoModel`. If you have a different issue, make sure to open a new issue and ping me 🤗
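A minimal sketch of that `AutoTokenizer` route (assumes you have been granted access to the gated meta-llama repo and are on a release with Llama 2 support):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# AutoTokenizer reads tokenizer_class from the checkpoint's config and
# resolves it against the classes shipped with the installed transformers.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```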
@MasterLivens Hi, I am currently using Colab. Which file should I add this code to?
You need to install the library from source to be able to use the LLaMA model.
You are a life saver. The docs on the site should be updated with this reference.