transformers: ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
System Info
transformers 4.27.1
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the `examples` folder (such as GLUE/SQuAD, …)
- My own task or dataset (give details below)
Reproduction
I tested LLaMA in Colab. Here is my code and output:
```
!pip install git+https://github.com/huggingface/transformers
!pip install sentencepiece
```

```python
import torch
from transformers import pipeline, LlamaTokenizer, LlamaForCausalLM

device = "cuda:0" if torch.cuda.is_available() else "cpu"
print(device)

tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")

generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
generator("I can't believe you did such a ")
```
```
ValueError                                Traceback (most recent call last)
<ipython-input-3-c1d71e177e5a> in <module>
      7 # tokenizer = LlamaTokenizer.from_pretrained("decapoda-research/llama-7b-hf")
      8 # model = LlamaForCausalLM.from_pretrained("decapoda-research/llama-7b-hf")
----> 9 generator = pipeline(model="decapoda-research/llama-7b-hf", device=device)
     10 generator("I can't believe you did such a ")

1 frames
/usr/local/lib/python3.9/dist-packages/transformers/models/auto/tokenization_auto.py in from_pretrained(cls, pretrained_model_name_or_path, *inputs, **kwargs)
    675
    676         if tokenizer_class is None:
--> 677             raise ValueError(
    678                 f"Tokenizer class {tokenizer_class_candidate} does not exist or is not currently imported."
    679             )

ValueError: Tokenizer class LLaMATokenizer does not exist or is not currently imported.
```
Expected behavior
I expected the pipeline to load and generate text.
About this issue
- Original URL
- State: closed
- Created a year ago
- Reactions: 1
- Comments: 50 (2 by maintainers)
Change the `LLaMATokenizer` in `tokenizer_config.json` into lowercase `LlamaTokenizer` and it works like a charm.
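For reference, a minimal sketch of applying that edit programmatically, assuming the checkpoint has been cloned to a local folder named `llama-7b-hf` (the folder name is illustrative):

```python
import json
from pathlib import Path

config_path = Path("llama-7b-hf/tokenizer_config.json")
config = json.loads(config_path.read_text())

# The broken checkpoints ship "LLaMATokenizer"; the class in the
# transformers library is spelled "LlamaTokenizer".
config["tokenizer_class"] = "LlamaTokenizer"
config_path.write_text(json.dumps(config, indent=2))
```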
Hi @candowu, thanks for raising this issue. This is arising because the tokenizer class in the config on the Hub points to `LLaMATokenizer`, while the tokenizer in the library is `LlamaTokenizer`. This is likely due to the configuration files being created before the final PR was merged in.
@yhifny Are you able to import the tokenizer directly using `from transformers import LlamaTokenizer`? If not, can you make sure that you are working from the development branch in your environment using `pip install git+https://github.com/huggingface/transformers`? More details here.
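As a quick sanity check, something like the following should succeed on a source install that includes the Llama PR (a sketch, not part of the original comment):

```python
import transformers

print(transformers.__version__)  # a source install shows a ".dev0" suffix

# On versions without Llama support this import raises ImportError.
from transformers import LlamaTokenizer
```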
For anybody interested: I was able to load an earlier saved model with the same issue using my fork with the old capitalization restored. That said, in the future it's probably better to find or save a new model with the new naming.
As the error message probably mentions, you need to install sentencepiece: `pip install sentencepiece`.

I cloned the repo and changed the tokenizer class in the config file to `LlamaTokenizer`, but I got `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
In my case, `transformers==4.30.0` fixed it.
I assume this is applied to the llama-7b repo cloned from the Hugging Face Hub, right? How can I instantiate the model and the tokenizer after doing that, please?
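A hedged sketch of one way to do that, assuming the clone with the fixed `tokenizer_config.json` sits in a local folder `./llama-7b-hf` (the path is illustrative):

```python
from transformers import LlamaForCausalLM, LlamaTokenizer

# Point from_pretrained at the local directory containing the fixed
# config instead of the Hub repo id, so the corrected file is used.
tokenizer = LlamaTokenizer.from_pretrained("./llama-7b-hf")
model = LlamaForCausalLM.from_pretrained("./llama-7b-hf")
```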
You should just stop using that checkpoint. The maintainers of that repo have made it clear that they are not interested in being compatible with Transformers, ignoring the 62 PRs trying to fix their checkpoints. The huggyllama checkpoints are confirmed to work if you are looking for an alternative (but you should still request the weights from Meta via their official form).
There are now 903 checkpoints for llama on the Hub and only the 4 from decapoda-research do not work since they created them before the PR for Llama was merged into Transformers. We won’t break the code for the other 899 checkpoints.
I face the same issue
I can import the `LlamaTokenizer` class, but I'm getting an error that the `from_pretrained` method is `None`. Anyone else having this issue?

Install this library: `pip install -U transformers`
Working now. I swear I had sentencepiece, but probably forgot to reset the runtime 🤦 My bad!
Please, I'm facing the same issue. Can anyone help? I tried all the above methods.
Can you please enlighten me on how this could be achieved? I'm new to this.
Will this problem be fixed by updating to the newest version of transformers, or must we modify the config file manually each time?
Hey, try this repo: `pip install git+https://github.com/mbehm/transformers`; maybe it can work.

I had the same issue and it was solved by:

```
pip uninstall transformers
pip install transformers
```
Hi, I installed from source:

```
git clone https://github.com/huggingface/transformers.git
cd transformers
pip install -e .
```

`pip list` shows:

```
transformers  4.29.0.dev0  D:\myfolder\transformers
```

but I still get `ValueError: Tokenizer class LlamaTokenizer does not exist or is not currently imported.`
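When an editable install still fails like this, one common cause is that Python resolves a different copy of the library than the clone; a quick check (a sketch, not from the thread):

```python
import transformers

# If this prints a site-packages path rather than your cloned folder,
# the editable install is being shadowed by an older release.
print(transformers.__version__)
print(transformers.__file__)
```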
You can find the documentation on the install page.
You can try this rather crazy way to find out what the right casing for the module is:
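The original snippet and its output were lost in this thread's formatting; the sketch below is one way to do such a lookup (my reconstruction, not necessarily the original poster's code):

```python
import transformers

# Scan the top-level transformers namespace for any name that looks
# like a Llama tokenizer, whatever its casing.
matches = [name for name in dir(transformers)
           if "llama" in name.lower() and "tokenizer" in name.lower()]
print(matches)
```

[out]: on a recent release this prints something like `['LlamaTokenizer', 'LlamaTokenizerFast', ...]`.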
Thank you so much for this! Works! That’s amazing!
Same error on model codellama/CodeLlama-13b-hf. Can anyone post a valid JSON config here?
@ndvbd you should be able to use `AutoTokenizer` with any tokenizer on the Hub. If you have an issue and want us to help you, we really need a small reproducer and the full traceback.

For anyone still getting the same error: check your installed version with `print(transformers.__version__)`. The official checkpoint (meta-llama/Llama-2-7b-hf) has the correct `LlamaTokenizer` class if you are using `AutoModel`. If you have a different issue, make sure to open a new issue and ping me 🤗
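A minimal sketch of that `AutoTokenizer` route (assumes you have been granted access to the gated meta-llama repo and are on a release with Llama 2 support):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# AutoTokenizer reads tokenizer_class from the checkpoint's config and
# resolves it against the classes shipped with the installed transformers.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
```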
@MasterLivens Hi, I am currently using Colab. Which file should I add this code to?
You need to install the library from source to be able to use the LLaMA model.
You are a life saver. The docs on the site should be updated with this reference.