Nuitka: Tokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the

With nuitka 1.7.9 and transformers==4.29.2, this works perfectly fine: the compiled program finds the SentencePiece library and the tokenizer, and it tokenizes as expected.

With nuitka 1.7.9 and transformers==4.31.0, it fails with:

LlamaTokenizer requires the SentencePiece library but it was not found in your environment. Checkout the instructions on the installation page of its repo: https://github.com/google/sentencepiece#installation and follow the ones that match your environment. Please note that you may need to restart your runtime after installation.

Traceback fragments from the same output (paths truncated):

  ..._sentencepiece_objects.py", line 93, in __init__
  ...t_utils.py", line 1027, in requires_backends

Here is an example

from transformers import LlamaTokenizer

import logging

logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)

vocab_file = r"C:\Users\Tensor\Desktop\TestSentencePieceHF\Pygmalion-7b\tokenizer.model"
tokenizer_file = r"C:\Users\Tensor\Desktop\TestSentencePieceHF\Pygmalion-7b\tokenizer.json"

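# LlamaTokenizer is the slow, SentencePiece-backed class; as far as I can tell,
# tokenizer_file and use_fast only matter for the fast tokenizer / AutoTokenizer.
tokenizer = LlamaTokenizer(vocab_file=vocab_file, tokenizer_file=tokenizer_file, use_fast=True)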

examples = [
    tokenizer(
        "auto-gptq is an easy-to-use model quantization library with user-friendly apis, based on GPTQ algorithm."
    )
]
print(tokenizer("auto_gptq is", return_tensors="pt"))

These are the model files to be loaded (C:\Users\Tensor\Desktop\TestSentencePiece\Pygmalion-7b-4bit-GPTQ-Safetensors): https://drive.google.com/file/d/10ZD89g7BVpjbmnkrzIEirxvNzoLwdxOX/view?usp=sharing

About this issue

  • State: closed
  • Created a year ago
  • Comments: 16 (10 by maintainers)

Most upvoted comments

So, transformers has a bunch of distribution metadata dependencies that we do not yet list, so --include-distribution-metadata can likely solve this. I am doing two things about it. First, when metadata is missing, Nuitka is going to give an error that suggests the options. Second, I am going to add a bunch of these from the transformers source; it’s likely a moving target, but I don’t want code to extract them.
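
To make the metadata point concrete, here is a minimal diagnostic sketch. It assumes, without the thread confirming it, that newer transformers versions decide whether a backend is present partly by looking up the installed distribution's metadata, so a compiled build can contain the module yet still report it as missing. The helper name package_status is hypothetical. Run it both in the normal interpreter and inside the compiled program, then compare the results.

```python
import importlib.metadata
import importlib.util


def package_status(import_name, dist_name=None):
    """Report whether a module is importable and whether its metadata is visible.

    import_name: top-level module name, e.g. "sentencepiece".
    dist_name:   distribution name, if it differs from the module name.
    """
    dist_name = dist_name or import_name

    # Is the module itself discoverable by the import system?
    module_found = importlib.util.find_spec(import_name) is not None

    # Is the distribution metadata (the *.dist-info data) discoverable?
    try:
        version = importlib.metadata.version(dist_name)
    except importlib.metadata.PackageNotFoundError:
        version = None

    return {"module_found": module_found, "metadata_version": version}


if __name__ == "__main__":
    # "sentencepiece" is the package named in the error message of this issue.
    print(package_status("sentencepiece"))
```

If the compiled build reports the module as found but the metadata version as None, that would be consistent with the explanation above. In that case, rebuilding with --include-distribution-metadata for the affected distribution (presumably sentencepiece here) is the option the comment suggests trying.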

I am having this issue as well. Would be thankful for a solution!

@ArEnSc I have fallen a bit behind on this one because the new CI machine arrived, and I would like to set up the VM for testing these kinds of things there. Please allow a bit more time.

Absolutely! Just surfacing these things so they don’t get lost 😃 Thank you again!