ctransformers: transformers 4.34 caused NotImplementedError when calling CTransformersTokenizer(PreTrainedTokenizer)

transformers version: pip install transformers==4.34.0
ctransformers version: pip install ctransformers==0.2.27

I encounter the following error:

File ".venv\lib\site-packages\ctransformers\transformers.py", line 84, in __init__kages\ctransformers\transformers.py", line 84, in __init__
    super().__init__(**kwargs)

File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 366, in __init__
    self._add_tokens(self.all_special_tokens_extended, special_tokens=True)

File ".venv\lib\site-packages\transformers\tokenization_utils.py", line 462, in _add_tokens
    current_vocab = self.get_vocab().copy()

File ".venv\lib\site-packages\transformers\tokenization_utils_base.py", line 1715, in ``get_vocab
    raise NotImplementedError()``
NotImplementedError

transformers changed PreTrainedTokenizer in tokenization_utils.py (commit 2da8853): _add_tokens now calls current_vocab = self.get_vocab().copy() on line 454.

PreTrainedTokenizer itself already implements added_tokens_decoder and __len__, so a missing get_vocab override is the only thing left that raises NotImplementedError().
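
For context, with transformers>=4.34 any PreTrainedTokenizer subclass has to provide its own get_vocab(), because __init__ now reaches it through _add_tokens. Below is a minimal toy sketch of that requirement (illustrative only, not the actual CTransformersTokenizer and not the fix from #155, which has to supply the equivalent on top of the ctransformers model):

from transformers import PreTrainedTokenizer

class ToyTokenizer(PreTrainedTokenizer):
    """Illustrative subclass showing the new get_vocab() requirement."""

    def __init__(self, **kwargs):
        self._vocab = {"<unk>": 0, "hello": 1, "world": 2}
        # PreTrainedTokenizer.__init__ calls self._add_tokens(...), which in
        # turn calls self.get_vocab().copy(); without the override below that
        # call raises NotImplementedError.
        super().__init__(**kwargs)

    def get_vocab(self):
        return dict(self._vocab)

    @property
    def vocab_size(self):
        return len(self._vocab)

    def _tokenize(self, text):
        return text.split()

    def _convert_token_to_id(self, token):
        return self._vocab.get(token, self._vocab["<unk>"])

    def _convert_id_to_token(self, index):
        return {i: t for t, i in self._vocab.items()}.get(index, "<unk>")

tok = ToyTokenizer()              # no NotImplementedError: get_vocab() exists
print(len(tok), tok.get_vocab())  # 3 {'<unk>': 0, 'hello': 1, 'world': 2}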

About this issue

  • State: open
  • Created 9 months ago
  • Reactions: 2
  • Comments: 17

Most upvoted comments

OK, I quickly wrote this up and it works fine (you will need transformers==4.34.0, then build ctransformers from #155 and install it):


import os
from ctransformers import (
    AutoModelForCausalLM as cAutoModelForCausalLM,
    AutoTokenizer as cAutoTokenizer,
)

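# Load the GGUF model; hf=True returns a transformers-compatible wrapper so it
# can be used with generate() and an HF-style tokenizer.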
model = cAutoModelForCausalLM.from_pretrained(
            model_path_or_repo_id="TheBloke/Mistral-7B-OpenOrca-GGUF", 
            model_file="mistral-7b-openorca.Q5_K_M.gguf", 
            model_type="mistral",
            hf=True,
            temperature=0.7,
            top_p=0.7,
            top_k=50,
            repetition_penalty=1.2,
            context_length=8096,
            max_new_tokens=2048,
            threads=os.cpu_count(),
            stream=True,
            gpu_layers=0
            )
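# Build the HF-compatible tokenizer from the loaded model; with an unpatched
# ctransformers and transformers 4.34 this is the call that raises the
# NotImplementedError shown above.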
tokenizer = cAutoTokenizer.from_pretrained(model)

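# ChatML-style prompt templates used by Mistral-OpenOrca.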
mistral_no_mem_prompt_template = """
<|im_start|>system
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.
<|im_end|>
{placeholder}
"""

mistral_openorca_prompt = """
<|im_start|>user
{input}<|im_end|>
<|im_start|>assistant
"""

mistral_no_mem_template = mistral_no_mem_prompt_template.replace("{placeholder}", mistral_openorca_prompt)
question = "The cafeteria had 23 apples. If they used 20 for lunch and bought 6 more, how many apples do they have?"
prompt = mistral_no_mem_template.replace("{input}", question)

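# Tokenize the prompt, generate, and decode only the newly generated tokens
# (everything after the prompt).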
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to("cpu")
generated_ids = model.generate(input_ids, max_new_tokens=2048, temperature=0.7, do_sample=True)
response = tokenizer.decode(generated_ids[0][input_ids.shape[-1]:], skip_special_tokens=True, clean_up_tokenization_spaces=True)
print(f"Response: {response}")

Hi @victorlee0505. I've rebuilt with PR https://github.com/marella/ctransformers/pull/155 and can confirm the NotImplementedError is gone. Thanks!

Make sure to run export CT_CUBLAS=ON before python setup.py sdist, otherwise it won't build CUDA support.

You might also need to set these two in your .bashrc and confirm the nvcc version matches the one reported by nvidia-smi:

export PATH="/usr/local/cuda/bin:$PATH"
export LD_LIBRARY_PATH="/usr/local/cuda/lib64"

So I get 15x faster token output with no GPU layers… I think something is wrong.