transformers: While converting llama-13b weights, getting this error: RuntimeError: Internal: unk is not defined.

System Info

OS: Ubuntu

Virtual Env:

accelerate==0.18.0
certifi==2022.12.7
charset-normalizer==3.1.0
cmake==3.26.3
filelock==3.12.0
huggingface-hub==0.13.4
idna==3.4
Jinja2==3.1.2
lit==16.0.1
MarkupSafe==2.1.2
mpmath==1.3.0
networkx==3.1
numpy==1.24.2
nvidia-cublas-cu11==11.10.3.66
nvidia-cuda-cupti-cu11==11.7.101
nvidia-cuda-nvrtc-cu11==11.7.99
nvidia-cuda-runtime-cu11==11.7.99
nvidia-cudnn-cu11==8.5.0.96
nvidia-cufft-cu11==10.9.0.58
nvidia-curand-cu11==10.2.10.91
nvidia-cusolver-cu11==11.4.0.1
nvidia-cusparse-cu11==11.7.4.91
nvidia-nccl-cu11==2.14.3
nvidia-nvtx-cu11==11.7.91
packaging==23.1
psutil==5.9.5
PyYAML==6.0
regex==2023.3.23
requests==2.28.2
sentencepiece==0.1.98
sympy==1.11.1
tokenizers==0.13.3
torch==2.0.0
tqdm==4.65.0
transformers==4.28.1
triton==2.0.0
typing_extensions==4.5.0
urllib3==1.26.15

Who can help?

@ArthurZucker @younesbelkada

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, …)
  • My own task or dataset (give details below)

Reproduction

Used the following command to convert the llama-13B weights to the Hugging Face format.

python src/transformers/models/llama/convert_llama_weights_to_hf.py --input_dir /home/unconveretd-weights --model_size 13B --output_dir /home/test-converted

Expected behavior

It should generate the converted weights. But instead it produces this error:

Loading the checkpoint in a Llama model.
Loading checkpoint shards: 100%|██████████| 41/41 [00:17<00:00, 2.35it/s]
Saving in the Transformers format.
Saving a LlamaTokenizerFast to /home/test-converted.
Traceback (most recent call last):
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 278, in <module>
    main()
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 274, in main
    write_tokenizer(args.output_dir, spm_path)
  File "/home/transformers/src/transformers/models/llama/convert_llama_weights_to_hf.py", line 248, in write_tokenizer
    tokenizer = tokenizer_class(input_tokenizer_path)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama_fast.py", line 89, in __init__
    super().__init__(
  File "/home/myenv/lib/python3.10/site-packages/transformers/tokenization_utils_fast.py", line 117, in __init__
    slow_tokenizer = self.slow_tokenizer_class(*args, **kwargs)
  File "/home/myenv/lib/python3.10/site-packages/transformers/models/llama/tokenization_llama.py", line 96, in __init__
    self.sp_model.Load(vocab_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 905, in Load
    return self.LoadFromFile(model_file)
  File "/home/myenv/lib/python3.10/site-packages/sentencepiece/__init__.py", line 310, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
RuntimeError: Internal: unk is not defined.
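The traceback shows the failure happens when SentencePiece loads `tokenizer.model`, not in the conversion code itself, which usually indicates a missing, truncated, or wrong tokenizer file in the input directory. A minimal sanity check along these lines can rule that out before re-running the conversion (the path and the size threshold below are illustrative assumptions, not values from the script):

```python
import os

def check_tokenizer_file(path):
    """Basic sanity checks on a SentencePiece model file before conversion.

    A missing or truncated tokenizer.model is a common cause of
    'RuntimeError: Internal: unk is not defined.'
    """
    if not os.path.isfile(path):
        return "missing"
    size = os.path.getsize(path)
    # The official LLaMA tokenizer.model is roughly 500 KB; anything far
    # smaller is likely a Git LFS pointer or an interrupted download.
    if size < 100_000:
        return "suspiciously small ({} bytes)".format(size)
    return "ok"

# Hypothetical location; adjust to wherever --input_dir expects the file.
print(check_tokenizer_file("/home/unconveretd-weights/tokenizer.model"))
```

If this reports anything other than "ok", re-download `tokenizer.model` rather than debugging the conversion script.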

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 22

Most upvoted comments

Facing the same issue.

Ok 👍🏻 I’ll give it another go, but I remember trying with those exact weights and getting a correct conversion. Will get back to you soon!

I did not find the solution, but if someone wants to download the weights, the following link has all the versions.

https://huggingface.co/elinas

Hey! Thanks for reporting I’ll investigate this!