llama.cpp: Failed to convert Llama-v2 models

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [Y] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [Y] I carefully followed the README.md.
  • [Y] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [Y] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

Successfully converting Llama models using the following command:

python convert.py models/xxx

where xxx is the original trained Llama model downloaded from Facebook.

Current Behavior

Conversion fails with errors (detailed below).

I’ve found the behavior changed in the following commit; after checking out an older version, the conversion succeeds:

873637afc7924f435ac44c067630a28e82eefa7b

It seems that after the above commit, convert.py no longer supports the BPE vocab format (the --vocabtype parameter has been removed), while the README was not updated to reflect this change. This causes confusion.
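
For reference, on the older checkout the vocab format could still be selected explicitly via that parameter, e.g.:

python convert.py models/xxx --vocabtype bpe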

Environment and Context

  • Physical (or virtual) hardware you are using, e.g. for Linux:

MacBook M3 Max

  • Operating System, e.g. for Linux:

macOS 14.1 (23B2073)

  • SDK version, e.g. for Linux:

Python 3.10.13, transformers 4.36.1

Failure Information (for bugs)

Loading model file models/13B/consolidated.00.pth
Loading model file models/13B/consolidated.01.pth
params = Params(n_vocab=32000, n_embd=5120, n_layer=40, n_ctx=4096, n_ff=13824, n_head=40, n_head_kv=40, n_experts=None, n_experts_used=None, f_norm_eps=1e-05, rope_scaling_type=None, f_rope_freq_base=None, f_rope_scale=None, n_orig_ctx=None, rope_finetuned=None, ftype=None, path_model=PosixPath('models/13B'))
Traceback (most recent call last):
  File "llama.cpp/convert.py", line 1279, in <module>
    main()
  File "llama.cpp/convert.py", line 1255, in main
    vocab = VocabLoader(params, vocab_dir)
  File "llama.cpp/convert.py", line 342, in __init__
    self.tokenizer = AutoTokenizer.from_pretrained(str(fname_tokenizer), trust_remote_code=True)
  File "python3.10/site-packages/transformers/models/auto/tokenization_auto.py", line 752, in from_pretrained
    config = AutoConfig.from_pretrained(
  File "python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 1082, in from_pretrained
    config_dict, unused_kwargs = PretrainedConfig.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "python3.10/site-packages/transformers/configuration_utils.py", line 644, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "python3.10/site-packages/transformers/configuration_utils.py", line 699, in _get_config_dict
    resolved_config_file = cached_file(
  File "python3.10/site-packages/transformers/utils/hub.py", line 360, in cached_file
    raise EnvironmentError(
OSError: models/13B does not appear to have a file named config.json. Checkout 'https://huggingface.co/models/13B/None' for available files.

About this issue

  • Original URL
  • State: closed
  • Created 6 months ago
  • Reactions: 13
  • Comments: 28 (19 by maintainers)

Most upvoted comments

I am running into this same issue. Can you post if you find a resolution?

As a temporary fix you can simply checkout an older commit, such as:

git checkout 0353a1840134b24b07ab61fd4490192f28c4db6b

This is the latest commit before this bug appeared.

Between work, holidays, and helping out my cousins, I got sick over the holidays, so that’s why I went MIA. Spent the last few days fighting off the fever. Starting to feel better now. Wondering if any progress was made, if not, I can pick up where I left off. I’m skipping work over the weekend to recoup, so I’ll need something to keep me busy.

I did start on a new project named to_gguf, which was supposed to isolate and reorganize a lot of the client-facing code. It was mostly experimental so I could play around with it without messing with the upstream source code. Any progress I make there I could push upstream if there is interest; if not, it would still be educational for me.

The SPM and BPE vocabulary loaders were removed, so if you’re using a non-Hugging Face model, you’ll get this error. If you use Facebook’s HF model, it will work.

transformers.AutoTokenizer looks for the vocabulary, fails to find it locally, and then falls back to a remote lookup for a file that doesn’t exist.

This is simply due to the way the vocabularies are handled. I’ll be looking into it over the next few days to see if I can fix the regression.
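
In the meantime, here is a minimal sketch (the model path is just an example) of why the call falls through to a remote lookup: with no config.json in the directory, transformers ends up treating the path as a Hub repo id.

import os
from transformers import AutoTokenizer

model_dir = "models/13B"  # raw Meta checkpoint: consolidated.*.pth, params.json, tokenizer.model

# A Hugging Face-format directory has config.json plus tokenizer files;
# the raw Meta download does not, so this check fails for it.
if not os.path.isfile(os.path.join(model_dir, "config.json")):
    print("no config.json -- AutoTokenizer will fall back to the Hub and raise OSError")

# local_files_only only suppresses the Hub lookup; for a raw Meta directory
# this still raises OSError because the HF-format files are missing.
tokenizer = AutoTokenizer.from_pretrained(model_dir, local_files_only=True)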

It’s fine as it is since there is a workaround suggested. The old method created issues with other models where n_vocab could not be deduced correctly.
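
For context, the deduction essentially reads the vocabulary size off the token embedding tensor, which is only reliable when that tensor is laid out the way the stock Llama checkpoints are. A rough sketch of the idea, assuming a single-shard Meta checkpoint (the tensor name comes from Meta's Llama layout, not from convert.py itself):

import torch

# Meta's Llama checkpoints store the embedding as tok_embeddings.weight
# with shape (n_vocab, n_embd), so n_vocab can be read off the first dimension.
ckpt = torch.load("models/7B/consolidated.00.pth", map_location="cpu")
n_vocab, n_embd = ckpt["tok_embeddings.weight"].shape
print(n_vocab, n_embd)  # 32000 4096 for the stock 7B model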

The conversion fails with the latest master. Reverting to 0353a1840134b24b07ab61fd4490192f28c4db6b allows convert.py to progress further.

The new backtrace with master is:

$ git pull --rebase
$ git checkout master
$ python3.9 convert.py models/7B/
......
skipping tensor blk.20.attn_rot_embd
skipping tensor blk.21.attn_rot_embd
skipping tensor blk.22.attn_rot_embd
skipping tensor blk.23.attn_rot_embd
skipping tensor blk.24.attn_rot_embd
skipping tensor blk.25.attn_rot_embd
skipping tensor blk.26.attn_rot_embd
skipping tensor blk.27.attn_rot_embd
skipping tensor blk.28.attn_rot_embd
skipping tensor blk.29.attn_rot_embd
skipping tensor blk.30.attn_rot_embd
skipping tensor blk.31.attn_rot_embd
Writing models/7B/ggml-model-f16.gguf, format 1
Traceback (most recent call last):
  File "/opt/test/software/llama.cpp/convert.py", line 1658, in <module>
    main(sys.argv[1:])  # Exclude the first element (script name) from sys.argv
  File "/opt/test/software/llama.cpp/convert.py", line 1643, in main
    OutputFile.write_all(
  File "/opt/test/software/llama.cpp/convert.py", line 1188, in write_all
    check_vocab_size(params, vocab, pad_vocab=pad_vocab)
  File "/opt/test/software/llama.cpp/convert.py", line 993, in check_vocab_size
    raise ValueError(
ValueError: The model's vocab size is set to -1 in params.json. Please update it manually. Maybe 32000?
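
The error message itself points at the workaround: the params.json shipped with the Meta weights has "vocab_size": -1, and convert.py now refuses to guess. A small sketch of the manual edit it asks for (32000 is the stock Llama-2 vocabulary size; double-check it against your tokenizer before writing):

import json
from pathlib import Path

params_path = Path("models/7B/params.json")  # example path
params = json.loads(params_path.read_text())

# Meta ships Llama-2 with "vocab_size": -1; set it explicitly so that
# check_vocab_size() in convert.py no longer bails out.
if params.get("vocab_size", -1) == -1:
    params["vocab_size"] = 32000  # stock Llama-2 tokenizer size
    params_path.write_text(json.dumps(params, indent=2))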

@teleprint-me Thanks for looking into this. Pinging also @strutive07 for any extra insight on this

Same here @HighTemplar-wjiang. I found an alternative by reverting to commit f4d973cecb7368c985720ba9100ae6abba14806d and running the README instructions. It did not solve the problem, but it may give us some hints about what happened.

I attach an image below of the quantized model being tested

[image: screenshot of the quantized model being tested]