llama.cpp: Sorry, your GGJTv1 file of type MOSTLY_Q4_1_SOME_F16 is not eligible for conversion

ValueError: Quantizations changed in GGJTv2. Can only convert unquantized GGML files older than GGJTv2. Sorry, your GGJTv1 file of type MOSTLY_Q4_1_SOME_F16 is not eligible for conversion.

I got this file from

https://huggingface.co/anon8231489123/gpt4-x-alpaca-13b-native-4bit-128g/tree/main/gpt4-x-alpaca-13b-ggml-q4_1-from-gptq-4bit-128g

It gives me

ggml-model-q4_1.bin

I ran

python3.10 convert-llama-ggml-to-gguf.py --input models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin --output models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.gguf

I am trying to use llama.cpp to run this model like so:

./main -m models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin -t 4 -c 2048 -n 2048 --color -i --instruct

It fails with:

Log start
main: build = 1407 (465219b)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: seed  = 1697867527
gguf_init_from_file: invalid magic characters tjgg.
error loading model: llama_model_loader: failed to load model from models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin

llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin'
main: error: unable to load model

Any help would be appreciated.

About this issue

  • State: closed
  • Created 8 months ago
  • Comments: 22 (1 by maintainers)

Most upvoted comments

As the error message explicitly says, the quantization scheme used in that file was obsoleted even before GGUF existed, so it is not possible to convert it to GGUF. I recommend finding the original PyTorch checkpoint of that model and doing a fresh conversion with recent code.
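For reference, a fresh conversion from an original checkpoint would look roughly like the sketch below. This assumes the original unquantized repo is https://huggingface.co/chavinlo/gpt4-x-alpaca (linked later in this thread) and a recent llama.cpp checkout; script names and options have varied between llama.cpp versions, so check your local tree.

# grab the original HF-format checkpoint (needs git-lfs)
git clone https://huggingface.co/chavinlo/gpt4-x-alpaca models/gpt4-x-alpaca
# convert the HF checkpoint to an f16 GGUF
python3 convert.py models/gpt4-x-alpaca --outtype f16 --outfile models/gpt4-x-alpaca/gpt4-x-alpaca-f16.gguf
# quantize the f16 GGUF down to 4-bit
./quantize models/gpt4-x-alpaca/gpt4-x-alpaca-f16.gguf models/gpt4-x-alpaca/gpt4-x-alpaca-Q4_K_M.gguf Q4_K_M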

Looks like you are not using the converted file? (.bin instead of .gguf)
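That is, had the conversion succeeded, the run command from the original post would point at the .gguf output rather than the .bin, something like:

./main -m models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.gguf -t 4 -c 2048 -n 2048 --color -i --instruct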

Ok I tried another model from TheBloke

Does @TheBloke document which llama.cpp version he is using to convert? Then it would be easy to use exactly that version of llama.cpp for your use case, I would think.

I do, yes - in the commit message, e.g.: [screenshot of a commit message]

I’ve been doing that for the last month or two.

(Not that that was related to this issue, as it seems OP was trying to convert an AWQ file! Although now that I think about it, I'm not recording the version of AutoAWQ I make AWQs with, or the version of AutoGPTQ or Transformers I make GPTQs with. I should do that. Those change far less often than llama.cpp does (I automatically git pull llama.cpp before every model and rebuild if it changed), but I should still record versions for AWQ and GPTQ. I'll implement that soon.)
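With the commit hash recorded in such a commit message, you could build exactly that llama.cpp version. A rough sketch, where <commit> is a placeholder for the hash shown there:

git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout <commit>   # <commit> stands for the hash from the quantizer's commit message
make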

Ok I tried another model from TheBloke

That one is already quantized in AWQ format.

A little background info that might help you:

Models exist in an original format (on HF most of the time that’s going to be “HuggingFace format”) where there are a bunch of binary parts and also metadata describing various stuff about the model. For example: https://huggingface.co/chavinlo/gpt4-x-alpaca/tree/main

For the purposes of using llama.cpp, this is what you need if you want to convert to GGUF format. GGUF is the format llama.cpp uses, so you can't use the HuggingFace format directly; you have to convert it first.

Generally when converting, people also quantize the model. That takes the model, where each parameter is a 16-bit number, and basically compresses it down. Different projects support different types of quantization. llama.cpp uses GGML as the backend, so it supports the types of quantization that GGML supports: Q4_0, Q4_K_M, etc.
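For a rough, back-of-the-envelope sense of scale (not a figure from this thread): a 13B-parameter model at 16 bits per weight is about 13e9 × 2 bytes ≈ 26 GB, while a 4-bit scheme like Q4_0 works out to roughly 4.5 bits per weight once the per-block scales are counted, i.e. about 13e9 × 4.5 / 8 bytes ≈ 7.3 GB.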

There are other projects that have developed their own quantizations and file formats. AWQ is one, GPTQ is another, there are also others I didn’t mention. Like with llama.cpp and GGML, they started with the HuggingFace format model and converted to the format their project can handle.

Also, quantization is a lossy process, kind of like compressing a video file. Some of the data is lost, so you can’t really take an AWQ or some other already quantized format and convert it to GGUF with the quantization formats llama.cpp supports.

The reason why the GGML (file format) to GGUF conversion can work in some cases is because the actual quantized data format didn't change, even though the container of the file changed. Kind of like how you can repackage a .mkv into a .mp4 file by just copying the H.264 video stream (this analogy may or may not make sense). Anyway, there have been changes to quantization formats in the history of the llama.cpp project, so if you go back far enough you'll find incompatible changes.
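A quick way to see which container a given file actually is, is to peek at its first four bytes (a sketch, using the path from the original post):

# GGUF files start with the ASCII magic "GGUF";
# the "tjgg" in the error above is the old GGJT magic, i.e. a pre-GGUF GGML container
xxd -l 4 models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin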


It looks like you downloaded Mistral-Pygmalion-7B-AWQ - that’s just the wrong format for llama.cpp. TheBloke actually already published pre-converted GGUF format files which will work: https://huggingface.co/TheBloke/Mistral-Pygmalion-7B-GGUF

Any of the .gguf files there should work. Note that, unlike when converting from a HuggingFace-format repo (where you need all the files), you only need one of those.
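For example (a sketch; the exact filename is a guess based on TheBloke's usual naming scheme, so check the repo's file list first):

# download a single quantized GGUF from the repo
wget -P models/ https://huggingface.co/TheBloke/Mistral-Pygmalion-7B-GGUF/resolve/main/mistral-pygmalion-7b.Q4_K_M.gguf
# and run it directly, no conversion needed
./main -m models/mistral-pygmalion-7b.Q4_K_M.gguf -t 4 -c 2048 -n 2048 --color -i --instruct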