llama.cpp: Sorry, your GGJTv1 file of type MOSTLY_Q4_1_SOME_F16 is not eligible for conversion
ValueError: Quantizations changed in GGJTv2. Can only convert unquantized GGML files older than GGJTv2. Sorry, your GGJTv1 file of type MOSTLY_Q4_1_SOME_F16 is not eligible for conversion.
I got this file from
It gives me ggml-model-q4_1.bin.
I ran
python3.10 convert-llama-ggml-to-gguf.py --input models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin --output models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.gguf
I am trying to use llama.cpp to run this model like so
./main -m models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin -t 4 -c 2048 -n 2048 --color -i --instruct
It fails with
Log start
main: build = 1407 (465219b)
main: built with cc (Ubuntu 9.4.0-1ubuntu1~20.04.2) 9.4.0 for x86_64-linux-gnu
main: seed = 1697867527
gguf_init_from_file: invalid magic characters tjgg.
error loading model: llama_model_loader: failed to load model from models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin
llama_load_model_from_file: failed to load model
llama_init_from_gpt_params: error: failed to load model 'models/gpt4-x-alpaca-13b-native-4bit-128g/ggml-model-q4_1.bin'
main: error: unable to load model
Any help would be appreciated.
As the error message explicitly states, the file uses an unsupported quantization scheme that was obsoleted even before GGUF, so it's not possible to convert it to GGUF. I recommend finding the original PyTorch checkpoints of that model and doing a fresh conversion with recent code.
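For example, here is a rough sketch of that fresh conversion, assuming you have downloaded the original HuggingFace checkpoint (https://huggingface.co/chavinlo/gpt4-x-alpaca) into models/gpt4-x-alpaca/ (the paths and output names are illustrative) and are on a recent llama.cpp checkout with convert.py and the quantize tool built:
# HF checkpoint -> unquantized f16 GGUF
python3 convert.py models/gpt4-x-alpaca/ --outtype f16 --outfile models/gpt4-x-alpaca/ggml-model-f16.gguf
# f16 GGUF -> Q4_1 GGUF
./quantize models/gpt4-x-alpaca/ggml-model-f16.gguf models/gpt4-x-alpaca/ggml-model-q4_1.gguf Q4_1
The resulting .gguf is the file you then pass to ./main with -m.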
looks like you are not using the converted file? (.bin instead of .gguf)
I do, yes - in the commit message, eg:
I’ve been doing that for the last month or two.
(Not that that was related to this issue, as it seems OP was trying to convert an AWQ file! Although now I think about it, I'm not recording the version of AutoAWQ I make AWQs with, or the version of AutoGPTQ or Transformers I make GPTQs with. I should do that. It changes vastly less often than llama.cpp does (I auto git pull llama.cpp and rebuild if it changes before every model), but I should still record versions for AWQ and GPTQ. I'll implement that soon.)
That one is already quantized in AWQ format.
A little background info that might help you:
Models exist in an original format (on HF most of the time that’s going to be “HuggingFace format”) where there are a bunch of binary parts and also metadata describing various stuff about the model. For example: https://huggingface.co/chavinlo/gpt4-x-alpaca/tree/main
For the purposes of using llama.cpp, this is what you need if you want to convert to GGUF format. GGUF format is what llama.cpp uses, so you can't use the HuggingFace format directly - you have to convert it first.
Generally when converting, people also quantize the model. That takes the model, where each parameter is a 16-bit number, and basically compresses it down. Different projects support different types of quantization. llama.cpp uses GGML as the backend, so it supports the types of quantizations that GGML supports: Q4_0, Q4_K_M, etc.
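To make that concrete, here is a hedged sketch of choosing a quantization type with llama.cpp's quantize tool, assuming you already have an unquantized f16 GGUF (the filenames are illustrative):
./quantize models/my-model/ggml-model-f16.gguf models/my-model/ggml-model-Q4_0.gguf Q4_0
./quantize models/my-model/ggml-model-f16.gguf models/my-model/ggml-model-Q4_K_M.gguf Q4_K_M
Q4_0 is the older, simpler 4-bit scheme; Q4_K_M is a k-quant that usually trades a slightly larger file for better quality.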
There are other projects that have developed their own quantizations and file formats. AWQ is one, GPTQ is another, there are also others I didn't mention. Like with llama.cpp and GGML, they started with the HuggingFace format model and converted to the format their project can handle.
Also, quantization is a lossy process, kind of like compressing a video file. Some of the data is lost, so you can't really take an AWQ or some other already quantized format and convert it to GGUF with the quantization formats llama.cpp supports.
The reason why the GGML (file format) to GGUF conversion can work in some cases is because the actual quantized data format didn't change, even though the container of the file changed. Kind of like how you can repackage a .webm into a .mp4 file by just copying the x264 video data (this analogy may or may not make sense). Anyway, there have been changes to quantization formats in the history of the llama.cpp project, so if you go back far enough you'll find incompatible changes.
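For the eligible cases the conversion really is just that container swap. A sketch, assuming an older unquantized (f16) GGML file from before the GGJTv2 quantization changes (the path is illustrative):
python3 convert-llama-ggml-to-gguf.py --input models/old-model/ggml-model-f16.bin --output models/old-model/ggml-model-f16.gguf
The tensor data is carried over as-is; only the container and metadata around it change.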
It looks like you downloaded Mistral-Pygmalion-7B-AWQ - that's just the wrong format for llama.cpp. TheBloke actually already published pre-converted GGUF format files which will work: https://huggingface.co/TheBloke/Mistral-Pygmalion-7B-GGUF
Any of the .gguf files there should work. Note, unlike when trying to convert from the HuggingFace format repo, you only need one of those.
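For instance, a hedged sketch of grabbing one of those files and running it (the exact filename is a guess based on TheBloke's usual naming - check the repo's file list for the real one):
wget -P models/ https://huggingface.co/TheBloke/Mistral-Pygmalion-7B-GGUF/resolve/main/mistral-pygmalion-7b.Q4_K_M.gguf
./main -m models/mistral-pygmalion-7b.Q4_K_M.gguf -t 4 -c 2048 -n 2048 --color -i --instruct
No conversion step is needed; the file is already in the format ./main expects.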