llama.cpp: Converting GGML->GGUF: ValueError: Only GGJTv3 supported

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • [X] I carefully followed the README.md.
  • [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.

Expected Behavior

My previously converted GGML models should be easy to convert to GGUF. I know the conversion tools aren't guaranteed, but I'd like to file this in case anybody else has a workaround or a more version-flexible option. I would love to see every version of GGML/GGJT supported if possible. Instead, my GGML files converted earlier are apparently not supported for conversion to GGUF.

Is there any tool to show the standard version details of a model file? Happy to contribute one if there isn’t.
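A minimal sketch of such a tool is below. It assumes the header layout the llama.cpp loaders use (a 4-byte magic, followed by a little-endian uint32 version for GGMF/GGJT/GGUF files; unversioned GGML has no version field). The script itself is hypothetical; nothing like it ships in the repo.

#!/usr/bin/env python3
# Hypothetical helper (not part of llama.cpp): report the container format
# and version of a model file by inspecting its first 8 bytes.
import struct
import sys

# Raw magic bytes as they appear at the start of the file.
MAGICS = {
    b'lmgg': 'GGML (unversioned)',
    b'fmgg': 'GGMF',
    b'tjgg': 'GGJT',
    b'GGUF': 'GGUF',
}

def describe(path: str) -> str:
    with open(path, 'rb') as fp:
        magic = fp.read(4)
        name = MAGICS.get(magic)
        if name is None:
            return f'{path}: unknown magic {magic!r}'
        if magic == b'lmgg':
            return f'{path}: {name}'  # no version field follows the magic
        (version,) = struct.unpack('<I', fp.read(4))
        return f'{path}: {name} v{version}'

if __name__ == '__main__':
    for p in sys.argv[1:]:
        print(describe(p))

Running it on the file below would report GGJT plus whatever version it was written with, which is what the converter's "Only GGJTv3 supported" check is keying on.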

Current Behavior

python3 ./convert-llama-ggmlv3-to-gguf.py -i llama-2-70b/ggml-model-f32.bin -o test.gguf
=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===

* Scanning GGML input file
Traceback (most recent call last):
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 353, in <module>
    main()
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 335, in main
    offset = model.load(data, 0)
             ^^^^^^^^^^^^^^^^^^^
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 125, in load
    offset += self.validate_header(data, offset)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 121, in validate_header
    raise ValueError('Only GGJTv3 supported')
ValueError: Only GGJTv3 supported

Environment and Context

Working with models

  • Physical (or virtual) hardware you are using, e.g. for Linux: Physical, Fedora 38; probably irrelevant given that this is Python.

$ lscpu

Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         46 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  56
  On-line CPU(s) list:   0-55
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
    CPU family:          6
    Model:               79
    Thread(s) per core:  2
    Core(s) per socket:  14
    Socket(s):           2
    Stepping:            1
    CPU(s) scaling MHz:  40%
    CPU max MHz:         3200.0000
    CPU min MHz:         1200.0000
    BogoMIPS:            3990.92
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts a
                         cpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_per
                         fmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes
                         64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_
                         2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowp
                         refetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb st
                         ibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bm
                         i2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc
                          cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   896 KiB (28 instances)
  L1i:                   896 KiB (28 instances)
  L2:                    7 MiB (28 instances)
  L3:                    70 MiB (2 instances)
NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0-13,28-41
  NUMA node1 CPU(s):     14-27,42-55
  • Operating System, e.g. for Linux:

$ uname -a
Linux z840 6.4.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 23 17:46:49 UTC 2023 x86_64 GNU/Linux

  • SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.4
$ make --version
$ g++ --version

Failure Information (for bugs)

The command and full traceback are the same as shown above under Current Behavior.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

  1. Convert any of the PTH models to GGML (using previous, unversioned commits of the convert script).
  2. Convert the GGML file to GGUF with the command given above.

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 25 (5 by maintainers)

Most upvoted comments

Amazing. I still haven’t gotten back to my workstation to test yet. Thank you.

On Wed, Sep 6, 2023, 09:49, Kerfuffle wrote:

Closed #2990 https://github.com/ggerganov/llama.cpp/issues/2990 as completed via #3023 https://github.com/ggerganov/llama.cpp/pull/3023.

Awesome, thank you so much. I'm traveling for a few days but will try as soon as I get back to my workstation.

On Tue, Sep 5, 2023, 10:48, Kerfuffle wrote:

@danielbrdz https://github.com/danielbrdz @jboero https://github.com/jboero Please try converting with #3023 https://github.com/ggerganov/llama.cpp/pull/3023 - that version should convert even very old GGML format files when it’s possible. In cases where conversion isn’t possible, it should give you a better error message.

I actually don't have any GGML files older than GGJTv3 lying around, so I'd appreciate any testing with older files.

Please note that in cases where the quantization format changed, it's just not possible to convert the file. So if your GGML isn't in f16 or f32 format and it's older than GGJTv2, it just can't be converted. If it's GGJTv2 and Q8- or Q4-quantized, then it also can't be converted, since the format for those quantizations changed in GGJTv3.

Even for those files that can’t be converted, it would be helpful if people can test and report back. You should get a reasonable error message when the file can’t be converted, like:

ValueError: Q4 and Q8 quantizations changed in GGJTv3. Sorry, your GGJTv2 file of type MOSTLY_Q8_0 is not eligible for conversion.

It should also report the file format when loading to enable better reporting of problems:

* Scanning GGML input file
* File format: GGJTv3 with ftype MOSTLY_Q8_0
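
To make the eligibility rules above concrete, here is a rough sketch of the decision logic as described; it is illustrative only (the format/version/ftype names are assumptions), not the converter's actual code:

# Illustrative sketch of the conversion-eligibility rules described above;
# the format/version/ftype names are assumptions, not the converter's code.
F16_F32 = {'ALL_F32', 'MOSTLY_F16'}
CHANGED_IN_GGJTV3 = {'MOSTLY_Q4_0', 'MOSTLY_Q4_1', 'MOSTLY_Q8_0'}

def convertible(fmt: str, version: int, ftype: str) -> bool:
    """Can this GGML-family file be repacked as GGUF without requantizing?"""
    if fmt == 'GGJT' and version >= 3:
        return True   # the converter's native input format
    if ftype in F16_F32:
        return True   # unquantized tensors can always be carried over
    if fmt == 'GGJT' and version == 2:
        # Q4/Q8 block layouts changed in GGJTv3, so those can't be repacked.
        return ftype not in CHANGED_IN_GGJTV3
    return False      # older quantized files (GGML/GGMF/GGJTv1) can't be converted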

Is it really impractical for you to just download the GGUF version?

For those of us who use 30B/70B models, yes, it very much is impractical to download 40 GB over and over again. Downloading unquantized models is even less practical, because they are hundreds of GB. If you are downloading 10-20 models over time, this becomes virtually impossible due to data caps and internet speeds.