llama.cpp: Converting GGML->GGUF: ValueError: Only GGJTv3 supported
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- [X] I carefully followed the README.md.
- [X] I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- [X] I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
My GGML-converted models should be easy to convert to GGUF. I know the conversion tools aren’t guaranteed, but I’d like to file this one in case anybody else has a workaround or a more version-flexible option. I would love to see any version of GGML/GGJT supported if possible. Instead, my GGML files converted earlier are apparently not supported for conversion to GGUF.
Is there any tool to show the standard version details of a model file? Happy to contribute one if there isn’t.
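For reference, here is a minimal sketch of such a checker, assuming the usual on-disk magic bytes of the GGML-family and GGUF containers ('lmgg' for unversioned GGML, 'fmgg' for GGMF, 'tjgg' for GGJT, and 'GGUF'); the script name and helper are hypothetical, not something that ships with llama.cpp:

#!/usr/bin/env python3
# show_model_format.py (hypothetical): print the container format/version of a model file.
import struct
import sys

MAGICS = {
    b'lmgg': 'GGML (unversioned)',
    b'fmgg': 'GGMF',
    b'tjgg': 'GGJT',
    b'GGUF': 'GGUF',
}

def identify(path: str) -> str:
    with open(path, 'rb') as fp:
        header = fp.read(8)
    name = MAGICS.get(bytes(header[:4]))
    if name is None:
        return f'unknown magic {header[:4]!r}'
    if name.startswith('GGML'):
        return name  # the original GGML container has no version field after the magic
    version, = struct.unpack('<I', header[4:8])  # GGMF/GGJT/GGUF store a little-endian uint32 version
    return f'{name} v{version}'

if __name__ == '__main__':
    print(identify(sys.argv[1]))

Running it as python3 show_model_format.py llama-2-70b/ggml-model-f32.bin should report which format and version the converter is actually seeing.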
Current Behavior
python3 ./convert-llama-ggmlv3-to-gguf.py -i llama-2-70b/ggml-model-f32.bin -o test.gguf
=== WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
* Scanning GGML input file
Traceback (most recent call last):
File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 353, in <module>
main()
File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 335, in main
offset = model.load(data, 0)
^^^^^^^^^^^^^^^^^^^
File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 125, in load
offset += self.validate_header(data, offset)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "[PATH]/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 121, in validate_header
raise ValueError('Only GGJTv3 supported')
ValueError: Only GGJTv3 supported
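For context, the header check that raises here only accepts files whose first eight bytes are the 'tjgg' (GGJT) magic followed by a little-endian version of 3, so unversioned GGML, GGMF, and GGJT v1/v2 files are rejected at this point. Roughly, the validation amounts to something like the following (a simplified sketch, not the exact source of the script):

import struct

def validate_header(data: bytes, offset: int) -> int:
    magic = bytes(data[offset:offset + 4])
    version, = struct.unpack('<I', data[offset + 4:offset + 8])
    if magic != b'tjgg' or version != 3:
        raise ValueError('Only GGJTv3 supported')
    return 8  # bytes consumed: 4-byte magic + 4-byte version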
Environment and Context
Working with models
- Physical (or virtual) hardware you are using, e.g. for Linux: Physical Fedora 38, probably irrelevant given the Python.
$ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 56
On-line CPU(s) list: 0-55
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) CPU E5-2660 v4 @ 2.00GHz
CPU family: 6
Model: 79
Thread(s) per core: 2
Core(s) per socket: 14
Socket(s): 2
Stepping: 1
CPU(s) scaling MHz: 40%
CPU max MHz: 3200.0000
CPU min MHz: 1200.0000
BogoMIPS: 3990.92
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cdp_l3 invpcid_single pti intel_ppin ssbd ibrs ibpb stibp tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 hle avx2 smep bmi2 erms invpcid rtm cqm rdt_a rdseed adx smap intel_pt xsaveopt cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local dtherm ida arat pln pts vnmi md_clear flush_l1d
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 896 KiB (28 instances)
L1i: 896 KiB (28 instances)
L2: 7 MiB (28 instances)
L3: 70 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0-13,28-41
NUMA node1 CPU(s): 14-27,42-55
- Operating System, e.g. for Linux:
$ uname -a
Linux z840 6.4.12-200.fc38.x86_64 #1 SMP PREEMPT_DYNAMIC Wed Aug 23 17:46:49 UTC 2023 x86_64 GNU/Linux
- SDK version, e.g. for Linux:
$ python3 --version
Python 3.11.4
$ make --version
$ g++ --version
Failure Information (for bugs)
The command and traceback are the same as shown under Current Behavior above.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Step 1: convert any of the PTH models to GGML (using previous, unversioned commits of the convert script).
- Step 2: convert the GGML file to GGUF with the command given above (a rough sketch of both commands follows this list).
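For concreteness, the two steps looked roughly like the following; the paths are placeholders and the exact options of the older, pre-GGUF convert script may have differed:

python3 convert.py llama-2-70b/ --outtype f32 --outfile llama-2-70b/ggml-model-f32.bin
python3 ./convert-llama-ggmlv3-to-gguf.py -i llama-2-70b/ggml-model-f32.bin -o test.gguf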
About this issue
- Original URL
- State: closed
- Created 10 months ago
- Comments: 25 (5 by maintainers)
Amazing. I still haven’t gotten back to my workstation to test yet. Thank you.
On Wed, Sep 6, 2023, 09:49 Kerfuffle @.***> wrote:
Awesome thank you so much. I’m on travel for a few days but will try as soon as I get back to my workstation.
On Tue, Sep 5, 2023, 10:48 Kerfuffle @.***> wrote:
For those of us who use 30B/70B models, yes, it very much is impractical to download 40 GB over and over again. Downloading unquantized models is also impractical because they are hundreds of GB. If you are downloading 10-20 models over time, this becomes virtually impossible due to data caps and internet speeds.