llama.cpp: Cannot convert Llama 3 8B model to GGUF
I downloaded the model from Meta using the steps provided, leaving me with a 14 GB .pth file. When I try to convert the model using convert.py, it fails with: RuntimeError: Internal: could not parse ModelProto from H:\Downloads\llama3-main\Meta-Llama-3-8B\tokenizer.model
When I add --vocab-type bpe, it instead gives: FileNotFoundError: Could not find a tokenizer matching any of ['bpe']
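For reference, a hypothetical reconstruction of the two invocations described above (the path is taken from the error message; exact flags may differ on your checkout):

```
python convert.py H:\Downloads\llama3-main\Meta-Llama-3-8B
python convert.py H:\Downloads\llama3-main\Meta-Llama-3-8B --vocab-type bpe
```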
About this issue
- State: closed
- Created 2 months ago
- Comments: 18 (2 by maintainers)
@oldmanjk I understand the point perfectly fine. You can figure it out and then add it to the docs. If there aren’t any docs, then create them. It’s a fairly simple thought process. Complaining about it to people who are literally donating their time isn’t productive or helpful. I have nothing else to say on the matter. Best of luck.
@oldmanjk You’re welcome to contribute.
Thanks for telling us. I gotta say, it’s getting real annoying wasting endless hours chasing these things down because the devs can’t be bothered to update the relevant info in the main readme (which, BTW, makes no mention of “convert-hf-to-gguf.py” that I’m aware of). Seriously, I can’t be the only one who is infuriated by this pattern of behavior in this community.
Devs: documentation matters. What would take you, what, five minutes to update, would save the community probably hundreds if not thousands of cumulative hours. We appreciate what you do (well, I do, anyway), but this is just dumb and lazy. How many botched ggufs are being proliferated because of this?
convert.py doesn't support Llama 3 yet. You can use convert-hf-to-gguf.py with Llama 3 downloaded from Hugging Face.

Yeah, I just saw that lol
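For anyone landing here later, a sketch of that convert-hf-to-gguf.py route, assuming a Hugging Face checkout of Meta-Llama-3-8B in the current directory (the output filename is just an example):

```
python convert-hf-to-gguf.py ./Meta-Llama-3-8B --outfile llama3-8b-f16.gguf --outtype f16
```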
There's also the recently created convert-hf-to-gguf-update.py, but I think you must include your HF access token on the command line, or else it will report a bunch of failures, presumably when trying to pull from HF. To get an HF access token, log into HF, go to your profile, and then Settings → Access Tokens.
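A sketch of that invocation; at the time of writing the update script takes the token as a positional argument (the hf_... value below is a placeholder for your own token):

```
python convert-hf-to-gguf-update.py hf_xxxxxxxxxxxxxxxxxxxx
```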
To recap: if you are reading this, you probably ended up here seeking the llama-bpe stuff in an effort to get rid of the strange error. That error means your GGUF files will kinda work, but the quality is crap compared to the BPE version. Hence, folks are trying to go back to the original safetensors and re-convert, because most of the GGUFs uploaded to HF are sub-par.

Note that you must run this whole convert process in a Python 3.11 venv; attempting the convert under 3.12 just throws errors about distutils. You also need a ton of memory unless you add the temp-file stuff, which appears to be in the current convert-hf script but not in this update.py.
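A sketch of the 3.11 venv setup being assumed here (requirements.txt is the one shipped in the llama.cpp repo):

```
python3.11 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
```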
There are a bunch of convert scripts, and it would be nice if there were an easy way to sort them by last-updated date on GitHub so it was obvious which are most relevant for Llama 3; see the git sketch below for a local workaround.
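Lacking that, git can answer the same question from a local checkout; a rough sketch:

```
# print the last-commit date for each convert script in a llama.cpp checkout
for f in convert*.py; do
  echo "$f: $(git log -1 --format=%ad --date=short -- "$f")"
done
```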
Most of the Llama 3 convert-saga discussion can be found here: https://github.com/ggerganov/llama.cpp/pull/6745
Clearly you don’t understand. Development is a continuous process and things change quickly here. If you want users to keep up with development and keep the documentation updated, you’ve skipped CSci 101, where you would have been taught documentation is one of the most important things for a developer to do well. Since when do users write manuals? You’d basically have to become a dev to be able to do that. I don’t understand how this is so hard to comprehend. You’ve also mistaken constructive criticism for complaining. I’m trying to help you devs understand the user perspective. My tone is intentional to convey the frustration many users feel but are too afraid to voice. If you don’t see how this is immensely helpful, well, I guess I should have expected that. I don’t really care what you think about me. If you want this project to thrive, you need better documentation. Telling the users to create it “isn’t productive or helpful.”