llama.cpp: [User] GGUF conversion, stop sequence problem

Hi ❤️ llama.cpp

@KerfuffleV2 showed that models converted without metadata load differently. Loading a model converted without metadata:

llama_model_load_internal: BOS token = 1 ' '
llama_model_load_internal: EOS token = 2 ' '

Loading one converted with external metadata:

llama_model_load_internal: BOS token = 1 '<s>'
llama_model_load_internal: EOS token = 2 '</s>'
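
For reference, here is a rough way to inspect which token strings actually ended up in a converted file, using the gguf Python package's reader. This is an assumption about the gguf-py API rather than something from this thread, and the field layout details may differ between gguf-py versions:

    # Minimal sketch: dump the strings stored for token ids 0-2 in a GGUF file.
    # Assumes gguf-py provides GGUFReader and that "tokenizer.ggml.tokens" is a
    # string array whose element values are indexed by field.data.
    from gguf import GGUFReader

    reader = GGUFReader("wizardmath-7b-v1.0.q4_0.gguf")  # path is illustrative
    field = reader.fields["tokenizer.ggml.tokens"]
    for token_id in (0, 1, 2):
        raw = field.parts[field.data[token_id]]
        print(token_id, bytes(raw).decode("utf-8", errors="replace"))
    # A correctly converted LLaMA model should print <unk>, <s>, </s>.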

I converted WizardMath-7B-V1.0 to GGUF; here are a couple of runs. Example 1:

~/l/b/bin (master) [SIGINT]> ./main -m ~/wizardmath-7b-v1.0.ggmlv3.q4_0.gguf --color -c 2048 --keep -1 -n -1 -t 3 -b 7 -i -r "User:" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/M.txt

main: build = 1015 (226255b)
main: seed  = 1692706079
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from /data/data/com.termux/files/home/wizardmath-7b-v1.0.ggmlv3.q4_0.gguf (version GGUF V1 (latest))
..
llama_model_load_internal: format       = GGUF V1 (latest)
llama_model_load_internal: arch         = llama
llama_model_load_internal: vocab type   = SPM
llama_model_load_internal: n_vocab      = 32001
llama_model_load_internal: n_ctx_train  = 2048
llama_model_load_internal: n_ctx        = 2048
llama_model_load_internal: n_embd       = 4096
llama_model_load_internal: n_head       = 32
llama_model_load_internal: n_head_kv    = 32
llama_model_load_internal: n_layer      = 32
llama_model_load_internal: n_rot        = 128
llama_model_load_internal: n_gqa        = 1
llama_model_load_internal: f_norm_eps   = 5.0e-06
llama_model_load_internal: n_ff         = 11008
llama_model_load_internal: freq_base    = 10000.0
llama_model_load_internal: freq_scale   = 1
llama_model_load_internal: model type   = 7B
llama_model_load_internal: model ftype  = mostly Q4_0
llama_model_load_internal: model size   = 6.74 B
llama_model_load_internal: general.name = wizardmath-7b-v1.0.ggmlv3.q4_0.bin
llama_model_load_internal: BOS token = 1 ''
llama_model_load_internal: EOS token = 2 ''
llama_model_load_internal: LF token  = 13 '<0x0A>'
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 3615.73 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size =    3.49 MB
                                                           
system_info: n_threads = 3 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |                          

main: interactive mode on.
Reverse prompt: 'User'                                     
Input prefix: ' '
Input suffix: 'Assistant:'                                 
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 42
                                                           
== Running in interactive mode. ==                          
- Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.                 
 - To return control without starting a new line, end your input with '/'.                                             
- If you want to submit another line, end your input with '\'.                                                       

Below is an instruction that describes a task. Write a response that appropriately completes the request.
                                                           
### Instruction:
Please list 3 movie titles. 
                               
### Response: We are asked to list 3 movie titles, which means we need to come up with at least 3 different movie titles. Here is a list of 3 movie titles for your reference:   
1. Titanic (1997)                                        
2. The Matrix (1999)
3. Toy Story (1995)                                        

These are just some examples, and there are certainly many more movie titles out there. However, these 3 movies have been well-known and popular for a long time, and they represent different genres and styles of filmmaking. Therefore, I believe that these 3 movie titles will not disappoint you.
The answer is: Here are three movie titles: Titanic (1997), The Matrix (1999), and Toy Story (1995).
                                                           
</s>
                                                         
The answer is: Three movie titles are: Titanic (1997), The Matrix (1999), and Toy Story (1995)..                      
</s>

Example 2:

### Instruction:
Please list 3 movie titles.

### Response:I'm not sure what you're looking for, but here are some movie titles:

1. The Shawshank Redemption
2. Schindler's List
3. The Godfather

The answer is: Here are three movie titles:
1. The Shawshank Redemption
2. Schindler's List
3. The Godfather.

</s>

The answer is: Here are three movie titles:
1. The Shawshank Redemption
2. Schindler's List
3. The Godfather.

</s>

It appears that, because of how the model was converted, it cannot use the stop sequence (the EOS token), and thus doesn't return control to the user in this case.
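
Conceptually, the interactive loop hands control back when either the EOS token id is sampled or the decoded output ends with a reverse prompt. The following is a simplified Python sketch of that logic; it is illustrative pseudocode, not llama.cpp's actual C++ code (which lives in examples/main):

    # Simplified stop check for interactive mode. Names and defaults here are
    # illustrative only.
    def should_return_control(sampled_id, recent_text,
                              eos_id=2, reverse_prompts=("User:",)):
        if sampled_id == eos_id:      # model emitted the end-of-sequence token
            return True
        # otherwise look for a reverse/anti-prompt at the end of the output
        return any(recent_text.endswith(rp) for rp in reverse_prompts)

With broken vocabulary metadata, "</s>" seems to come out as ordinary text rather than as token id 2, so neither condition triggers and generation just keeps going.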

Edit: error message when trying to convert with the metadata included:

python3 convert-llama-ggmlv3-to-gguf.py -i ~/wizardmath-7b-v1.0.ggmlv3.q4_0.bin -o ~/wizardM2.gguf -c 2048 -m ~/storage/shared/downloads/wizardmath             
                                  
* Using config: Namespace(input=PosixPath('/data/data/com.termux/files/home/wizardmath-7b-v1.0.ggmlv3.q4_0.bin'), output=PosixPath('/data/data/com.termux/files/home/wizardM2.gguf'), name=None, desc=None, gqa=1, eps='5.0e-06', context_length=2048, model_metadata_dir=PosixPath('/data/data/com.termux/files/home/storage/shared/downloads/wizardmath'), vocab_dir=None, vocabtype='spm')
                                                          
 === WARNING === Be aware that this conversion script is best-effort. Use a native GGUF model if possible. === WARNING ===
                                                          
* Scanning GGML input file
* GGML model hyperparameters: <Hyperparameters: n_vocab=32001, n_embd=4096, n_mult=5504, n_head=32, n_layer=32, n_rot=128, n_ff=11008, ftype=2>   

Traceback (most recent call last):
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 333, in <module>
    main()
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 323, in main
    (params_override, vocab_override) = handle_metadata(cfg, model.hyperparameters)
  File "/data/data/com.termux/files/home/llama.cpp/convert-llama-ggmlv3-to-gguf.py", line 274, in handle_metadata
    import convert
  File "/data/data/com.termux/files/home/llama.cpp/convert.py", line 27, in <module>
    from sentencepiece import SentencePieceProcessor  # type: ignore
ModuleNotFoundError: No module named 'sentencepiece'

Repo & here's the content of ~/storage/shared/downloads/wizardmath: [screenshot of the directory listing]

Most upvoted comments

Oddly, WizardMath doesn't have tokenizer.model, so I want to convert wizardlm-7b-v1.0-uncensored.ggmlv3.q4_0.bin instead.

Your link there does have the expected files, including tokenizer.model. Something went wrong with your download, though: the link has config.json, while the directory you listed has config.txt. Did you save the .json files as .txt or something?
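
If it helps, here is a quick way to sanity-check the directory you pass via -m before converting. This is a hypothetical helper, not part of llama.cpp; it just looks for the files discussed above:

    # Check that the metadata directory contains the files the converter reads.
    from pathlib import Path

    metadata_dir = Path("~/storage/shared/downloads/wizardmath").expanduser()
    for name in ("config.json", "tokenizer.model"):
        status = "ok" if (metadata_dir / name).exists() else "MISSING"
        print(f"{name}: {status}")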

Yeah, somehow I messed up. I corrected the file extensions, thanks. It converted!

~/l/b/bin (master)> ./main -m ~/wizardLM.gguf --color -c 2048 --keep -1 -n -1 -t 3 -b 7 -i -r "User" --in-prefix " " --in-suffix "Assistant:" -f ~/storage/shared/PT/M.txt

main: build = 1015 (226255b)
main: seed  = 1692717766
llama_model_loader: loaded meta data with 15 key-value pairs and 291 tensors from /data/data/com.termux/files/home/wizardLM.gguf (version GGUF V1 (latest))
...
llama_model_load_internal: BOS token = 1 '<s>'
llama_model_load_internal: EOS token = 2 '</s>'
llama_model_load_internal: LF token  = 13 '<0x0A>'
llama_model_load_internal: ggml ctx size =    0.09 MB
llama_model_load_internal: mem required  = 3615.73 MB (+ 1024.00 MB per state)
llama_new_context_with_model: kv self size  = 1024.00 MB
llama_new_context_with_model: compute buffer total size =    3.49 MB

system_info: n_threads = 3 / 8 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 |
main: interactive mode on.
Reverse prompt: 'User'
Input prefix: ' '
Input suffix: 'Assistant:'
sampling: repeat_last_n = 64, repeat_penalty = 1.100000, presence_penalty = 0.000000, frequency_penalty = 0.000000, top_k = 40, tfs_z = 1.000000, top_p = 0.950000, typical_p = 1.000000, temp = 0.800000, mirostat = 0, mirostat_lr = 0.100000, mirostat_ent = 5.000000
generate: n_ctx = 2048, n_batch = 7, n_predict = -1, n_keep = 42


== Running in interactive mode. ==
 - Press Ctrl+C to interject at any time.
 - Press Return to return control to LLaMa.
 - To return control without starting a new line, end your input with '/'.
 - If you want to submit another line, end your input with '\'.

 Below is an instruction that describes a task. Write a response that appropriately completes the request.

### Instruction:
Please list 3 movie titles.

### Response:
 Sure, here are three movie titles for you:
1. The Shawshank Redemption
2. Schindler's List
3. Pulp Fiction
 Thank you.
Assistant: You're welcome! I hope you enjoy watching these movies.
 

Even better, it stopped as expected, so converting with metadata definitely works.

llama_model_load_internal: BOS token = 1 ' '

It knows 1 = BOS, right?

Yes, correct. The initial problem was the string values of tokens 0, 1, and 2. The default should be the original LLaMA mapping (see the sketch after the list):

id 0 = <unk>
id 1 = <s>
id 2 = </s>
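
For illustration only, this is roughly how those special tokens would be written when building a GGUF file with the gguf Python package. It is a sketch of the vocabulary-related part only, not a complete conversion, and method availability may depend on the gguf-py version:

    # Write the special-token metadata; ids 0/1/2 carry the original LLaMA strings.
    import gguf

    writer = gguf.GGUFWriter("out.gguf", arch="llama")  # output path is illustrative
    tokens = ["<unk>", "<s>", "</s>"]  # ...followed by the rest of the vocabulary
    writer.add_tokenizer_model("llama")
    writer.add_token_list(tokens)
    writer.add_unk_token_id(0)
    writer.add_bos_token_id(1)
    writer.add_eos_token_id(2)
    # A real conversion would also add token scores/types, the hyperparameters,
    # and the tensor data, then write everything out to the file.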

ModuleNotFoundError: No module named 'sentencepiece'

I think you need to install the Python tokenizer: pip install sentencepiece
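
Once it's installed, you can also confirm that tokenizer.model maps ids 1 and 2 to the expected strings (the path below is illustrative):

    # Quick check with the sentencepiece Python bindings.
    from sentencepiece import SentencePieceProcessor

    sp = SentencePieceProcessor()
    sp.Load("/path/to/wizardmath/tokenizer.model")
    print(sp.bos_id(), sp.id_to_piece(sp.bos_id()))   # expected: 1 <s>
    print(sp.eos_id(), sp.id_to_piece(sp.eos_id()))   # expected: 2 </s>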