slowllama: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
To merge a LoRA checkpoint for the llama 2 7B model, I ran python merge_lora.py.
But an error occurred:
Traceback (most recent call last):
File "/Users/xxx/llama/slowllama/merge_lora.py", line 14, in <module>
add_lora(model_path, lora_path, out_model_path)
File "/Users/xxx/llama/slowllama/loader.py", line 188, in add_lora
lora = lora_weights[b_key].mm(lora_weights[a_key]) * lora_scale
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
So I modified the code as below, and the merged model file was produced.
lora = lora_weights[b_key].to(torch.float32).mm(lora_weights[a_key].to(torch.float32)) * lora_scale
But I'm not sure whether this is okay. Could you give your opinion, or the right solution?
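For context, here is a minimal sketch of the workaround in isolation. The tensors `b`, `a`, and `lora_scale` are hypothetical stand-ins for `lora_weights[b_key]`, `lora_weights[a_key]`, and the scale in slowllama's loader.py; on CPU builds of PyTorch where fp16 matmul is not implemented, upcasting to fp32 for the `mm` avoids the error, and the result can be cast back to the checkpoint's dtype afterwards:

```python
import torch

# Hypothetical stand-ins for the LoRA factors, which slowllama stores in fp16
b = torch.randn(8, 4).to(torch.float16)   # plays the role of lora_weights[b_key]
a = torch.randn(4, 8).to(torch.float16)   # plays the role of lora_weights[a_key]
lora_scale = 2.0

# b.mm(a) directly on fp16 CPU tensors raises
# RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'
# in some torch builds, so upcast to fp32 for the matmul...
lora = b.to(torch.float32).mm(a.to(torch.float32)) * lora_scale

# ...then cast the delta back to the checkpoint's dtype so the merged
# weights keep their original storage format.
lora = lora.to(b.dtype)
```

The extra precision in fp32 only helps here: the matmul accumulates in fp32 instead of fp16, and the final downcast rounds once at the end.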
About this issue
- Original URL
- State: open
- Created 4 months ago
- Comments: 15 (8 by maintainers)
Got it. I think I’ll need to try it myself to double-check (we transform weights fp16->fp32->bf16), but if the merged model produces reasonable output it should be ok.