slowllama: RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

To merge a LoRA checkpoint for the Llama 2 7B model, I ran python merge_lora.py.

But an error occurred:

Traceback (most recent call last):
  File "/Users/xxx/llama/slowllama/merge_lora.py", line 14, in <module>
    add_lora(model_path, lora_path, out_model_path)
  File "/Users/xxx/llama/slowllama/loader.py", line 188, in add_lora
    lora = lora_weights[b_key].mm(lora_weights[a_key]) * lora_scale
RuntimeError: "addmm_impl_cpu_" not implemented for 'Half'

So I modified the code as below, and the merge then produced the model file:

lora = lora_weights[b_key].to(torch.float32).mm(lora_weights[a_key].to(torch.float32)) * lora_scale

But I wonder whether this is okay or not. Can you give your opinion or the right solution?
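
For context, the error happens because PyTorch's CPU matmul kernel (`addmm`) is not implemented for `float16`, so the `mm` call on half-precision LoRA factors fails on CPU. A minimal sketch of the workaround in the question, wrapped in a hypothetical helper (the function name and signature are illustrative, not from slowllama): upcast the factors to `float32` for the product, then cast the merged weight back to the base weight's original dtype so the checkpoint layout is unchanged.

```python
import torch

def merge_lora_weight(w: torch.Tensor,
                      lora_b: torch.Tensor,
                      lora_a: torch.Tensor,
                      lora_scale: float) -> torch.Tensor:
    # Hypothetical helper, not slowllama's API. CPU matmul has no
    # float16 kernel, so do the B @ A product in float32 ...
    lora = lora_b.to(torch.float32).mm(lora_a.to(torch.float32)) * lora_scale
    # ... then add the delta and cast back to the base weight's dtype.
    return (w.to(torch.float32) + lora).to(w.dtype)
```

The key point is casting the result back to `w.dtype` at the end, so the merged checkpoint keeps the same precision as the original weights.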

About this issue

  • State: open
  • Created 4 months ago
  • Comments: 15 (8 by maintainers)

Most upvoted comments

Got it. I think I’ll need to try it myself to double-check (we transform weights fp16->fp32->bf16), but if the merged model produces reasonable output it should be ok.
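
On the numerics: the fp16 -> fp32 step of that pipeline is lossless, since every `float16` value is exactly representable in `float32` (the final cast to `bfloat16` is where mantissa precision is dropped, which the pipeline does regardless). A quick sanity check:

```python
import torch

# Round-tripping float16 through float32 reproduces the values exactly,
# so the workaround's upcast does not change the merged weights.
x = torch.randn(1000, dtype=torch.float16)
assert torch.equal(x.to(torch.float32).to(torch.float16), x)
```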