mlx-examples: mlx-lm 0.0.4 - ValueError: Missing parameters: lm_head.biases lm_head.scales
Hi guys.
mlx-ui users are getting this error after bumping mlx-lm from 0.0.3 to 0.0.4; 0.0.3 continues to work fine with the same models.
File "/Users/x/dev/mlx-ui/venv/lib/python3.11/site-packages/mlx_lm/utils.py", line 232, in load
model = load_model(model_path)
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/x/dev/mlx-ui/venv/lib/python3.11/site-packages/mlx_lm/utils.py", line 205, in load_model
model.load_weights(list(weights.items()))
File "/Users/x/dev/mlx-ui/venv/lib/python3.11/site-packages/mlx/nn/layers/base.py", line 151, in load_weights
raise ValueError(f"Missing parameters: {missing}.")
ValueError: Missing parameters: lm_head.biases lm_head.scales.
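For context, the failure happens in a plain load call; a minimal sketch of the trigger (the repo id below is a hypothetical placeholder, not a model named in this issue):

```python
# Minimal sketch of the failing call path.
# "someuser/some-4bit-model" is a hypothetical placeholder.
from mlx_lm import load

# On mlx-lm 0.0.4 this raises
#   ValueError: Missing parameters: lm_head.biases lm_head.scales.
# for checkpoints quantized by older versions that left lm_head unquantized.
model, tokenizer = load("someuser/some-4bit-model")
```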
Is there anything I can do on the mlx-ui side?
About this issue
- State: closed
- Created 5 months ago
- Comments: 15 (13 by maintainers)
There’s always perf degradation from quantization, so that’s unavoidable. I don’t know if the LM head is worse than other layers in that respect 🤷‍♂️; if it is, then maybe that’s the right call. But then again, you do get a nice memory and speed boost from quantizing the LM head, since it is the biggest matrix in the model.
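For a rough sense of scale, a back-of-the-envelope sketch (the head size and group size below are illustrative assumptions, not figures from this thread):

```python
# Rough memory estimate for an lm_head of shape (vocab, hidden) = (32000, 4096).
# These sizes are illustrative assumptions, not from this issue.
vocab, hidden = 32000, 4096
fp16 = vocab * hidden * 2                # 2 bytes per weight
q4_weights = vocab * hidden // 2         # 4 bits per weight
groups = vocab * (hidden // 64)          # assuming group size 64
q4_overhead = groups * 2 * 2             # fp16 scales + biases, one pair per group
print(f"fp16: {fp16 / 1e6:.0f} MB, 4-bit: {(q4_weights + q4_overhead) / 1e6:.0f} MB")
# fp16: 262 MB, 4-bit: 74 MB -- roughly a 3.5x saving on the model's biggest matrix
```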
Rather than blanket-disabling LM head quantization in general (which would break existing models that do have quantized LM heads), I would prefer to special-case legacy models, if that isn’t too complex to do.
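One way to special-case legacy models is to decide per layer from the checkpoint itself rather than from shapes; a sketch of that idea (not necessarily the exact fix that landed):

```python
import mlx.nn as nn

def make_class_predicate(weights):
    # `weights` is the flat parameter dict loaded from the checkpoint,
    # with keys like "lm_head.weight" or "model.layers.0.mlp.up_proj.scales".
    def class_predicate(path, module):
        # Only quantize layers whose quantized params were actually saved.
        # A legacy checkpoint with an unquantized lm_head has no
        # "lm_head.scales" key, so lm_head stays a plain nn.Linear and
        # load_weights no longer demands lm_head.scales / lm_head.biases.
        return isinstance(module, nn.Linear) and f"{path}.scales" in weights
    return class_predicate
```

In recent MLX, a predicate like this can be passed via `nn.quantize(model, ..., class_predicate=...)`; older mlx-lm versions used a different quantization entry point, so the exact wiring is an assumption here.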
Thanks for the verification!
Thanks a lot! I can confirm the fix. Monkey-patched the file for now to test. 😊
Thanks for the quick fix @mzbac; this should be fixed in main now. I will do a 0.0.5 release today, @da-z, then you can hopefully use that 😄
This is what changed: as you can see, we used to check for multiples of 32 (not just the gate size).
Hmm, I see the problem. We updated MLX to quantize sizes that aren’t multiples of 32, so we no longer assume by default that those layers are unquantized.
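In other words, the loader’s old assumption looked roughly like this (a simplified sketch, not the verbatim diff):

```python
import mlx.nn as nn

def legacy_predicate(module):
    # Simplified sketch of the old rule: only treat a linear layer as
    # quantized if both dimensions are multiples of 32, because older MLX
    # could not quantize other shapes. A checkpoint produced under this rule
    # leaves such layers (e.g. an odd-sized lm_head) unquantized.
    return isinstance(module, nn.Linear) and all(
        d % 32 == 0 for d in module.weight.shape
    )

# Once MLX learned to quantize non-multiple-of-32 shapes, the loader stopped
# assuming those layers were unquantized, so it began expecting the
# lm_head.scales / lm_head.biases that legacy checkpoints never saved.
```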
No worries and no rush 😊 (I am using a cached requirements.txt from the repo for now.)
I’ve reproduced with 2 models: