mlx-examples: mlx-lm 0.0.4 - ValueError: Missing parameters: lm_head.biases lm_head.scales
Hi guys.
mlx-ui users are getting this error after bumping mlx-lm from 0.0.3 to 0.0.4; 0.0.3 continues to work fine with the same models.
File "/Users/x/dev/mlx-ui/venv/lib/python3.11/site-packages/mlx_lm/utils.py", line 232, in load
model = load_model(model_path)
^^^^^^^^^^^^^^^^^^^^^^
File "/Users/x/dev/mlx-ui/venv/lib/python3.11/site-packages/mlx_lm/utils.py", line 205, in load_model
model.load_weights(list(weights.items()))
File "/Users/x/dev/mlx-ui/venv/lib/python3.11/site-packages/mlx/nn/layers/base.py", line 151, in load_weights
raise ValueError(f"Missing parameters: {missing}.")
ValueError: Missing parameters: lm_head.biases lm_head.scales.
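For context, the failure happens in a plain load call; a minimal sketch of the trigger (the repo id below is a hypothetical placeholder, not a model named in this issue):

```python
# Minimal sketch of the failing call path.
# "someuser/some-4bit-model" is a hypothetical placeholder.
from mlx_lm import load

# On mlx-lm 0.0.4 this raises
#   ValueError: Missing parameters: lm_head.biases lm_head.scales.
# for checkpoints quantized by older versions that left lm_head unquantized.
model, tokenizer = load("someuser/some-4bit-model")
```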
Is there anything I can do on the mlx-ui side?
About this issue
- State: closed
- Created 5 months ago
- Comments: 15 (13 by maintainers)
There’s always perf degradation from quantization, so that’s unavoidable. I don’t know if the LM head is worse than other layers in that respect 🤷‍♂️; if it is, then maybe that’s the right call. But then again, you do get a nice memory and speed boost from quantizing the LM head, since it is the biggest matrix in the model.
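For a rough sense of scale, a back-of-the-envelope sketch (the head size and group size below are illustrative assumptions, not figures from this thread):

```python
# Rough memory estimate for an lm_head of shape (vocab, hidden) = (32000, 4096).
# These sizes are illustrative assumptions, not from this issue.
vocab, hidden = 32000, 4096
fp16 = vocab * hidden * 2                # 2 bytes per weight
q4_weights = vocab * hidden // 2         # 4 bits per weight
groups = vocab * (hidden // 64)          # assuming group size 64
q4_overhead = groups * 2 * 2             # fp16 scales + biases, one pair per group
print(f"fp16: {fp16 / 1e6:.0f} MB, 4-bit: {(q4_weights + q4_overhead) / 1e6:.0f} MB")
# fp16: 262 MB, 4-bit: 74 MB -- roughly a 3.5x saving on the model's biggest matrix
```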
Rather than blanket-disabling LM head quantization in general (which would break existing models that do have quantized LM heads), I would prefer to special-case legacy models, if that isn’t too complex to do.
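One way to special-case legacy models is to decide per layer from the checkpoint itself rather than from shapes; a sketch of that idea (not necessarily the exact fix that landed):

```python
import mlx.nn as nn

def make_class_predicate(weights):
    # `weights` is the flat parameter dict loaded from the checkpoint,
    # with keys like "lm_head.weight" or "model.layers.0.mlp.up_proj.scales".
    def class_predicate(path, module):
        # Only quantize layers whose quantized params were actually saved.
        # A legacy checkpoint with an unquantized lm_head has no
        # "lm_head.scales" key, so lm_head stays a plain nn.Linear and
        # load_weights no longer demands lm_head.scales / lm_head.biases.
        return isinstance(module, nn.Linear) and f"{path}.scales" in weights
    return class_predicate
```

In recent MLX, a predicate like this can be passed via `nn.quantize(model, ..., class_predicate=...)`; older mlx-lm versions used a different quantization entry point, so the exact wiring is an assumption here.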
Thanks for the verification!
Thanks a lot! I can confirm the fix. Monkey-patched the file for now to test. 😊
Thanks for the quick fix @mzbac; this should be fixed in main now. I will do a 0.0.5 release today, @da-z, then you can hopefully use that 😄
This is what changed: as you can see, we used to check for multiples of 32 (not just the gate size).
Hmm, I see the problem. We updated MLX to quantize sizes that aren’t multiples of 32, so we no longer assume by default that those layers are unquantized.
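In other words, the loader’s old assumption looked roughly like this (a simplified sketch, not the verbatim diff):

```python
import mlx.nn as nn

def legacy_predicate(module):
    # Simplified sketch of the old rule: only treat a linear layer as
    # quantized if both dimensions are multiples of 32, because older MLX
    # could not quantize other shapes. A checkpoint produced under this rule
    # leaves such layers (e.g. an odd-sized lm_head) unquantized.
    return isinstance(module, nn.Linear) and all(
        d % 32 == 0 for d in module.weight.shape
    )

# Once MLX learned to quantize non-multiple-of-32 shapes, the loader stopped
# assuming those layers were unquantized, so it began expecting the
# lm_head.scales / lm_head.biases that legacy checkpoints never saved.
```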
No worries and no rush 😊 (I am using a cached requirements.txt from the repo for now.)
I’ve reproduced with 2 models: