mlx-examples: mistral example seems to hang on Loading model from disk.
I am trying to run the Mistral 7B model following the README's instructions (after adding the missing numpy dependency).
This is all I see for a very long time:
> python mistral.py --prompt "It is a truth universally acknowledged," --temp 0
[INFO] Loading model from disk.
Hardware: Apple M1 Max; OS: 14.1.2 (23B92); Python 3.11.5; conda 23.7.4
My Python process stays at very low CPU usage, and I am not seeing the kind of disk activity you would expect from reading 14 GB of weights.
Any way to better debug this or get to something working?
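One generic way to see where a hung Python process is stuck (a general debugging suggestion, not something confirmed to help in this particular case) is py-spy: `pip install py-spy`, then dump the stack of the running process by PID:

> py-spy dump --pid <pid-of-the-python-process>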
I’m experiencing roughly the same thing: after following the README it hangs with no output for more than 10 minutes, but for me it prints one additional line:
After that there is no significant CPU/GPU usage and I have to kill the process.
Environment info: M1 Pro, 16 GB, Sonoma 14.1.1 (23B81), Python 3.9.5
It’s also strange that pip cannot find the mlx package for Python 3.10, 3.11, or 3.12, even though PyPI lists it as requiring Python 3.7+.
@Blucknote There are pre-converted quantized models in the MLX Hugging Face community: https://huggingface.co/mlx-community
Also, all of the conversion scripts in the LLM examples can produce quantized models.
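For reference, a minimal sketch of loading one of those pre-converted models, assuming the separate mlx-lm package is installed (`pip install mlx-lm`) and that the repo name below still exists; browse the community page above for the models that are actually available:

```python
# Sketch: load a pre-converted 4-bit model from the mlx-community hub.
# The repo name is an assumption; check https://huggingface.co/mlx-community
# for current models.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/Mistral-7B-Instruct-v0.2-4bit")
print(generate(model, tokenizer,
               prompt="It is a truth universally acknowledged,"))
```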
New version worked (sort of)!
The new release properly gave me:
libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: BadZipFile: Bad CRC-32 for file 'layers.24.feed_forward.w3.weight.npy'
Instead of a silent failure and an infinite hang, which was really nice.
I re-downloaded the Mistral weights and performed the steps again, and THEN it worked flawlessly!
Thanks for looking at this.
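For anyone else who hits the BadZipFile error above: the converted `.npz` weights file is an ordinary zip archive, so you can check it for corruption before loading. A minimal sketch, assuming the README's conversion step wrote `weights.npz` into the model directory (adjust the path to match yours):

```python
# Sketch: CRC-check the converted weights archive for corruption.
# The path below is an assumption; point it at your weights.npz.
import zipfile

with zipfile.ZipFile("mistral-7B-v0.1/weights.npz") as zf:
    bad = zf.testzip()  # name of the first corrupt member, or None
    print(f"corrupt member: {bad}" if bad else "archive passed CRC check")
```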