mlx-examples: mistral example seems to hang on Loading model from disk.

I am trying to run the Mistral 7B model following the README's instructions (after adding the missing numpy dependency).

This is all I see for a very long time:

> python mistral.py --prompt "It is a truth universally acknowledged,"  --temp 0
[INFO] Loading model from disk.

Hardware: Apple M1 Max OS: 14.1.2 (23B92) Python 3.11.5 conda 23.7.4

My python process stays at very low CPU usage, and I am not seeing any disk activity consistent with reading 14 GB of weights.

Any way to better debug this or get to something working?
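One way to narrow this down (a hypothetical check, not something from the repo) is to time numpy's deserialization of the converted weights directly: if `np.load` itself stalls or raises, the problem is in the downloaded/converted file rather than in MLX. The `mistral-7B-v0.1/weights.npz` path below is an assumption based on the README's conversion step.

```python
import time
import numpy as np

def time_weight_load(path):
    """Time loading every array in a .npz archive; return total bytes read."""
    start = time.time()
    total_bytes = 0
    with np.load(path) as weights:
        for name in weights.files:
            arr = weights[name]  # forces the actual read; raises BadZipFile on corruption
            total_bytes += arr.nbytes
    print(f"loaded {total_bytes / 1e9:.2f} GB in {time.time() - start:.1f}s")
    return total_bytes

# Hypothetical path, assuming the README's conversion step was run:
# time_weight_load("mistral-7B-v0.1/weights.npz")
```

If this loads the full ~14 GB in a reasonable time, the hang is more likely downstream of deserialization.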

About this issue

  • State: open
  • Created 7 months ago
  • Comments: 19 (6 by maintainers)

Most upvoted comments

I'm experiencing roughly the same thing: after following the README it hangs with no output for more than 10 minutes, though for me there is an additional line:

python3 mistral.py --prompt "It is a truth universally acknowledged,"  --temp 0
[INFO] Loading model from disk.
[INFO] Starting generation...
It is a truth universally acknowledged,

After that there is no significant CPU/GPU usage and I have to kill the process.

Environment info: M1 Pro, 16 GB, Sonoma 14.1.1 (23B81), Python 3.9.5

It's also strange that pip cannot find the mlx package for Python 3.10, 3.11, or 3.12, even though PyPI lists it as requiring Python 3.7+.

@Blucknote There are pre-converted quantized models in the MLX Hugging Face community: https://huggingface.co/mlx-community

Also, all of the conversion scripts in the LLM examples can produce quantized models.

New version worked (sort of)!

The new release properly gave me an actual error:

libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: BadZipFile: Bad CRC-32 for file ‘layers.24.feed_forward.w3.weight.npy’

instead of silently failing and hanging forever, which was really nice.

I re-downloaded the mistral weights and performed the steps again, and THEN it worked flawlessly!
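Since `.npz` files are just zip archives, a corrupted download like this can be caught up front with the standard `zipfile` module before re-running anything: `ZipFile.testzip()` re-checks the CRC-32 of every member and returns the name of the first bad one (or `None` if the archive is intact). This is a generic sketch; the weights path is hypothetical.

```python
import zipfile

def first_corrupt_member(npz_path):
    """Check every member's CRC-32; return the first bad name, or None if intact."""
    with zipfile.ZipFile(npz_path) as zf:
        return zf.testzip()

# Hypothetical usage with the README's weights path:
# bad = first_corrupt_member("mistral-7B-v0.1/weights.npz")
# if bad:
#     print(f"bad CRC in {bad}; re-download the weights")
```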

Thanks for looking at this.