mlx-examples: mistral example seems to hang on Loading model from disk.

I am trying to run the Mistral 7B model following the README's instructions (after adding the missing numpy dependency).

This is all I see for a very long time:

> python mistral.py --prompt "It is a truth universally acknowledged,"  --temp 0
[INFO] Loading model from disk.

Hardware: Apple M1 Max OS: 14.1.2 (23B92) Python 3.11.5 conda 23.7.4

My python process stays at very low CPU usage, and I am not seeing any disk activity consistent with reading 14 GB of weights.

Any way to better debug this or get to something working?
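One way to narrow this down (a hypothetical check, not something from the repo) is to time numpy's deserialization of the converted weights directly: if `np.load` itself stalls or raises, the problem is in the downloaded/converted file rather than in MLX. The `mistral-7B-v0.1/weights.npz` path below is an assumption based on the README's conversion step.

```python
import time
import numpy as np

def time_weight_load(path):
    """Time loading every array in a .npz archive; return total bytes read."""
    start = time.time()
    total_bytes = 0
    with np.load(path) as weights:
        for name in weights.files:
            arr = weights[name]  # forces the actual read; raises BadZipFile on corruption
            total_bytes += arr.nbytes
    print(f"loaded {total_bytes / 1e9:.2f} GB in {time.time() - start:.1f}s")
    return total_bytes

# Hypothetical path, assuming the README's conversion step was run:
# time_weight_load("mistral-7B-v0.1/weights.npz")
```

If this loads the full ~14 GB in a reasonable time, the hang is more likely downstream of deserialization.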

About this issue

  • State: open
  • Created 7 months ago
  • Comments: 19 (6 by maintainers)

Most upvoted comments

I'm experiencing roughly the same thing: after following the README it hangs with no output for more than 10 minutes, though for me there is an additional line:

python3 mistral.py --prompt "It is a truth universally acknowledged,"  --temp 0
[INFO] Loading model from disk.
[INFO] Starting generation...
It is a truth universally acknowledged,

After that there is no significant CPU/GPU usage and I have to kill the process.

Environment info: M1 Pro, 16 GB, Sonoma 14.1.1 (23B81), Python 3.9.5

It's also strange that pip cannot find the mlx package for Python 3.10, 3.11, or 3.12, even though PyPI lists it as requiring Python 3.7+.

@Blucknote There are pre-converted quantized models in the MLX Hugging Face community: https://huggingface.co/mlx-community

Also, all of the conversion scripts in the LLM examples can produce quantized models.

New version worked (sort of)!

The new release properly gave me an actual error:

libc++abi: terminating due to uncaught exception of type pybind11::error_already_set: BadZipFile: Bad CRC-32 for file ‘layers.24.feed_forward.w3.weight.npy’

instead of silently failing and hanging forever, which was really nice.

I re-downloaded the mistral weights and performed the steps again, and THEN it worked flawlessly!
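Since `.npz` files are just zip archives, a corrupted download like this can be caught up front with the standard `zipfile` module before re-running anything: `ZipFile.testzip()` re-checks the CRC-32 of every member and returns the name of the first bad one (or `None` if the archive is intact). This is a generic sketch; the weights path is hypothetical.

```python
import zipfile

def first_corrupt_member(npz_path):
    """Check every member's CRC-32; return the first bad name, or None if intact."""
    with zipfile.ZipFile(npz_path) as zf:
        return zf.testzip()

# Hypothetical usage with the README's weights path:
# bad = first_corrupt_member("mistral-7B-v0.1/weights.npz")
# if bad:
#     print(f"bad CRC in {bad}; re-download the weights")
```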

Thanks for looking at this.