swift-chat: a bit slow on my MBP 16 M1

I downloaded the https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml model and compiled the chat app with Xcode. Running the example prompt takes around 15 minutes to complete. I'm not sure what I did wrong, but the performance should be better, right?

```
2023-08-09 12:01:55.346753+0200 SwiftChat[27414:583595] Metal API Validation Enabled
```
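For reference, a minimal sketch of loading the compiled model with an explicit compute-unit setting. The file name is an assumption (use whatever the `.mlmodelc` from the repo is called locally), and these are standard Core ML calls, not SwiftChat's actual loading code:

```swift
import CoreML

// Assumed local path to the compiled model from the Hugging Face repo.
let modelURL = URL(fileURLWithPath: "Llama-2-7b-chat.mlmodelc")

// Pin the compute units explicitly; .all also allows the Neural Engine.
let config = MLModelConfiguration()
config.computeUnits = .cpuAndGPU

do {
    let model = try MLModel(contentsOf: modelURL, configuration: config)
    print("Loaded model:", model.modelDescription)
} catch {
    print("Failed to load model:", error)
}
```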

About this issue

  • State: open
  • Created a year ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

I have the same problem. One thing I don't understand: I was able to get fast responses using [ollama](https://ollama.ai). Any idea why? I can see that the default model ollama uses is the 7B model 🤔

Nice, ollama worked for me too right out of the box. I tried to convert Llama 2 for swift-chat myself with `python -m exporters.coreml -m=./Llama-2-7b-hf --quantize=float16 --compute_units=cpu_and_gpu`, but it always crashes without an error after around 15 minutes. 🤔
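If the export ever does finish, the resulting `.mlpackage` can be compiled and loaded from Swift. A sketch with an assumed path, using the standard `MLModel.compileModel(at:)` API:

```swift
import CoreML

// Assumed path to the .mlpackage produced by the exporters run.
let packageURL = URL(fileURLWithPath: "Llama-2-7b-hf.mlpackage")

do {
    // compileModel(at:) writes the .mlmodelc to a temporary location
    // and returns its URL; move it somewhere permanent if you reuse it.
    let compiledURL = try MLModel.compileModel(at: packageURL)
    let model = try MLModel(contentsOf: compiledURL)
    print("Compiled model at", compiledURL.path)
    _ = model
} catch {
    print("Compile/load failed:", error)
}
```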

Whelp, just closing all other apps, restarting, and running the SwiftChat build without Xcode resulted in 4.96 tokens/s. Woohoo!
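That lines up with the Metal API Validation log above: launching from Xcode enables extra validation and debug overhead, so measuring outside Xcode gives a fairer number. If you want to check throughput yourself, a rough timing harness, where `step` is a hypothetical stand-in for whatever produces one token, not SwiftChat's real API:

```swift
import Foundation

// Rough throughput check; `step` is a hypothetical per-token closure.
func measureTokensPerSecond(tokens: Int, step: () -> Void) -> Double {
    let start = Date()
    for _ in 0..<tokens { step() }
    return Double(tokens) / Date().timeIntervalSince(start)
}

// Replace the sleep with a real per-token prediction call.
let rate = measureTokensPerSecond(tokens: 16) {
    Thread.sleep(forTimeInterval: 0.2)  // placeholder "work"
}
print(String(format: "%.2f tokens/s", rate))
```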