swift-chat: a bit slow on my MBP 16 M1
I downloaded the https://huggingface.co/coreml-projects/Llama-2-7b-chat-coreml model and compiled the chat app with Xcode. When running the example prompt, it takes around 15 minutes to complete. I'm not sure what I did wrong, but the performance should be better, right?
2023-08-09 12:01:55.346753+0200 SwiftChat[27414:583595] Metal API Validation Enabled
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 15 (4 by maintainers)
I have the same problem. One thing I don't understand is that I was able to get fast responses using [ollama](https://ollama.ai). Any idea why? I can see that the default model used in ollama is the 7b model 🤔

Nice, ollama worked for me too right out of the box. I tried to convert llama2 for swift-chat myself with

```
python -m exporters.coreml -m=./Llama-2-7b-hf --quantize=float16 --compute_units=cpu_and_gpu ll
```

but it always crashes without an error after around 15 minutes. 🤔

Whelp, just closing all other apps, restarting, and running the SwiftChat build without Xcode has resulted in 4.96 tokens/s. Woohoo!
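For context, Core ML lets an app choose which compute units a model may run on, and a debug build launched from Xcode (note the "Metal API Validation Enabled" log above) adds validation and debugger overhead. A minimal sketch of how compute units are configured when loading a compiled model; the model file name and bundle lookup here are illustrative placeholders, not the actual swift-chat source:

```swift
import CoreML

// Prefer all available compute units (CPU, GPU, and Neural Engine).
// MLModelConfiguration and its computeUnits property are standard Core ML API;
// the model name below is a hypothetical example.
let config = MLModelConfiguration()
config.computeUnits = .all

// Load a compiled Core ML model bundled with the app (placeholder resource name).
if let url = Bundle.main.url(forResource: "Llama-2-7b-chat",
                             withExtension: "mlmodelc") {
    let model = try MLModel(contentsOf: url, configuration: config)
    // The model is now ready for prediction calls.
    _ = model
}
```

Running a Release build directly (outside Xcode) avoids Metal API Validation and debug overhead, which is consistent with the speedup reported above.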