llama-cpp-python: "Illegal instruction" when trying to run the server using a precompiled docker image
Expected Behavior
I am trying to execute this:
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest
and I expect the model to load and the server to start. I am using a model quantized by TheBloke according to the current ggml specification of llama.cpp.
Current Behavior
llama.cpp: loading model from /models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
Illegal instruction
Environment and Context
Linux DESKTOP-xxx 5.15.68.1-microsoft-standard-WSL2+ #2 SMP
python3: 3.10.9
make: GNU Make 4.3
g++: (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
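For reference, here is a quick way to list which SIMD instruction sets the (virtual) CPU exposes under WSL2. This is only a diagnostic sketch, but if a prebuilt binary was compiled with AVX2/FMA and those flags are missing here, an "Illegal instruction" crash is the expected symptom:
# list the SIMD-related CPU flags reported by the kernel
grep -oE '\b(sse3|avx|avx2|avx512[a-z_]*|f16c|fma)\b' /proc/cpuinfo | sort -u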
Steps to Reproduce
docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest
About this issue
- State: closed
- Created a year ago
- Comments: 22
I am running on old E5645 (Westmere) Xeons that do not support AVX at all. I also ran into "Illegal instruction", but I can confirm that the command below works for me:
CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir
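After a successful source build, the server can be started natively instead of via Docker. A minimal sketch, assuming the server extras are installed (pip install 'llama-cpp-python[server]') and reusing the model path from above; check python3 -m llama_cpp.server --help for the exact flag names:
python3 -m llama_cpp.server --model /models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin --host 0.0.0.0 --port 8000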
Not a problem with CMAKE_ARGS.
https://github.com/ggerganov/llama.cpp/issues/1027
@gjmulder and @vmajor Great news: I was able to fix this issue. It required me to modify vendor/llama.cpp/CMakeLists.txt. On lines 56 and 70 I had to turn off LLAMA_AVX2 and turn on LLAMA_CUBLAS.
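Roughly, the edit amounts to flipping the two option() defaults. A sketch using sed (the option descriptions and line numbers vary between llama.cpp revisions, so verify the option() lines in your checkout first):
# sketch: flip the build defaults in the vendored CMakeLists
sed -i 's/option(LLAMA_AVX2.*ON)/option(LLAMA_AVX2 "llama: enable AVX2" OFF)/' vendor/llama.cpp/CMakeLists.txt
sed -i 's/option(LLAMA_CUBLAS.*OFF)/option(LLAMA_CUBLAS "llama: use cuBLAS" ON)/' vendor/llama.cpp/CMakeLists.txt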
Once that was done I was able to build llama-cpp-python from source, install it natively with pip, and import it in Python with no issues.
(.venv) [ai00@localhost llama-cpp-python]$ python -c "from llama_cpp import *; print(llama_print_system_info())"
b'AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | '
@vmajor I think if you were to make the same modification you should be able to build llama-cpp-python with the correct CPU instructions.
Either way, the issue posted upstream should resolve this downstream in llama-cpp-python; this is a workaround for the moment.