llama-cpp-python: "Illegal instruction" when trying to run the server using a precompiled docker image

Expected Behavior

I am trying to execute this:

docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest

and I expect the model to load and the server to start. I am using the model quantized by TheBloke according to the latest spec of the llama.cpp ggml implementation (ggmlv3).

Current Behavior

llama.cpp: loading model from /models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin
Illegal instruction

Environment and Context

Linux DESKTOP-xxx 5.15.68.1-microsoft-standard-WSL2+ #2 SMP

$ python3 --version   # Python 3.10.9
$ make --version      # GNU Make 4.3
$ g++ --version       # g++ (Ubuntu 11.3.0-1ubuntu1~22.04.1) 11.3.0
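
For context, "Illegal instruction" at model load usually means the precompiled binary was built with CPU instructions (typically AVX/AVX2/FMA/F16C) that the host CPU, or the WSL2 VM, does not expose. A minimal sketch to check this, assuming Linux (WSL2 included) where the kernel reports CPU features in /proc/cpuinfo:

# Minimal sketch: report which of the commonly required SIMD flags are missing on this machine.
wanted = {"avx", "avx2", "fma", "f16c"}

flags = set()
with open("/proc/cpuinfo") as f:
    for line in f:
        if line.startswith("flags"):
            flags = set(line.split(":", 1)[1].split())
            break

missing = sorted(wanted - flags)
print("missing instruction sets:", ", ".join(missing) if missing else "none")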

Steps to Reproduce


docker run --rm -it -p 8000:8000 -v /home/xxxx/models:/models -e MODEL=/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin ghcr.io/abetlen/llama-cpp-python:latest

About this issue

  • State: closed
  • Created a year ago
  • Comments: 22

Most upvoted comments

I am running on old E5645 (Westmere) Xeons that do not support AVX at all, and I also ran into "Illegal instruction". I can confirm that the command below works for me:

CMAKE_ARGS="-DLLAMA_CUBLAS=1 -DLLAMA_AVX=OFF -DLLAMA_AVX2=OFF -DLLAMA_F16C=OFF -DLLAMA_FMA=OFF" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir

This is not a problem with CMAKE_ARGS; see the related upstream issue:

https://github.com/ggerganov/llama.cpp/issues/1027

@gjmulder and @vmajor Great news: I was able to fix this issue. It required me to modify /vendor/llama.cpp/CMakeLists.txt; on lines 56 and 70 I had to turn off LLAMA_AVX2 and turn on LLAMA_CUBLAS.
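
As a side note, those line numbers shift between llama.cpp revisions. A small hypothetical helper like the following (assuming a llama-cpp-python checkout with the vendored submodule present) prints the option() toggles with their line numbers, so LLAMA_AVX2 and LLAMA_CUBLAS are easy to locate in any revision:

# Hypothetical helper: list the LLAMA_* option() lines in the vendored CMakeLists.txt.
# Adjust the path if your checkout layout differs.
from pathlib import Path

cmakelists = Path("vendor/llama.cpp/CMakeLists.txt")
for lineno, line in enumerate(cmakelists.read_text().splitlines(), start=1):
    if line.lstrip().startswith("option(LLAMA_"):
        print(f"{lineno:4d}: {line.strip()}")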

Once that was done I was able to build llama-cpp-python from source, install it with pip natively, and import it in Python with no issues.

(.venv) [ai00@localhost llama-cpp-python]$ python -c "from llama_cpp import *; print(llama_print_system_info())"
b'AVX = 1 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 0 | ARM_FMA = 0 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 1 | VSX = 0 | '

@vmajor I think if you apply the same modification you should be able to build llama-cpp-python with the correct CPU instructions.

Either way, the issue posted upstream should eventually resolve this downstream in llama-cpp-python; this is a workaround for the moment.
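
For completeness, once a rebuild succeeds, a short generation call makes a convenient end-to-end smoke test. A sketch using the standard llama_cpp.Llama API and the model path from the original report (adjust the path and prompt to your setup):

# End-to-end smoke test for a locally rebuilt llama-cpp-python.
from llama_cpp import Llama

llm = Llama(model_path="/models/gpt4-alpaca-lora_mlp-65B/gpt4-alpaca-lora_mlp-65B.ggmlv3.q5_1.bin")
out = llm("Q: Name the planets in the solar system. A:", max_tokens=32)
print(out["choices"][0]["text"])

If the library was still compiled with unsupported instructions, the process would crash with "Illegal instruction" during model load; if this prints text, the build matches the CPU.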