llama-cpp-python: llama.cpp problem (GPU support)
Hello, I am a complete newbie when it comes to the subject of LLMs. I installed a GGML model in the oobabooga webui and tried to use it. It works fine, but only from RAM; only 0.5 GB of VRAM is used, and I don’t have any way to change that (offload some layers to the GPU). Even pasting the line “--n-gpu-layers 10” into the webui doesn’t work. So I started searching, and one of the answers was this command:
pip uninstall -y llama-cpp-python
set CMAKE_ARGS="-DLLAMA_CUBLAS=on"
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir
But that didn’t work for me. After pasting it, I got:
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for llama-cpp-python
Failed to build llama-cpp-python
ERROR: Could not build wheels for llama-cpp-python, which is required to install pyproject.toml-based projects
And it completely broke the llama folder… It uninstalled it and did nothing more. I had to update the webui and download llama.cpp again to fix it, because I don’t have any other way to download it.
I also tried the compilation method, but that didn’t work either. When I paste CMAKE_ARGS="-DLLAMA_OPENBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python into CMD (or the oobabooga CMD window), I always get this message:
'CMAKE_ARGS' is not recognized as an internal or external command,
operable program or batch file.
or
'FORCE_CMAKE' is not recognized as an internal or external command,
operable program or batch file.
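For context, the VAR=value prefix form only works in Unix-style shells such as bash, which is why CMD complains that CMAKE_ARGS is not a command. A rough CMD equivalent (a sketch; run all three lines in the same window) would be:
set CMAKE_ARGS=-DLLAMA_OPENBLAS=on
set FORCE_CMAKE=1
pip install llama-cpp-python --no-cache-dir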
The same goes for the “make” command: it isn’t recognised, even though I have make and CMake installed.
Also, when I launch the webui and choose a GGML model, I get something like this in the console:
llama_model_load_internal: format = ggjt v3 (latest)
llama_model_load_internal: n_vocab = 32001
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 6656
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 52
llama_model_load_internal: n_layer = 60
llama_model_load_internal: n_rot = 128
llama_model_load_internal: freq_base = 10000.0
llama_model_load_internal: freq_scale = 1
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 17920
llama_model_load_internal: model size = 30B
llama_model_load_internal: ggml ctx size = 0.14 MB
llama_model_load_internal: mem required = 19712.68 MB (+ 3124.00 MB per state)
llama_new_context_with_model: kv self size = 3120.00 MB
AVX = 1 | AVX2 = 1 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
2023-07-19 23:05:22 INFO:Loaded the model in 8.17 seconds.
I am using Windows and an Nvidia card.
Is there an easy solution to enable offloading layers to the GPU that doesn’t require installing a ton of stuff?
About this issue
- Original URL
- State: open
- Created a year ago
- Reactions: 1
- Comments: 20
To build the libllama.so with GPU support you need to have the CUDA SDK installed, then build llama.cpp’s shared library with cuBLAS enabled (a sketch follows below). Note that the g++ compiler will add the -DGGML_USE_CUBLAS compiler flag, and the build will create a file called libllama.so in the current directory; you can check it as sketched below. After that you can force llama-cpp-python to use that lib by pointing the LLAMA_CPP_LIB environment variable at it. After that, it worked with GPU support here. Of course you have to init your model with something like the Python snippet below.
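A minimal sketch of those steps, assuming a local checkout of the llama.cpp repository and its Makefile’s LLAMA_CUBLAS flag (paths are placeholders):
# inside the llama.cpp source tree: clean and build the shared library with cuBLAS
make clean
LLAMA_CUBLAS=1 make libllama.so
# check that the resulting library is linked against CUDA
ldd libllama.so | grep -i cuda
# force llama-cpp-python to load this library instead of its bundled one
export LLAMA_CPP_LIB=$(pwd)/libllama.so
And a sketch of initializing the model with layers offloaded to the GPU (the model path is a placeholder; n_gpu_layers is the llama-cpp-python parameter that controls offloading):
from llama_cpp import Llama
# offload some layers to the GPU; raise or lower the count to fit your VRAM
llm = Llama(model_path="./models/your-model.ggmlv3.q4_0.bin", n_gpu_layers=10)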
Hope it helps.
Thanks for your kind response, I used your advice,
and got it working by reinstalling llama-cpp-python with these variables: CMAKE_ARGS="-DLLAMA_CUBLAS=on -DCUDA_PATH=/usr/local/cuda-12.2 -DCUDAToolkit_ROOT=/usr/local/cuda-12.2 -DCUDAToolkit_INCLUDE_DIR=/usr/local/cuda-12/include -DCUDAToolkit_LIBRARY_DIR=/usr/local/cuda-12.2/lib64 -DCMAKE_CUDA_COMPILER:PATH=/usr/local/cuda/bin/nvcc" FORCE_CMAKE=1 pip install --upgrade --force-reinstall llama-cpp-python --no-cache-dir --verbose
Thanks @glaudiston, it’s working perfectly.
Thanks @glaudiston !!!
Well, I just wanted to run llama-cpp-python from the miniconda3 env from https://github.com/oobabooga/text-generation-webui
In that case you only need to use
export LLAMA_CPP_LIB=/yourminicondapath/miniconda3/lib/python3.10/site-packages/llama_cpp_cuda/libllama.so
before running your jupyter-notebook, ipython, python or whatever. In my case I added it to my .bashrc. Voilà!!!!
On importing
from llama_cpp import Llama
I get … And on …
Thanks @glaudiston. The llama.cpp lib works absolutely fine with my GPU, so it’s odd that the python binding is failing.
I was able to make it work using LLAMA_CPP_LIB pointing to a libllama.so file compiled with GGML_USE_CUBLAS.
This method worked for me.
First, install using:
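A sketch of what that install presumably looks like, reusing the cuBLAS flags from earlier in the thread:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall --upgrade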
Then install the Nvidia CUDA toolkit again if it shows errors related to CUDA:
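A sketch, assuming an Ubuntu or WSL2 Ubuntu system (the package name varies by distribution and CUDA version):
sudo apt update
sudo apt install nvidia-cuda-toolkit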
You can use WSL2 on Windows, and it should work as if you were using Linux.
This is probably due to a dirty build. That symbol is generated only when building with GPU support. Try a clean rebuild.
Also make sure nvcc is in your path, by adding ${CUDA_HOME}/bin to your PATH environment variable, and try again.
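A sketch of that clean rebuild, assuming CUDA is installed under /usr/local/cuda (adjust to your install):
export CUDA_HOME=/usr/local/cuda
export PATH="${CUDA_HOME}/bin:${PATH}"   # puts nvcc on the PATH
pip uninstall -y llama-cpp-python
CMAKE_ARGS="-DLLAMA_CUBLAS=on" FORCE_CMAKE=1 pip install llama-cpp-python --no-cache-dir --force-reinstall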