# private-gpt: Segmentation Fault on Intel Mac
I got a segmentation fault running the basic setup from the documentation. This may be an obvious issue I have simply overlooked, but I am guessing that if I have run into it, others will as well.
```
llama_new_context_with_model: n_ctx = 3900
llama_new_context_with_model: freq_base = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_new_context_with_model: kv self size = 487.50 MB
llama_build_graph: non-view tensors processed: 740/740
ggml_metal_init: allocating
ggml_metal_init: found device: Intel(R) UHD Graphics 630
ggml_metal_init: found device: AMD Radeon Pro 5500M
ggml_metal_init: picking default device: AMD Radeon Pro 5500M
ggml_metal_init: default.metallib not found, loading from source
make: *** [run] Segmentation fault: 11
```
## About this issue
- State: closed
- Created 8 months ago
- Comments: 19 (5 by maintainers)
Hey guys - sorry for the late reply, I was really busy at work lately.

Given the number of comments, I will write a single, grouped answer, hoping to help most/all of you (Apple folks ✌️).
## Help

While we are doing our best to help you with your `privateGPT` installation, we are not maintaining these libraries. Please find below instructions to try and fix your existing setup. However, please know that there might be cleaner solutions already available for you in `llama.cpp` and `llama-cpp-python`. To see their help, please refer to the links in the section **Additional Help** at the bottom of this message.

## How to know which "acceleration mode" you are running in
By "acceleration mode", I mean which library is being used to do the computation: Metal? *BLAS? etc.

By looking at the logs returned by `llama.cpp` (the lines that do not start with `<time> [INFO ]` — those are the Python logs; `llama.cpp` logs start with `llm_load`, `llama_model_loader`, etc.), you can know what your installation is trying to use. If your installation is configured to use Metal, for example, you will see `ggml_metal_init`.
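For instance, the Metal initialisation lines from the log in the original report above look like this:

```
ggml_metal_init: allocating
ggml_metal_init: found device: AMD Radeon Pro 5500M
ggml_metal_init: picking default device: AMD Radeon Pro 5500M
```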
## How to disable Metal
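One way to do this is to rebuild `llama-cpp-python` with Metal disabled at build time, using its documented `CMAKE_ARGS` mechanism. A sketch (the `LLAMA_METAL` flag name is the one `llama.cpp` used at the time of this issue; check the current docs for your version):

```shell
# Rebuild llama-cpp-python without Metal support
# (LLAMA_METAL is the llama.cpp build flag of that era)
CMAKE_ARGS="-DLLAMA_METAL=off" pip install --force-reinstall --no-cache-dir llama-cpp-python
```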
## Memory requirements
By default, `privateGPT` will try to put all the neural network layers' computation in the GPU (i.e. it will load all the layers into GPU memory *). For Apple chips (M1, M2, M3, etc.), given that the memory on these systems is unified, this means that the "normal" RAM is what is being used (if I'm not mistaken). For Apple computers with Intel chips (and other GPUs), this means that the model will be loaded into your graphics card's memory (often called VRAM)!

That means that if your GPU only has 500MB (or 1.5GB) and you are trying to run a model that is 4GB in size, it will not work (and will return `segmentation fault 11`, etc., because you are trying to allocate more than you have).

## Solutions to run `privateGPT`

The following solutions are possible. You can try them one by one and see which one suits you.
### CPU Only - disable Metal
Do not change the `privateGPT` code; change instead the configuration of the libraries it is using (see *How to disable Metal* above).

### CPU Only - change how `privateGPT` loads the model

Change this line, and make it `model_kwargs={"n_gpu_layers": 0},` to disable loading the model in the GPU. You can also try other values, such as `50` or `100` (to still offload some layers to the GPU, but not all of them, because `-1` means all layers).
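As an illustration, here is a minimal sketch using `llama-cpp-python` directly (the model path is a placeholder); the same `n_gpu_layers` semantics apply to the `model_kwargs` that `privateGPT` passes through:

```python
# Minimal sketch of n_gpu_layers semantics in llama-cpp-python
from llama_cpp import Llama

llm = Llama(
    model_path="models/mistral-7b-instruct-v0.1.Q2_K.gguf",  # placeholder path
    n_gpu_layers=0,  # 0 = CPU only; 50/100 = partial offload; -1 = all layers on GPU
)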
### Run smaller models

If you are still seeing a segmentation fault (trying to allocate more memory than you have), you can try to reduce the size of the model you are running (at the cost of having answers that are less neat). For example, you can pick the smallest Mistral model (all of them are available on this page: https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.1-GGUF), which is currently `mistral-7b-instruct-v0.1.Q2_K.gguf`.

Modify your settings to specify this model:
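A sketch of the corresponding settings change, assuming the `local.llm_hf_repo_id` / `local.llm_hf_model_file` keys (check your own `settings.yaml` for the exact key names in your version):

```yaml
# settings.yaml - sketch only; key names may differ in your privateGPT version
local:
  llm_hf_repo_id: TheBloke/Mistral-7B-Instruct-v0.1-GGUF
  llm_hf_model_file: mistral-7b-instruct-v0.1.Q2_K.gguf
```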
### Try with other versions of `llama-cpp-python`

Given that during the re-installation of `llama-cpp-python` we are not specifying a specific version, you might be compiling a version of `llama.cpp` that has a bug that is not yet fixed. You can try to install a fixed (and older) version of `llama-cpp-python` by replacing `llama-cpp-python` with `llama-cpp-python==X.Y.Z`, where `X.Y.Z` is a version number from https://github.com/abetlen/llama-cpp-python/releases.

Example:
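A sketch of such a pinned reinstall (the version number below is illustrative only; pick a real one from the releases page):

```shell
# Pin an explicit (older) version; 0.2.11 is only an example
pip install --force-reinstall --no-cache-dir llama-cpp-python==0.2.11
```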
## Additional Help

You can find additional help directly from the libraries that `privateGPT` is using:

- `llama-cpp-python`: https://github.com/abetlen/llama-cpp-python (this is what compiles `llama.cpp` on your host)
- `llama.cpp`: https://github.com/ggerganov/llama.cpp

## Additional tips
### Python wheel compilation (`pip install`) in verbose mode

Add `-vvv` to your `pip` command; this will display the logs from the compilation of `llama.cpp`, showing you the framework that was used to compile your lib.
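For example:

```shell
# Verbose reinstall; the build logs show which backend llama.cpp was compiled with
pip install -vvv --force-reinstall --no-cache-dir llama-cpp-python
```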
### Force CMAKE

While doing some reading to write this comment, I found that some articles recommend forcing CMake usage by setting the environment variable `FORCE_CMAKE=1`.
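For example, combined with a reinstall:

```shell
# Force a CMake-based build of llama-cpp-python
FORCE_CMAKE=1 pip install --force-reinstall --no-cache-dir llama-cpp-python
```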
## Hacky Fix if Installed with Conda
I was able to fix this by:
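A sketch of that kind of path fix, assuming the runtime looks for `ggml-metal.metal` inside the installed `llama_cpp` package directory (both paths below are illustrative):

```shell
# Illustrative only: make ggml-metal.metal visible to the installed package
# (assumes a local llama.cpp checkout; adjust both paths to your machine)
cp /path/to/llama.cpp/ggml-metal.metal \
  "$(python -c 'import llama_cpp, os; print(os.path.dirname(llama_cpp.__file__))')"
```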
Overall, it seems that there is an issue with finding the correct path to `ggml-metal.metal` if installed normally.
Please compare the VRAM of your GPU with the VRAM the model asks for.
Also, I found that the `llama-cpp-python` (i.e. `llama.cpp`) version that `privateGPT` is using does not work well in `METAL` mode on Apple devices that do not have Mx chips (i.e. it does not run well on Apple devices with Intel chips). You can try to run using BLAS variants instead of Metal, as sketched below.
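A sketch of rebuilding against OpenBLAS instead (the `LLAMA_BLAS` flag names are the `llama.cpp` build options of that era; see the README linked below for current ones):

```shell
# Rebuild against OpenBLAS instead of Metal (flag names per llama.cpp of that era)
CMAKE_ARGS="-DLLAMA_BLAS=ON -DLLAMA_BLAS_VENDOR=OpenBLAS" \
  pip install --force-reinstall --no-cache-dir llama-cpp-python
```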
More information can be found in the README of https://github.com/abetlen/llama-cpp-python.