llama.cpp: CLBlast fails on context lengths above 2048 after merging #4256
After the commit that merged https://github.com/ggerganov/llama.cpp/pull/4256, inference with CLBlast fails with a segfault on context sizes above 2k when all GPU layers are offloaded.
Command line:
C:\test\llama-b1601-bin-win-clblast-x64>main.exe -m E:\LLaMA\models\airoboros-mistral2.2-7b.Q4_K_S.gguf -c 4096 -b 512 -n 32 -ngl 33 -f C:\test\test.txt
main: build = 1601 (5a7d312)
main: built with MSVC 19.37.32826.1 for x64
main: seed = 1701534899
ggml_opencl: selecting platform: 'NVIDIA CUDA'
ggml_opencl: selecting device: 'NVIDIA GeForce RTX 2060'
ggml_opencl: device FP16 support: false
Result: Prompt processing starts and then segfaults partway through, around the 2k-token mark, before generation begins. It only appears to work if the prompt is short enough (fewer than 2k tokens).
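A note on the failure pattern: working below a threshold context length and segfaulting above it is characteristic of a read whose offset grows with the number of processed tokens running past a fixed-size allocation. The following minimal sketch illustrates that class of bug; the names and sizes are hypothetical and not taken from ggml.c:

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: a scratch buffer sized for a fixed 2048 positions
   while the loop walks the full context length. Reads stay in bounds for
   n_ctx <= 2048 and run past the allocation above that, which can fault
   partway through prompt processing. */
#define SCRATCH_POSITIONS 2048

static float sum_positions(int n_ctx) {
    float *scratch = calloc(SCRATCH_POSITIONS, sizeof(float));
    if (!scratch) return 0.0f;
    float acc = 0.0f;
    for (int i = 0; i < n_ctx; i++) {
        acc += scratch[i]; /* out-of-bounds read once i >= SCRATCH_POSITIONS */
    }
    free(scratch);
    return acc;
}

int main(void) {
    printf("%f\n", sum_positions(4096)); /* 4096 > 2048: undefined behavior */
    return 0;
}

Whether the overrun faults immediately or only corrupts nearby heap memory depends on the allocator layout, which is consistent with the crash landing somewhere around the 2k-token mark rather than at a fixed point.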
About this issue
- State: closed
- Created 7 months ago
- Reactions: 1
- Comments: 15 (10 by maintainers)
Commits related to this issue
- fixed segfault with clblast by reversing commit in issue https://github.com/ggerganov/llama.cpp/issues/4296 — committed to LostRuins/koboldcpp by LostRuins 7 months ago
No problem - thank you very much for reporting this issue
Sorry I couldn’t help more with the debugging. Anyway https://github.com/ggerganov/llama.cpp/pull/4307 seems to work for me. The segfault no longer occurs.
Please confirm that #4307 works
@AlpinDale When running with ASAN, you need to add this env variable to get past the bogus errors on init:
ASAN_OPTIONS=protect_shadow_gap=0 ./main ..
Doing that, I now get the following sanitizer errors, confirming a bug in ggml.c that I introduced in #4256.
I'm able to reproduce - looking into it
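For context on the sanitizer workflow above: protect_shadow_gap=0 is a standard AddressSanitizer runtime option. GPU runtimes (including the NVIDIA OpenCL and CUDA drivers) commonly map memory into the region ASan reserves as its shadow gap, which is what produces the bogus errors on init. Below is a self-contained demo of the kind of heap-buffer-overflow report ASan produces for an out-of-bounds read; it is an illustration only, unrelated to the actual ggml.c code:

/* Build: cc -g -fsanitize=address oob_demo.c -o oob_demo
   Run:   ASAN_OPTIONS=protect_shadow_gap=0 ./oob_demo
   (the env variable is only needed when a GPU runtime is loaded) */
#include <stdlib.h>

int main(void) {
    float *buf = malloc(2048 * sizeof(float));
    if (!buf) return 1;
    /* Reading one element past the end: ASan aborts here with a
       heap-buffer-overflow report naming this file and line. */
    float x = buf[2048];
    free(buf);
    return (int) x;
}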
Can confirm this happens for me too. Same command and prompt as @LostRuins. Hardware is an RTX 2070S and an Intel i7-8700, and I'm running Linux 6.5.9. It happens with -ngl 0 and -ngl 99. The error I get is:
A different error, followed by a segfault, with -ngl 32 (7B GGUF model):