llama.cpp: GPU Inference error - Cannot find lcublasLt


```
(venv) azure@gpu:/largedisk/llama.cpp$ make LLAMA_CUBLAS=1
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -O3 -std=c11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wdouble-promotion -Wshadow -Wstrict-prototypes -Wpointer-arith -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I CXXFLAGS: -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include
I LDFLAGS: -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
I CC: cc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
I CXX: g++ (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0

g++ -I. -I./examples -O3 -std=c++11 -fPIC -DNDEBUG -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wno-multichar -pthread -march=native -mtune=native -DGGML_USE_K_QUANTS -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include examples/quantize/quantize.cpp ggml.o llama.o k_quants.o ggml-cuda.o -o quantize -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib
/usr/bin/ld: cannot find -lcublas
/usr/bin/ld: cannot find -lculibos
/usr/bin/ld: cannot find -lcudart
/usr/bin/ld: cannot find -lcublasLt
collect2: error: ld returned 1 exit status
make: *** [Makefile:313: quantize] Error 1
```
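The linker errors mean that none of the -L directories in LDFLAGS actually contain the CUDA libraries on this machine (the bare -I/targets/... and -L/targets/... entries also suggest a CUDA root variable in the Makefile was empty at build time). As a quick sanity check, assuming a typical Linux CUDA toolkit layout, something like the following shows whether and where the libraries are installed:

```sh
# Ask the dynamic linker cache whether it knows about cuBLAS/cudart
ldconfig -p | grep -E 'libcublas|libcudart'

# Look in the usual CUDA toolkit install locations
ls /usr/local/cuda*/lib64/ 2>/dev/null | grep -E 'cublas|cudart|culibos'
```

If nothing turns up, only the driver (and not the full CUDA toolkit) is likely installed; if the libraries live somewhere else, that directory has to be made visible to the linker.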

About this issue

  • State: closed
  • Created a year ago
  • Comments: 17 (5 by maintainers)

Most upvoted comments

The following steps solved it for me (Thanks @SlyEcho for the CMake suggestion. It was failing before, but a fresh install helped.) There was no need to add paths or make any other changes.

These steps only indicate that you are using OPENBLAS and not CUBLAS, which is what you are looking for.

Sorry, copy-paste error. I have edited it to use CUBLAS.
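
The exact steps referenced in that comment are not reproduced in this excerpt. As a rough sketch only (not the commenter's verbatim commands), a fresh CMake-based cuBLAS build of llama.cpp at the time generally looked like the following, with -DLLAMA_CUBLAS=ON being the option that enables the CUDA backend:

```sh
# Sketch of a clean CMake cuBLAS build; paths are illustrative
cd /largedisk/llama.cpp
rm -rf build                       # start from a clean build tree
mkdir build && cd build
cmake .. -DLLAMA_CUBLAS=ON         # requires the CUDA toolkit to be installed
cmake --build . --config Release
```

The resulting binaries typically end up under the build directory (e.g. build/bin) rather than in the repository root, which is the main practical difference from the Makefile-based build shown above.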