llama.cpp: nvcc fatal : Value 'native' is not defined for option 'gpu-architecture' when compiling cuBLAS
There seems to be an issue in the Makefile that assigns an invalid nvcc flag for gpu-architecture. I'm running Ubuntu 22.04 on an RTX 4090, starting from the Docker image nvidia/cuda:12.3.1-runtime-ubuntu22.04.
Running make creates the following error:
root@ZEPPELIN-01:/workspace/llama.cpp# make LLAMA_CUBLAS=1
expr: syntax error: unexpected argument '070100'
expr: syntax error: unexpected argument '080100'
I llama.cpp build info:
I UNAME_S: Linux
I UNAME_P: x86_64
I UNAME_M: x86_64
I CFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include -std=c11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wshadow -Wstrict-prototypes -Wpointer-arith -Wmissing-prototypes -Werror=implicit-int -Werror=implicit-function-declaration -pthread -march=native -mtune=native -Wdouble-promotion
I CXXFLAGS: -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -march=native -mtune=native -Wno-array-bounds -Wno-format-truncation -Wextra-semi
I NVCCFLAGS: -use_fast_math --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128
I LDFLAGS: -lcuda -lcublas -lculibos -lcudart -lcublasLt -lpthread -ldl -lrt -L/usr/local/cuda/lib64 -L/opt/cuda/lib64 -L/targets/x86_64-linux/lib -L/usr/local/cuda/targets/aarch64-linux/lib -L/usr/lib/wsl/lib
I CC: cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
I CXX: g++ (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
nvcc -I. -Icommon -D_XOPEN_SOURCE=600 -D_GNU_SOURCE -DNDEBUG -DGGML_USE_CUBLAS -I/usr/local/cuda/include -I/opt/cuda/include -I/targets/x86_64-linux/include -I/usr/local/cuda/targets/aarch64-linux/include -std=c++11 -fPIC -O3 -Wall -Wextra -Wpedantic -Wcast-qual -Wno-unused-function -Wmissing-declarations -Wmissing-noreturn -pthread -use_fast_math --forward-unknown-to-host-compiler -arch=native -DGGML_CUDA_DMMV_X=32 -DGGML_CUDA_MMV_Y=1 -DK_QUANTS_PER_ITERATION=2 -DGGML_CUDA_PEER_MAX_BATCH_SIZE=128 -Wno-pedantic -Xcompiler "-Wno-array-bounds" -c ggml-cuda.cu -o ggml-cuda.o
nvcc fatal : Value 'native' is not defined for option 'gpu-architecture'
make: *** [Makefile:429: ggml-cuda.o] Error 1
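(A quick sanity check, not from the original report: the 'native' value is only understood by reasonably recent nvcc releases, roughly CUDA 11.5 and newer, and the nvidia/cuda runtime images ship no compiler at all, so the nvcc that make picks up may be an older one installed separately.)
# Check which nvcc the build is actually using and whether it is new enough
which nvcc
nvcc --version
# -arch=native also probes the GPUs present, so confirm one is visible
nvidia-smi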
The same issue seems to happen in whisper.cpp, by the way – I also created an issue report there.
The issue seems to be resolved by simply changing the flag in the Makefile from 'native' to 'all'. However, I still cannot get llama.cpp to compile with GPU support for some reason.
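(For reference, the same effect without patching the Makefile, using the CUDA_DOCKER_ARCH variable that the replies below converge on:)
make clean
make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=all -j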
About this issue
- State: closed
- Created 5 months ago
- Comments: 15
You are right, thank you, it worked after adding CUDA_DOCKER_ARCH=all.
While it was running I noticed a message saying that CUBLAS was deprecated and people should switch to LLAMA_CUDA in the future…
Many thanks
If that doesn't work, try
make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=all
(it seems from the install readme that the cuBLAS flag changed from LLAMA_CUBLAS to LLAMA_CUDA, which is why I sent the first message)
Are you compiling it with make @hiddengerbil? Try
make LLAMA_CUDA=1 CUDA_DOCKER_ARCH=all

Try this:
Clone the repository:
git clone https://github.com/ggerganov/llama.cpp.git
Build llama.cpp:
cd llama.cpp
sed -i 's/-arch=native/-arch=all/g' Makefile
make clean && LLAMA_CUBLAS=1 make -j
all means it'd build for all architectures that your CUDA version supports, which would take a long time and would not be a good default. That Docker issue is known; you can set the value with the environment variable CUDA_DOCKER_ARCH. For a 4090 the correct value would be compute_89.
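(Putting that together, a build targeting only the 4090, an Ada Lovelace / sm_89 card, might look like the following sketch; it assumes the LLAMA_CUBLAS flag used earlier in the thread, with LLAMA_CUDA=1 as the newer spelling:)
make clean
make LLAMA_CUBLAS=1 CUDA_DOCKER_ARCH=compute_89 -j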