llama.cpp: Can't compile "llama.cpp/ggml-quants.c"

Prerequisites

Please answer the following questions for yourself before submitting an issue.

  • I am running the latest code. Development is very rapid so there are no tagged versions as of now.
  • I carefully followed the README.md.
  • I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
  • I reviewed the Discussions, and have a new bug or useful enhancement to share.

Current Behavior

While attempting to compile llama.cpp, I encountered several warnings and errors while compiling the “llama.cpp/ggml-quants.c” file. The warnings are promoted to errors (“cc1: some warnings being treated as errors”), and the build fails.

Environment and Context

Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.

  • Physical (or virtual) hardware you are using, e.g. for Linux:

$ lscpu

Architecture:        aarch64
Byte Order:          Little Endian
CPU(s):              4
On-line CPU(s) list: 0-3
Thread(s) per core:  1
Core(s) per socket:  4
Socket(s):           1
Vendor ID:           ARM
Model:               1
Model name:          Cortex-A57
Stepping:            r1p1
CPU max MHz:         1479.0000
CPU min MHz:         102.0000
BogoMIPS:            38.40
L1d cache:           32K
L1i cache:           48K
L2 cache:            2048K
Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32
  • Operating System, e.g. for Linux:

$ uname -a

Linux dev 4.9.337-tegra #1 SMP PREEMPT Thu Jun 8 21:19:14 PDT 2023 aarch64 aarch64 aarch64 GNU/Linux
  • SDK version, e.g. for Linux:

$ python3 --version
Python 3.7.9

$ cmake --version
cmake version 3.28.20231031-g9c106e3

$ g++ --version
g++ (Ubuntu/Linaro 7.5.0-3ubuntu1~18.04) 7.5.0

Failure Information (for bugs)

Please help provide information about the failure / bug.

Steps to Reproduce

Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.

$ git clone https://github.com/ggerganov/llama.cpp
$ cd llama.cpp
$ mkdir build
$ cd build
$ cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_MPI=ON
$ cmake --build . --config Release

Failure Logs

rbyer@dev:~$ git clone https://github.com/ggerganov/llama.cpp
Cloning into 'llama.cpp'...
remote: Enumerating objects: 11791, done.
remote: Counting objects: 100% (3309/3309), done.
remote: Compressing objects: 100% (356/356), done.
remote: Total 11791 (delta 3093), reused 3084 (delta 2953), pack-reused 8482
Receiving objects: 100% (11791/11791), 13.73 MiB | 11.99 MiB/s, done.
Resolving deltas: 100% (8204/8204), done.
rbyer@dev:~$ cd llama.cpp
rbyer@dev:~/llama.cpp$ mkdir build
rbyer@dev:~/llama.cpp$ cd build
rbyer@dev:~/llama.cpp/build$ cmake .. -DLLAMA_CUBLAS=ON -DLLAMA_MPI=ON
-- The C compiler identification is GNU 7.5.0
-- The CXX compiler identification is GNU 7.5.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: /usr/bin/git (found version "2.17.1")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Check if compiler accepts -pthread
-- Check if compiler accepts -pthread - yes
-- Found Threads: TRUE
-- Found CUDAToolkit: /usr/local/cuda-10.2/targets/aarch64-linux/include (found version "10.2.300")
-- cuBLAS found
-- The CUDA compiler identification is NVIDIA 10.2.300
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-10.2/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Using CUDA architectures: 52;61;70
-- Found MPI_C: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi.so (found version "3.1")
-- Found MPI_CXX: /usr/lib/aarch64-linux-gnu/openmpi/lib/libmpi_cxx.so (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- MPI found
-- CMAKE_SYSTEM_PROCESSOR: aarch64
-- ARM detected
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E
-- Performing Test COMPILER_SUPPORTS_FP16_FORMAT_I3E - Failed
-- Configuring done (9.6s)
-- Generating done (0.4s)
-- Build files have been written to: /home/rbyer/llama.cpp/build
rbyer@dev:~/llama.cpp/build$ cmake --build . --config Release
[  1%] Built target BUILD_INFO
[  2%] Building C object CMakeFiles/ggml.dir/ggml.c.o
[  3%] Building C object CMakeFiles/ggml.dir/ggml-alloc.c.o
[  4%] Building C object CMakeFiles/ggml.dir/ggml-backend.c.o
[  5%] Building C object CMakeFiles/ggml.dir/ggml-quants.c.o
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q2_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:3577:36: error: implicit declaration of function ‘vld1q_s16_x2’; did you mean ‘vld1q_s16’? [-Werror=implicit-function-declaration]
         const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums);
                                    ^~~~~~~~~~~~
                                    vld1q_s16
/home/rbyer/llama.cpp/ggml-quants.c:3577:36: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:3578:36: warning: missing braces around initializer [-Wmissing-braces]
         const int16x8x2_t mins16 = {vreinterpretq_s16_u16(vmovl_u8(vget_low_u8(mins))), vreinterpretq_s16_u16(vmovl_u8(vget_high_u8(mins)))};
                                    ^
                                     {                                                                                                      }
/home/rbyer/llama.cpp/ggml-quants.c:3614:41: error: implicit declaration of function ‘vld1q_u8_x2’; did you mean ‘vld1q_u32’? [-Werror=implicit-function-declaration]
             const uint8x16x2_t q2bits = vld1q_u8_x2(q2); q2 += 32;
                                         ^~~~~~~~~~~
                                         vld1q_u32
/home/rbyer/llama.cpp/ggml-quants.c:3614:41: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:3616:35: error: implicit declaration of function ‘vld1q_s8_x2’; did you mean ‘vld1q_s32’? [-Werror=implicit-function-declaration]
             int8x16x2_t q8bytes = vld1q_s8_x2(q8); q8 += 32;
                                   ^~~~~~~~~~~
                                   vld1q_s32
/home/rbyer/llama.cpp/ggml-quants.c:3616:35: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:3606:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rbyer/llama.cpp/ggml-quants.c:3621:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(2, 2);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:3606:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rbyer/llama.cpp/ggml-quants.c:3623:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(4, 4);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:3606:17: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
         q8bytes = vld1q_s8_x2(q8); q8 += 32;\
                 ^
/home/rbyer/llama.cpp/ggml-quants.c:3625:13: note: in expansion of macro ‘SHIFT_MULTIPLY_ACCUM_WITH_SCALE’
             SHIFT_MULTIPLY_ACCUM_WITH_SCALE(6, 6);
             ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q3_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:4251:31: error: invalid initializer
         uint8x16x2_t qhbits = vld1q_u8_x2(qh);
                               ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:4269:41: error: invalid initializer
             const uint8x16x2_t q3bits = vld1q_u8_x2(q3); q3 += 32;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:4270:43: error: implicit declaration of function ‘vld1q_s8_x4’; did you mean ‘vld1q_s64’? [-Werror=implicit-function-declaration]
             const int8x16x4_t q8bytes_1 = vld1q_s8_x4(q8); q8 += 64;
                                           ^~~~~~~~~~~
                                           vld1q_s64
/home/rbyer/llama.cpp/ggml-quants.c:4270:43: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:4271:43: error: invalid initializer
             const int8x16x4_t q8bytes_2 = vld1q_s8_x4(q8); q8 += 64;
                                           ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q4_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:5171:41: error: invalid initializer
             const uint8x16x2_t q4bits = vld1q_u8_x2(q4); q4 += 32;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:5189:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = vld1q_s8_x2(q8); q8 += 32;
                     ^
/home/rbyer/llama.cpp/ggml-quants.c:5198:21: error: incompatible types when assigning to type ‘int8x16x2_t {aka struct int8x16x2_t}’ from type ‘int’
             q8bytes = vld1q_s8_x2(q8); q8 += 32;
                     ^
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q5_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:5816:31: error: invalid initializer
         uint8x16x2_t qhbits = vld1q_u8_x2(qh);
                               ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:5824:41: error: invalid initializer
             const uint8x16x2_t q5bits = vld1q_u8_x2(q5); q5 += 32;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:5825:41: error: invalid initializer
             const int8x16x4_t q8bytes = vld1q_s8_x4(q8); q8 += 64;
                                         ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c: In function ‘ggml_vec_dot_q6_K_q8_K’:
/home/rbyer/llama.cpp/ggml-quants.c:6525:36: error: invalid initializer
         const int16x8x2_t q8sums = vld1q_s16_x2(y[i].bsums);
                                    ^~~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:6527:38: warning: missing braces around initializer [-Wmissing-braces]
         const int16x8x2_t q6scales = {vmovl_s8(vget_low_s8(scales)), vmovl_s8(vget_high_s8(scales))};
                                      ^
                                       {                                                            }
/home/rbyer/llama.cpp/ggml-quants.c:6539:35: error: invalid initializer
             uint8x16x2_t qhbits = vld1q_u8_x2(qh); qh += 32;
                                   ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:6540:35: error: implicit declaration of function ‘vld1q_u8_x4’; did you mean ‘vld1q_u64’? [-Werror=implicit-function-declaration]
             uint8x16x4_t q6bits = vld1q_u8_x4(q6); q6 += 64;
                                   ^~~~~~~~~~~
                                   vld1q_u64
/home/rbyer/llama.cpp/ggml-quants.c:6540:35: error: invalid initializer
/home/rbyer/llama.cpp/ggml-quants.c:6541:35: error: invalid initializer
             int8x16x4_t q8bytes = vld1q_s8_x4(q8); q8 += 64;
                                   ^~~~~~~~~~~
/home/rbyer/llama.cpp/ggml-quants.c:6584:21: error: incompatible types when assigning to type ‘int8x16x4_t {aka struct int8x16x4_t}’ from type ‘int’
             q8bytes = vld1q_s8_x4(q8); q8 += 64;
                     ^
cc1: some warnings being treated as errors
CMakeFiles/ggml.dir/build.make:117: recipe for target 'CMakeFiles/ggml.dir/ggml-quants.c.o' failed
make[2]: *** [CMakeFiles/ggml.dir/ggml-quants.c.o] Error 1
CMakeFiles/Makefile2:647: recipe for target 'CMakeFiles/ggml.dir/all' failed
make[1]: *** [CMakeFiles/ggml.dir/all] Error 2
Makefile:145: recipe for target 'all' failed
make: *** [all] Error 2
rbyer@dev:~/llama.cpp/build$
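
Every error above involves one of the multi-register NEON load intrinsics (vld1q_s16_x2, vld1q_u8_x2, vld1q_s8_x2, vld1q_u8_x4, vld1q_s8_x4), which the arm_neon.h shipped with GCC 7.5 does not declare for aarch64; the compiler treats each call as an implicitly declared function returning int, and the “invalid initializer” and “incompatible types” errors follow from that. Besides upgrading the compiler (the comments below report success with GCC 8.5), one possible workaround is a fallback shim composed from single-register loads. A minimal sketch, assuming hypothetical ggml_vld1q_* wrapper names that are not part of the original report:

/* Hypothetical fallback for pre-GCC-8 aarch64 toolchains: build the
 * multi-register structs from plain vld1q_* single-register loads. */
#include <arm_neon.h>

#if defined(__GNUC__) && !defined(__clang__) && __GNUC__ < 8
static inline int16x8x2_t ggml_vld1q_s16_x2(const int16_t * p) {
    int16x8x2_t r; r.val[0] = vld1q_s16(p); r.val[1] = vld1q_s16(p + 8); return r;
}
static inline uint8x16x2_t ggml_vld1q_u8_x2(const uint8_t * p) {
    uint8x16x2_t r; r.val[0] = vld1q_u8(p); r.val[1] = vld1q_u8(p + 16); return r;
}
static inline int8x16x2_t ggml_vld1q_s8_x2(const int8_t * p) {
    int8x16x2_t r; r.val[0] = vld1q_s8(p); r.val[1] = vld1q_s8(p + 16); return r;
}
static inline uint8x16x4_t ggml_vld1q_u8_x4(const uint8_t * p) {
    uint8x16x4_t r;
    r.val[0] = vld1q_u8(p);      r.val[1] = vld1q_u8(p + 16);
    r.val[2] = vld1q_u8(p + 32); r.val[3] = vld1q_u8(p + 48);
    return r;
}
static inline int8x16x4_t ggml_vld1q_s8_x4(const int8_t * p) {
    int8x16x4_t r;
    r.val[0] = vld1q_s8(p);      r.val[1] = vld1q_s8(p + 16);
    r.val[2] = vld1q_s8(p + 32); r.val[3] = vld1q_s8(p + 48);
    return r;
}
/* Redirect the missing intrinsics to the shims. */
#define vld1q_s16_x2 ggml_vld1q_s16_x2
#define vld1q_u8_x2  ggml_vld1q_u8_x2
#define vld1q_s8_x2  ggml_vld1q_s8_x2
#define vld1q_u8_x4  ggml_vld1q_u8_x4
#define vld1q_s8_x4  ggml_vld1q_s8_x4
#endif

The remaining -Wmissing-braces messages are only warnings and would not stop the build once the intrinsic errors are resolved.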

About this issue

  • Original URL
  • State: closed
  • Created 8 months ago
  • Comments: 20 (3 by maintainers)

Most upvoted comments

After a couple of tweaks, I managed to make this work. Be sure to:

  1. Compile gcc 8.5 from source: https://ftp.gnu.org/gnu/gcc/gcc-8.5.0/ (see the sketch after this list).
  2. After make install, make sure that the gcc symbolic link at /usr/bin points to the new binary.
  3. Plain make didn't work for me; use cmake with the following options: cmake .. -DLLAMA_CUBLAS=1 -DCMAKE_CUDA_COMPILER=/usr/local/cuda/bin/nvcc
  4. I had to add the -fPIC compiler option to CMakeFiles/ggml.dir/flags.make, otherwise ld returns an error.
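
A minimal sketch of the gcc 8.5 build from step 1, assuming an Ubuntu 18.04-style system; the --prefix and install paths are illustrative, not from the original comment:

$ wget https://ftp.gnu.org/gnu/gcc/gcc-8.5.0/gcc-8.5.0.tar.gz
$ tar xf gcc-8.5.0.tar.gz && cd gcc-8.5.0
$ ./contrib/download_prerequisites
$ mkdir build && cd build
$ ../configure --prefix=/usr/local/gcc-8.5 --enable-languages=c,c++ --disable-multilib
$ make -j4              # expect several hours on a Cortex-A57
$ sudo make install
$ sudo ln -sf /usr/local/gcc-8.5/bin/gcc /usr/bin/gcc   # step 2: update the symlinks
$ sudo ln -sf /usr/local/gcc-8.5/bin/g++ /usr/bin/g++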

Compiling gcc 8.5 from source right now. Will let everybody know if it works in a couple of hours.

I have found this (https://github.com/ggerganov/llama.cpp/issues/4123), which suggests installing gcc 8.5 from source; I haven't finished trying it yet. 🤷🏻‍♂️🤷🏻‍♂️