llama.cpp: Error: inlining failed in call to ‘always_inline’ ‘_mm256_cvtph_ps’ on x86_64 - better support for different x86_64 CPU instruction extensions
When I compile with make, the following error occurs:
inlining failed in call to ‘always_inline’ ‘_mm256_cvtph_ps’: target specific option mismatch
52 | _mm256_cvtph_ps (__m128i __A)
The error is reported when executing
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3 -c ggml.c -o ggml.o
but it does not occur when executing
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -msse3 -c ggml.c -o ggml.o
Must -mavx be used together with -mf16c?
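For context: `_mm256_cvtph_ps` compiles to the F16C instruction VCVTPH2PS, so GCC and Clang refuse to inline it unless F16C is enabled (with `-mf16c`, or with a `-march` value whose CPU has F16C). `-mavx` by itself does not enable F16C. A minimal sketch, assuming GCC or Clang (which predefine `__F16C__` and `__AVX__` for the enabled extensions), guards the intrinsic behind those macros and falls back to portable C otherwise. The function names are hypothetical and are not taken from ggml.c.

```c
#include <stdint.h>
#include <string.h>

#if defined(__F16C__) && defined(__AVX__)
#include <immintrin.h>
#endif

/* Portable IEEE 754 binary16 -> binary32 conversion of a single value. */
static float fp16_to_fp32_scalar(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp  = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    uint32_t bits;

    if (exp == 0x1Fu) {
        /* Inf or NaN: all-ones float exponent, keep the mantissa payload. */
        bits = sign | 0x7F800000u | (mant << 13);
    } else if (exp != 0) {
        /* Normal number: rebias the exponent from 15 to 127. */
        bits = sign | ((exp + 112u) << 23) | (mant << 13);
    } else if (mant == 0) {
        /* Signed zero. */
        bits = sign;
    } else {
        /* Subnormal half: shift left until the implicit bit appears. */
        exp = 113u;
        while ((mant & 0x400u) == 0) {
            mant <<= 1;
            exp--;
        }
        bits = sign | (exp << 23) | ((mant & 0x3FFu) << 13);
    }

    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Convert 8 half-precision values to float. */
static void fp16x8_to_fp32(const uint16_t src[8], float dst[8]) {
#if defined(__F16C__) && defined(__AVX__)
    /* VCVTPH2PS: only usable when the compiler targets F16C. */
    __m128i h = _mm_loadu_si128((const __m128i *)src);
    _mm256_storeu_ps(dst, _mm256_cvtph_ps(h));
#else
    /* Built without F16C (e.g. -mavx only) or for a non-x86 target. */
    for (int i = 0; i < 8; i++) {
        dst[i] = fp16_to_fp32_scalar(src[i]);
    }
#endif
}
```

With this pattern, `-mavx -msse3` builds the scalar loop, while `-mavx -mf16c` (or a suitable `-march`) builds the intrinsic path.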
OS: Arch Linux x86_64 Kernel: 6.1.18-1-lts
About this issue
- State: closed
- Created a year ago
- Comments: 35
I made a patch, and now make runs normally.
It would be great if @xiliuya and @polkovnikov could work together to create a single pull request combining your patches, so that we can support a wider range of CPUs.
_mm256_cvtph_ps requires the F16C extension (?), see here. You need to add -mf16c to the build command.
No, when I execute
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -msse3 -c ggml.c -o ggml.o
to generate the .o file, running make works normally. @gjmulder @xiliuya
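One way to see which instruction-set extensions a given set of flags actually enables is to inspect the macros GCC and Clang predefine for them (`__SSE3__`, `__AVX__`, `__F16C__`, `__AVX2__`). A minimal sketch, assuming GCC or Clang; the enum and helper names are hypothetical. Note that this reports what the compiler was allowed to use, not what the CPU running the binary supports.

```c
#include <stdio.h>

/* Hypothetical bit flags for the extensions discussed in this issue. */
enum {
    ISA_SSE3 = 1 << 0,
    ISA_AVX  = 1 << 1,
    ISA_F16C = 1 << 2,
    ISA_AVX2 = 1 << 3
};

/* Collects the extensions enabled at compile time via -m<ext> or -march. */
static unsigned compiled_isa_flags(void) {
    unsigned flags = 0;
#if defined(__SSE3__)
    flags |= ISA_SSE3;
#endif
#if defined(__AVX__)
    flags |= ISA_AVX;
#endif
#if defined(__F16C__)
    flags |= ISA_F16C;
#endif
#if defined(__AVX2__)
    flags |= ISA_AVX2;
#endif
    return flags;
}

/* Prints the flags and warns about the combination that breaks the build. */
static void print_compiled_isa(FILE *out) {
    unsigned f = compiled_isa_flags();
    fprintf(out, "SSE3=%d AVX=%d F16C=%d AVX2=%d\n",
            (f & ISA_SSE3) != 0, (f & ISA_AVX) != 0,
            (f & ISA_F16C) != 0, (f & ISA_AVX2) != 0);
    if ((f & ISA_AVX) && !(f & ISA_F16C)) {
        fprintf(out, "AVX enabled without F16C: _mm256_cvtph_ps will not inline\n");
    }
}
```

Compiled with the failing `-mavx -msse3` flags, this should print AVX=1 F16C=0; adding `-mf16c` should change F16C to 1.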
I have the reported issue on my CPU. Apparently it has AVX but no F16C (and no AVX2); it is a fairly old Intel laptop CPU, 10-15 years old.
Probably some older CPUs have AVX but no F16C.
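Such CPUs do exist: Intel's Sandy Bridge generation introduced AVX, while F16C first appeared in the following Ivy Bridge generation. To check a particular machine at run time, CPUID leaf 1 reports AVX in ECX bit 28 and F16C in ECX bit 29. A minimal sketch, assuming GCC or Clang on x86 (it relies on their `<cpuid.h>` and `__get_cpuid`); the helper names are hypothetical. It reads only the raw CPU flags and does not check whether the OS has enabled AVX register state (OSXSAVE/XGETBV).

```c
#include <stdio.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Sets *has_avx / *has_f16c from CPUID leaf 1. Returns 1 on success,
 * 0 if leaf 1 is unavailable or the target is not x86. */
static int cpu_feature_bits(int *has_avx, int *has_f16c) {
    *has_avx = 0;
    *has_f16c = 0;
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return 0;
    }
    *has_avx  = (int)((ecx >> 28) & 1u);  /* ECX bit 28: AVX  */
    *has_f16c = (int)((ecx >> 29) & 1u);  /* ECX bit 29: F16C */
    return 1;
#else
    return 0;
#endif
}

/* Prints whether this CPU is one of the AVX-without-F16C models. */
static void report_cpu_features(void) {
    int avx, f16c;
    if (!cpu_feature_bits(&avx, &f16c)) {
        puts("CPUID leaf 1 not available on this target");
        return;
    }
    printf("AVX: %s, F16C: %s\n", avx ? "yes" : "no", f16c ? "yes" : "no");
    if (avx && !f16c) {
        puts("AVX without F16C: VCVTPH2PS (_mm256_cvtph_ps) is not supported here");
    }
}
```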
I had this compilation issue on Windows with the latest Clang 16 when passing -march=native. As you know, -march=native tells the compiler to use all features of the current CPU, and on my CPU it enables AVX but not F16C. My compilation was fixed, and the program worked (although not very fast), after I implemented these conversion functions myself and placed the following code inside the #elif defined(__AVX__) section of ggml.c:
If any C/C++ gurus know a faster implementation of this function for AVX, please post it here.
For now, I suggest that some volunteer put the fix above into the main branch, if the code above is alright.
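This is not the commenter's code; it is an illustrative sketch of one commonly used way to convert half to float on AVX hardware without F16C. The bit trick shifts the exponent and mantissa into float position, multiplies by 2^112 to rebias the exponent (which also handles subnormal halves), then patches Inf/NaN and the sign. AVX1 has no 256-bit integer operations, so the sketch converts two groups of four values with SSE2 integer ops and joins them with `_mm256_insertf128_ps`. Function names are hypothetical; it assumes the default floating-point mode (with denormals-are-zero enabled, subnormal halves would flush to zero), and its speed relative to a scalar loop has not been measured here.

```c
#include <stdint.h>
#include <string.h>

/* Scalar form of the trick; used as the fallback when AVX is not enabled. */
static float fp16_to_fp32_bits(uint16_t h) {
    uint32_t expmant = h & 0x7FFFu;
    uint32_t bits = expmant << 13;            /* exponent+mantissa into place */
    uint32_t magic_bits = (254u - 15u) << 23; /* 2^112 as a float */
    float f, magic;
    memcpy(&f, &bits, sizeof f);
    memcpy(&magic, &magic_bits, sizeof magic);
    f *= magic;                               /* rebias exponent 15 -> 127 */
    memcpy(&bits, &f, sizeof bits);
    if (expmant > 0x7BFFu) {
        bits |= 255u << 23;                   /* half Inf/NaN -> float Inf/NaN */
    }
    bits |= (uint32_t)(h & 0x8000u) << 16;    /* restore the sign */
    memcpy(&f, &bits, sizeof f);
    return f;
}

#if defined(__AVX__)
#include <immintrin.h>

/* 4 halves (each zero-extended in a 32-bit lane) -> 4 floats, no F16C. */
static __m128 fp16x4_to_fp32_sse2(__m128i h) {
    const __m128i mask_nosign = _mm_set1_epi32(0x7FFF);
    const __m128  magic       = _mm_castsi128_ps(_mm_set1_epi32((254 - 15) << 23));
    const __m128i was_infnan  = _mm_set1_epi32(0x7BFF);
    const __m128i exp_infnan  = _mm_set1_epi32(255 << 23);

    __m128i expmant  = _mm_and_si128(h, mask_nosign);
    __m128i justsign = _mm_xor_si128(h, expmant);
    __m128  scaled   = _mm_mul_ps(_mm_castsi128_ps(_mm_slli_epi32(expmant, 13)), magic);
    __m128i infnan   = _mm_and_si128(_mm_cmpgt_epi32(expmant, was_infnan), exp_infnan);
    __m128i sign     = _mm_slli_epi32(justsign, 16);
    return _mm_or_ps(scaled, _mm_castsi128_ps(_mm_or_si128(sign, infnan)));
}
#endif

/* Convert 8 halves to 8 floats using AVX1 + SSE2 only. */
static void fp16x8_to_fp32_avx1(const uint16_t src[8], float dst[8]) {
#if defined(__AVX__)
    const __m128i zero = _mm_setzero_si128();
    __m128i h  = _mm_loadu_si128((const __m128i *)src);
    __m128  lo = fp16x4_to_fp32_sse2(_mm_unpacklo_epi16(h, zero)); /* halves 0-3 */
    __m128  hi = fp16x4_to_fp32_sse2(_mm_unpackhi_epi16(h, zero)); /* halves 4-7 */
    _mm256_storeu_ps(dst, _mm256_insertf128_ps(_mm256_castps128_ps256(lo), hi, 1));
#else
    for (int i = 0; i < 8; i++) {
        dst[i] = fp16_to_fp32_bits(src[i]);
    }
#endif
}
```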
My CPU does not support AVX2, but it runs normally with the above method.
Yes, it is not in a virtual environment such as Docker. I also get the error when I execute
cc -I. -O3 -DNDEBUG -std=c11 -fPIC -pthread -mavx -msse3 -c ggml.c -o ggml.o
on other machines.
This patch allowed me to successfully run the make command.
@RiccaDS, you can try merging #617; that should significantly boost AVX1 performance.
Good work guys. I am not a C++ programmer…
I am, however, interested in performance. Ideally I'd want the most performant CPU code for any architecture.
If it is Arch, I'm guessing you're using a very recent g++ version. I know it compiles with g++ 10 under Debian and Ubuntu. We haven't collected data on other g++ versions.
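Since the build has only been confirmed with g++ 10, reports from other setups are more useful with the exact compiler version attached. A minimal sketch using only predefined macros (`__clang_major__`, `__clang_minor__`, `__clang_patchlevel__` for Clang; `__GNUC__`, `__GNUC_MINOR__`, `__GNUC_PATCHLEVEL__` for GCC); `compiler_version` is a hypothetical helper name. Clang also defines `__GNUC__`, so it is checked first.

```c
#include <stdio.h>

/* Writes "<compiler> <version>" for the compiler that built this file.
 * Returns the snprintf result (number of characters that would be written). */
static int compiler_version(char *buf, size_t len) {
#if defined(__clang__)
    return snprintf(buf, len, "clang %d.%d.%d",
                    __clang_major__, __clang_minor__, __clang_patchlevel__);
#elif defined(__GNUC__)
    return snprintf(buf, len, "gcc %d.%d.%d",
                    __GNUC__, __GNUC_MINOR__, __GNUC_PATCHLEVEL__);
#elif defined(_MSC_VER)
    return snprintf(buf, len, "msvc %d", _MSC_VER);
#else
    return snprintf(buf, len, "unknown compiler");
#endif
}
```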