gpt4all: Illegal instruction after running `gpt4all-lora-quantized-linux-x86`

I’m trying to run gpt4all-lora-quantized-linux-x86 on an Ubuntu Linux machine with 240 Intel® Xeon® CPU E7-8880 v2 @ 2.50GHz processors and 295GB RAM. No GPUs are installed. It is Ubuntu 22.04 running on VMware ESXi.

I get the following error: Illegal instruction

willem@ubuntu:/data/chat$ gdb -q ./gpt4all-lora-quantized-linux-x86
Reading symbols from ./gpt4all-lora-quantized-linux-x86...
(No debugging symbols found in ./gpt4all-lora-quantized-linux-x86)
(gdb) run
Starting program: /data/gpt4all/chat/gpt4all-lora-quantized-linux-x86
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
main: seed = 1680171804
llama_model_load: loading model from 'gpt4all-lora-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from 'gpt4all-lora-quantized.bin'
llama_model_load: .................................... done
llama_model_load: model size =  4017.27 MB / num tensors = 291

system_info: n_threads = 4 / 240 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
sampling parameters: temp = 0.100000, top_k = 40, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.300000



Program received signal SIGILL, Illegal instruction.
0x0000000000425282 in ggml_set_f32 ()
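
To see exactly which instruction raised the SIGILL, one can print the instruction at the faulting program counter from the same gdb session (a commenter below identifies it as an AVX2 instruction):

(gdb) x/i $pc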

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 3
  • Comments: 29
Most upvoted comments

I have an Intel i5-3320M with no AVX2 or FMA support. I followed these steps:

https://github.com/nomic-ai/gpt4all/issues/82#issuecomment-1492001832

@mvrozanti @pirate486743186 I had no problems compiling the executable with the following standard commands:

wget https://github.com/zanussbaum/gpt4all.cpp/archive/refs/heads/master.zip
unzip master.zip
cd gpt4all.cpp-master
mkdir build; cd build

and then

$ cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_FMA=1 ..
$ make
$ ./chat -m ~/gpt4all/chat/gpt4all-lora-quantized.bin

and it worked. On my laptop, it is very slow as would be expected.
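
One way to sanity-check that the flags took effect is to grep the disassembly of the rebuilt binary for instructions from a disabled extension. For example, every vfmadd* variant requires FMA, so a count of 0 suggests LLAMA_NO_FMA worked (a sketch, assuming the binary is ./chat as built above):

$ objdump -d ./chat | grep -c vfmadd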

For those who are frustrated, keep in mind it was released 2 days ago. Have ultra-low expectations.

With the static build, it again gives ‘illegal instruction’. I have an old laptop.

With cmake, it apparently can’t find pthread when linking:

/usr/bin/ld: libggml.a(ggml.c.o): in function `ggml_graph_compute':
ggml.c:(.text+0x16eb0): undefined reference to `pthread_create'
/usr/bin/ld: ggml.c:(.text+0x16f13): undefined reference to `pthread_join'
collect2: error: ld returned 1 exit status
make[3]: *** [CMakeFiles/chat.dir/build.make:119: chat] Error 1
make[2]: *** [CMakeFiles/Makefile2:153: CMakeFiles/chat.dir/all] Error 2
make[1]: *** [CMakeFiles/Makefile2:160: CMakeFiles/chat.dir/rule] Error 2
make: *** [Makefile:163: chat] Error 2

Add this to CMakeLists.txt after line 25:

set(CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -pthread")
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -pthread")
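
A more idiomatic alternative (a sketch, assuming the executable target is named chat, as the make output above suggests) is to let CMake locate the threading library and link it to the target:

find_package(Threads REQUIRED)
target_link_libraries(chat PRIVATE Threads::Threads)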

@pirate486743186 and what is the address of that instruction?

@mvrozanti you can try this one: no-avx-avx2-fma-f16c.tar.gz. I compiled it with cmake -D LLAMA_NO_AVX2=1 -D LLAMA_NO_AVX=1 -D LLAMA_NO_FMA=1 .. and additionally commented out the line with F16C.

@vsl-iil you can try the archive above too, though I cannot determine which instruction is at your address 0x000055555558a1c9; there’s some kind of heavy ASLR in your case.

@qinidema That fork also didn’t work for me:

main: seed = 1680211596
llama_model_load: loading model from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin' - please wait ...
llama_model_load: ggml ctx size = 6065.35 MB
llama_model_load: memory_size =  2048.00 MB, n_mem = 65536
llama_model_load: loading model part 1/1 from '/home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin'
llama_model_load: terminate called after throwing an instance of 'std::__ios_failure'
  what():  basic_ios::clear: iostream error

At least I got an error with this one. What made you believe that fork in particular would work?
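
For what it’s worth, a basic_ios::clear iostream error during model load usually means a read failed partway through, which often points to a truncated or corrupted model file. One quick check is to compare the file’s checksum against whatever the model’s distributor publishes (the path is taken from the log above):

$ md5sum /home/m/macrovip/gpt4all-lora-unfiltered-quantized.bin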

It doesn’t say. I have AVX and F16C. These builds should probably work for most CPUs, or even all of them. It worked with no-avx-avx2-fma-f16c.tar.gz (you purged everything lol).

There are 665 instructions in that function, and some of them require AVX and AVX2. The instruction at 0x0000000000425282 is “vbroadcastss ymm1,xmm0” (C4 E2 7D 18 C8), which requires AVX2. It lies right at the beginning of ggml_set_f32, and the only earlier AVX instruction is vmovss, which requires only AVX. So vbroadcastss was likely just the first AVX2-requiring instruction that your CPU encountered.
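
For reference, the same determination can be made without gdb by disassembling a small window around the faulting address, since this binary is loaded at a fixed (non-PIE) address (a sketch, using the address from the backtrace above):

$ objdump -d --start-address=0x425270 --stop-address=0x425290 ./gpt4all-lora-quantized-linux-x86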

So you need a CPU with AVX2 support to run this. As far as I can see, the E7-8880 v2 supports only AVX, not AVX2. But in the output you’ve provided I see

AVX = 1 | AVX2 = 1

which confuses me a bit.
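
Worth noting: in ggml of that era, the system_info line reflects the flags the binary was compiled with, not what the host CPU supports, so AVX2 = 1 presumably just means the prebuilt binary assumes AVX2, which would explain the SIGILL on an AVX-only CPU. To see which extensions the CPU itself exposes on Linux:

$ grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep -E '^(avx|avx2|fma|f16c)$'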