go-llama.cpp: how to resolve this segfault when running with cuBLAS?

@MathiasGS, can you help with this, please?

root@ubuntu:/usr/local/src/go-llama.cpp# CGO_LDFLAGS="-lcublas -lcudart -L/usr/local/cuda/lib64/" LIBRARY_PATH=$PWD C_INCLUDE_PATH=$PWD go run ./examples -m "../llama.cpp/models/speechless-llama2-13b.Q4_K_M.gguf" -t 14
# github.com/go-skynet/go-llama.cpp
binding.cpp: In function ‘int llama_predict(void*, void*, char*, bool)’:
binding.cpp:332:53: warning: format ‘%zu’ expects argument of type ‘size_t’, but argument 2 has type ‘int’ [-Wformat=]
  332 |                 printf("<<input too long: skipped %zu token%s>>", skipped_tokens, skipped_tokens != 1 ? "s" : "");
      |                                                   ~~^             ~~~~~~~~~~~~~~
      |                                                     |             |
      |                                                     |             int
      |                                                     long unsigned int
      |                                                   %u
binding.cpp: In function ‘void llama_binding_free_model(void*)’:
binding.cpp:797:5: warning: possible problem detected in invocation of ‘operator delete’ [-Wdelete-incomplete]
  797 |     delete ctx->model;
      |     ^~~~~~~~~~~~~~~~~
binding.cpp:797:17: warning: invalid use of incomplete type ‘struct llama_model’
  797 |     delete ctx->model;
      |            ~~~~~^~~~~
In file included from ./llama.cpp/common/common.h:5,
                 from binding.cpp:1:
./llama.cpp/llama.h:60:12: note: forward declaration of ‘struct llama_model’
   60 |     struct llama_model;
      |            ^~~~~~~~~~~
binding.cpp:797:5: note: neither the destructor nor the class-specific ‘operator delete’ will be called, even if they are declared when the class is defined
  797 |     delete ctx->model;
      |     ^~~~~~~~~~~~~~~~~
create_gpt_params: loading model ../llama.cpp/models/speechless-llama2-13b.Q4_K_M.gguf
SIGSEGV: segmentation violation
PC=0x7f4c8d74bfbd m=0 sigcode=1
signal arrived during cgo execution

goroutine 1 [syscall]:
runtime.cgocall(0x49f6e0, 0xc00005ca90)
	/usr/local/go/src/runtime/cgocall.go:157 +0x4b fp=0xc00005ca68 sp=0xc00005ca30 pc=0x41522b
github.com/go-skynet/go-llama%2ecpp._Cfunc_load_model(0x2a1e100, 0x80, 0x0, 0x1, 0x0, 0x1, 0x1, 0x0, 0x0, 0x200, ...)
	_cgo_gotypes.go:267 +0x4f fp=0xc00005ca90 sp=0xc00005ca68 pc=0x49c04f
github.com/go-skynet/go-llama%2ecpp.New({0x7ffccaeca64a, 0x35}, {0xc00005ce20, 0x4, 0x1?})
	/usr/local/src/go-llama.cpp/llama.go:39 +0x385 fp=0xc00005cca0 sp=0xc00005ca90 pc=0x49c7a5
main.main()
	/usr/local/src/go-llama.cpp/examples/main.go:37 +0x3bd fp=0xc00005cf40 sp=0xc00005cca0 pc=0x49e93d
runtime.main()
	/usr/local/go/src/runtime/proc.go:267 +0x2bb fp=0xc00005cfe0 sp=0xc00005cf40 pc=0x445c9b
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00005cfe8 sp=0xc00005cfe0 pc=0x46fd21

goroutine 2 [force gc (idle)]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004cfa8 sp=0xc00004cf88 pc=0x4460ee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.forcegchelper()
	/usr/local/go/src/runtime/proc.go:322 +0xb3 fp=0xc00004cfe0 sp=0xc00004cfa8 pc=0x445f73
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004cfe8 sp=0xc00004cfe0 pc=0x46fd21
created by runtime.init.6 in goroutine 1
	/usr/local/go/src/runtime/proc.go:310 +0x1a

goroutine 3 [GC sweep wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004d778 sp=0xc00004d758 pc=0x4460ee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.bgsweep(0x0?)
	/usr/local/go/src/runtime/mgcsweep.go:280 +0x94 fp=0xc00004d7c8 sp=0xc00004d778 pc=0x432a14
runtime.gcenable.func1()
	/usr/local/go/src/runtime/mgc.go:200 +0x25 fp=0xc00004d7e0 sp=0xc00004d7c8 pc=0x427da5
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004d7e8 sp=0xc00004d7e0 pc=0x46fd21
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:200 +0x66

goroutine 4 [GC scavenge wait]:
runtime.gopark(0xc000076000?, 0x59f718?, 0x1?, 0x0?, 0xc0000071e0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc00004df70 sp=0xc00004df50 pc=0x4460ee
runtime.goparkunlock(...)
	/usr/local/go/src/runtime/proc.go:404
runtime.(*scavengerState).park(0xa26d60)
	/usr/local/go/src/runtime/mgcscavenge.go:425 +0x49 fp=0xc00004dfa0 sp=0xc00004df70 pc=0x4302a9
runtime.bgscavenge(0x0?)
	/usr/local/go/src/runtime/mgcscavenge.go:653 +0x3c fp=0xc00004dfc8 sp=0xc00004dfa0 pc=0x43083c
runtime.gcenable.func2()
	/usr/local/go/src/runtime/mgc.go:201 +0x25 fp=0xc00004dfe0 sp=0xc00004dfc8 pc=0x427d45
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc00004dfe8 sp=0xc00004dfe0 pc=0x46fd21
created by runtime.gcenable in goroutine 1
	/usr/local/go/src/runtime/mgc.go:201 +0xa5

goroutine 18 [finalizer wait]:
runtime.gopark(0x0?, 0x0?, 0x0?, 0x0?, 0x0?)
	/usr/local/go/src/runtime/proc.go:398 +0xce fp=0xc000048628 sp=0xc000048608 pc=0x4460ee
runtime.runfinq()
	/usr/local/go/src/runtime/mfinal.go:193 +0x107 fp=0xc0000487e0 sp=0xc000048628 pc=0x426e27
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:1650 +0x1 fp=0xc0000487e8 sp=0xc0000487e0 pc=0x46fd21
created by runtime.createfing in goroutine 1
	/usr/local/go/src/runtime/mfinal.go:163 +0x3d

rax    0x0
rbx    0x2a1e9a8
rcx    0x7f4c83419c80
rdx    0x0
rdi    0x2a1e9a8
rsi    0x7ffccaec8ae0
rbp    0x7ffccaec8ae0
rsp    0x7ffccaec87c0
r8     0x57
r9     0x2a1ebe0
r10    0x7f4c8d60d258
r11    0x7f4c83419ce0
r12    0x0
r13    0x0
r14    0x2a1e9b8
r15    0x7ffccaec8ac0
rip    0x7f4c8d74bfbd
rflags 0x10246
cs     0x33
fs     0x0
gs     0x0
exit status 2
root@ubuntu:/usr/local/src/go-llama.cpp# 

About this issue

  • State: closed
  • Created 10 months ago
  • Comments: 18 (7 by maintainers)

Most upvoted comments

I have run into issues like this before; that’s the whole point of having the patch (I would have avoided it altogether if possible). I opened a PR upstream to fix this the correct way, but it was rejected on code-style grounds: https://github.com/ggerganov/llama.cpp/pull/1902. The by-value struct copies all over the code seem to cause structure misalignment with certain combinations of toolchains, which triggers this crash.

It looks like a specific combination of nvcc, gcc, and Go versions triggers this. I have also used valgrind in the past to carefully track down the culprit, but nothing in the code actually points to the real cause, so we are back to relying on hacks.

I’ll try to reproduce this on a GPU; however, it really takes time and patience to dig into it with valgrind and similar tools.