ctransformers: Segmentation fault on M1 Mac

Trying a simple example on an M1 Mac:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/starcoderbase-GGML/starcoderbase-ggml-q4_0.bin",
    model_type="starcoder",
    lib="basic",
)

print(llm("Hi"))

leads to a segmentation fault. The model works fine with the ggml example code.
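Not part of the original report, but one way to get more information out of a native crash like this is Python's built-in faulthandler module, which dumps a Python traceback to stderr when the interpreter receives SIGSEGV (a sketch; the model path is illustrative):

```python
import faulthandler

# Dump a Python-level traceback to stderr if the process receives
# SIGSEGV, SIGFPE, SIGABRT, or SIGBUS -- e.g. a crash inside the
# native ctransformers library. This at least shows which Python
# call was in flight when the segfault happened.
faulthandler.enable()

# ...then run the model-loading code from the report as before.
```

This narrows down whether the crash happens during from_pretrained (model loading) or during the llm("Hi") call (generation).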

About this issue

  • State: closed
  • Created a year ago
  • Comments: 65 (24 by maintainers)

Most upvoted comments

Thanks a lot @s-kostyaev for helping in debugging the issue.

Finally it works. The threads parameter works. It even works with conda now. Thank you!

@marella sorry, I’ve been working like crazy. I see @s-kostyaev executed the necessary commands; if you need anything else from my hardware, just let me know. Glad you guys found it.

No worries @bgonzalezfractal


@s-kostyaev I released a fix in the latest version 0.2.1. Please update:

pip install --upgrade ctransformers

and let me know if it works. Please don’t set the lib=... option.
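Following the instruction above, the call from the original report would simply drop the lib argument so ctransformers selects the appropriate native backend itself. A sketch (wrapped in a function since the model path is machine-specific):

```python
def load_starcoder(model_path: str):
    """Load a GGML StarCoder model without forcing a backend lib.

    Sketch of the updated usage: no lib=... argument, so
    ctransformers picks the native library on its own.
    `load_starcoder` is a hypothetical helper, not part of the
    ctransformers API.
    """
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        model_type="starcoder",
    )
```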

Also, please try running with different thread counts (1, 4, 8) and let me know if you see any change in performance.
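A small timing harness for that thread comparison could look like the following. `time_generation` is a hypothetical helper; the lambda at the bottom is a dummy workload standing in for something like `lambda threads: llm("Hi", threads=threads)` so the sketch is self-contained:

```python
import time


def time_generation(generate, thread_counts=(1, 4, 8)):
    """Time one call of generate(threads=n) per thread count.

    Returns a dict mapping thread count -> elapsed seconds, so the
    counts can be compared side by side.
    """
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        generate(threads=n)
        timings[n] = time.perf_counter() - start
    return timings


# Dummy CPU-bound workload in place of the real model call.
timings = time_generation(lambda threads: sum(range(100_000)))
print(sorted(timings))
```

With the real model call substituted in, roughly flat timings across 1, 4, and 8 threads would suggest the threads setting is not being honored by the native library.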

Thanks. I think I found the issue. I will make a new release and let you know shortly.

Maybe this - https://stackoverflow.com/questions/54587052/cmake-on-mac-could-not-find-threads-missing-threads-found

I also saw this, but cmake should fail with an error, and it is building successfully. Maybe it found threads but simply isn’t printing it. When you build the ggml repo, do you see a line that says Found Threads: TRUE?

No.

Thanks for checking. I think cmake is just not printing that it found the threads library; otherwise it wouldn’t work at all.

Thanks. Tomorrow I will add a main.cc file to the repo that can be run directly without Python. It should make it easier to debug the issue.

16 minutes in: starchat-alpha-q4_0 at 100% CPU, no output with max_new_tokens=1. File test.py:

from ctransformers import AutoModelForCausalLM
from ctransformers import AutoConfig

config = AutoConfig.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    threads=8,
)

llm = AutoModelForCausalLM.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    model_type="starcoder",
    lib="/Users/sergeykostyaev/nn/ctransformers/build/lib/libctransformers.dylib",
    config=config,
)
print("loaded")
print(llm("Hi", max_new_tokens=1, threads=8))

Printed only “loaded”

45 minutes later - nothing has changed.
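When a call hangs like this, a watchdog thread can at least turn “nothing changes” into a timely diagnostic instead of a 45-minute wait. A stdlib-only sketch (`run_with_timeout` is a hypothetical helper; the lambda stands in for the hanging `llm("Hi", max_new_tokens=1, threads=8)` call):

```python
import threading


def run_with_timeout(fn, timeout_s):
    """Run fn() in a worker thread; return (finished, result).

    If the call does not return within timeout_s seconds,
    finished is False and the call is likely hung in native code.
    """
    result = {}

    def worker():
        result["value"] = fn()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return False, None  # still running: likely hung
    return True, result.get("value")


# Stand-in for the model call; returns immediately here.
finished, value = run_with_timeout(lambda: "loaded", timeout_s=5.0)
print(finished, value)
```

Note the daemon thread cannot actually be killed if the native call never returns; this only lets the main program report the hang and exit.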