ctransformers: Segmentation fault on M1 Mac

Trying a simple example on an M1 Mac:

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "/path/to/starcoderbase-GGML/starcoderbase-ggml-q4_0.bin",
    model_type="starcoder",
    lib="basic",
)

print(llm("Hi"))

leads to a segmentation fault. The model works fine with the ggml example code.
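Not part of the original report, but one way to get more information out of a native crash like this is Python's built-in faulthandler module, which dumps a Python traceback to stderr when the interpreter receives SIGSEGV (a sketch; the model path is illustrative):

```python
import faulthandler

# Dump a Python-level traceback to stderr if the process receives
# SIGSEGV, SIGFPE, SIGABRT, or SIGBUS -- e.g. a crash inside the
# native ctransformers library. This at least shows which Python
# call was in flight when the segfault happened.
faulthandler.enable()

# ...then run the model-loading code from the report as before.
```

This narrows down whether the crash happens during from_pretrained (model loading) or during the llm("Hi") call (generation).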

About this issue

  • State: closed
  • Created a year ago
  • Comments: 65 (24 by maintainers)

Most upvoted comments

Thanks a lot @s-kostyaev for helping in debugging the issue.

Finally it works. The threads parameter works. It even works with conda now. Thank you!

@marella sorry, I’ve been working like crazy. I see @s-kostyaev executed the necessary commands; if you need anything else from my hardware, just let me know. Glad you guys found it.

No worries @bgonzalezfractal


@s-kostyaev I released a fix in the latest version 0.2.1. Please update:

pip install --upgrade ctransformers

and let me know if it works. Please don’t set the lib=... option.
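Following the instruction above, the call from the original report would simply drop the lib argument so ctransformers selects the appropriate native backend itself. A sketch (wrapped in a function since the model path is machine-specific):

```python
def load_starcoder(model_path: str):
    """Load a GGML StarCoder model without forcing a backend lib.

    Sketch of the updated usage: no lib=... argument, so
    ctransformers picks the native library on its own.
    `load_starcoder` is a hypothetical helper, not part of the
    ctransformers API.
    """
    from ctransformers import AutoModelForCausalLM

    return AutoModelForCausalLM.from_pretrained(
        model_path,
        model_type="starcoder",
    )
```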

Also, please try running with different thread counts (1, 4, 8) and let me know if you see any change in performance.
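A small timing harness for that thread comparison could look like the following. `time_generation` is a hypothetical helper; the lambda at the bottom is a dummy workload standing in for something like `lambda threads: llm("Hi", threads=threads)` so the sketch is self-contained:

```python
import time


def time_generation(generate, thread_counts=(1, 4, 8)):
    """Time one call of generate(threads=n) per thread count.

    Returns a dict mapping thread count -> elapsed seconds, so the
    counts can be compared side by side.
    """
    timings = {}
    for n in thread_counts:
        start = time.perf_counter()
        generate(threads=n)
        timings[n] = time.perf_counter() - start
    return timings


# Dummy CPU-bound workload in place of the real model call.
timings = time_generation(lambda threads: sum(range(100_000)))
print(sorted(timings))
```

With the real model call substituted in, roughly flat timings across 1, 4, and 8 threads would suggest the threads setting is not being honored by the native library.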

Thanks. I think I found the issue. I will make a new release and let you know shortly.

Maybe this - https://stackoverflow.com/questions/54587052/cmake-on-mac-could-not-find-threads-missing-threads-found

I also saw this, but cmake should fail with an error, and it is building successfully. Maybe it found threads but simply isn’t printing it. When you build the ggml repo, do you see a line that says Found Threads: TRUE?

No.

Thanks for checking. I think cmake is just not printing that it found the threads library; otherwise it wouldn’t work at all.

Thanks. Tomorrow I will add a main.cc file to the repo that can be run directly without Python. It should make it easier to debug the issue.

16 minutes in: starchat-alpha-q4_0 at 100% CPU, no output with max_new_tokens=1. File test.py:

from ctransformers import AutoModelForCausalLM
from ctransformers import AutoConfig

config = AutoConfig.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    threads=8,
)

llm = AutoModelForCausalLM.from_pretrained(
    "/Users/sergeykostyaev/nn/text-generation-webui/models/starchat-alpha-ggml-q4_0.bin",
    model_type="starcoder",
    lib="/Users/sergeykostyaev/nn/ctransformers/build/lib/libctransformers.dylib",
    config=config,
)
print("loaded")
print(llm("Hi", max_new_tokens=1, threads=8))

Printed only “loaded”

45 minutes later - nothing has changed.
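When a call hangs like this, a watchdog thread can at least turn “nothing changes” into a timely diagnostic instead of a 45-minute wait. A stdlib-only sketch (`run_with_timeout` is a hypothetical helper; the lambda stands in for the hanging `llm("Hi", max_new_tokens=1, threads=8)` call):

```python
import threading


def run_with_timeout(fn, timeout_s):
    """Run fn() in a worker thread; return (finished, result).

    If the call does not return within timeout_s seconds,
    finished is False and the call is likely hung in native code.
    """
    result = {}

    def worker():
        result["value"] = fn()

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    t.join(timeout_s)
    if t.is_alive():
        return False, None  # still running: likely hung
    return True, result.get("value")


# Stand-in for the model call; returns immediately here.
finished, value = run_with_timeout(lambda: "loaded", timeout_s=5.0)
print(finished, value)
```

Note the daemon thread cannot actually be killed if the native call never returns; this only lets the main program report the hang and exit.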