llama.cpp: [User] Interactive mode immediately exits on Windows with Zig
Prerequisites
Please answer the following questions for yourself before submitting an issue.
- I am running the latest code. Development is very rapid so there are no tagged versions as of now.
- I carefully followed the README.md.
- I searched using keywords relevant to my issue to make sure that I am creating a new issue that is not already open (or closed).
- I reviewed the Discussions, and have a new bug or useful enhancement to share.
Expected Behavior
When building with Zig, running the example command for Alpaca results in an interactive prompt that I can type in.
Current Behavior
When building with Zig, running the example command for Alpaca results in an interactive prompt that exits immediately, without producing an error message. Non-interactive mode works fine.
When using Command Prompt specifically, the process exits and leaves the console text green - that doesn’t happen in PowerShell or Git Bash, which reset the console color. I think that means that it isn’t reaching this line.
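For reference, the cleanup being skipped presumably amounts to printing the ANSI "reset attributes" escape before exit; a minimal sketch of what such a reset looks like (an assumption — the actual cleanup code in main.cpp may differ in detail):

```cpp
// Sketch: with --color the program recolors the terminal via ANSI escapes,
// so it must emit the reset sequence on the way out. If the process dies
// before reaching this, the console text stays green.
#include <cstdio>

void reset_console_color() {
    std::fputs("\x1b[0m", stdout);  // ANSI SGR "reset all attributes"
    std::fflush(stdout);
}
```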
The commands, for reference:
zig build -Drelease-fast
.\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
I also tried using LLaMA in interactive mode, which resulted in the same behavior.
.\zig-out\bin\main.exe -m D:\llama\LLaMA\7B\ggml-model-q4_0.bin -ins
Building with MSVC via CMake produces a binary that works perfectly fine (besides also leaving the Command Prompt console text green when exiting with Ctrl+C).
mkdir build
cd build
cmake ..
cmake --build . --config Release
cd ..
.\build\bin\Release\main.exe -m .\models\ggml-alpaca-7b-q4.bin -f .\prompts\alpaca.txt --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
Environment and Context
Please provide detailed information about your computer setup. This is important in case the issue is not reproducible except for under certain specific conditions.
- Physical (or virtual) hardware you are using:
Device name: DESKTOP-HP640DM
Processor: 11th Gen Intel® Core™ i9-11900K @ 3.50GHz
Installed RAM: 24.0 GB (23.8 GB usable)
System type: 64-bit operating system, x64-based processor
Pen and touch: Pen support
- Operating System:
Edition: Windows 10 Home
Version: 22H2
Installed on: 3/17/2021
OS build: 19045.2728
Experience: Windows Feature Experience Pack 120.2212.4190.0
- SDK version:
> pyenv exec python3 --version
Python 3.10.9
> cmake --version
cmake version 3.20.0-rc5
> cmake ..
-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.22000.0 to target Windows 10.0.19045.
-- The C compiler identification is MSVC 19.29.30148.0
-- The CXX compiler identification is MSVC 19.29.30148.0
<snip>
> zig version
0.10.1
> git log | head -1
commit 74f5899df4a6083fc467b620baa1cf821e37799d
- Model checksums:
> md5sum .\models\ggml-alpaca-7b-q4.bin
\7a81638857b7e03f7e3482f3e68d78bc *.\\models\\ggml-alpaca-7b-q4.bin
> md5sum D:\llama\LLaMA\7B\ggml-model-q4_0.bin
\b96f7e3c1cd6dcc6ffd9aaf975b776e5 *D:\\llama\\LLaMA\\7B\\ggml-model-q4_0.bin
Failure Information (for bugs)
Please help provide information about the failure if this is a bug. If it is not a bug, please remove the rest of this template.
Steps to Reproduce
Please provide detailed steps for reproducing the issue. We are not sitting in front of your screen, so the more detail the better.
- Clone the repo (https://github.com/ggerganov/llama.cpp/commit/74f5899df4a6083fc467b620baa1cf821e37799d)
- Run
zig build -Drelease-fast
- Run the example command (adjusted slightly for the env):
.\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
- Observe that the process exits immediately after reading the prompt
Failure Logs
Running the Zig build:
> .\zig-out\bin\main.exe -m .\models\ggml-alpaca-7b-q4.bin --color -f .\prompts\alpaca.txt --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
main: seed = 1681616581
llama.cpp: loading model from .\models\ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 1024.00 MB
system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 1 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:
'
sampling: temp = 0.200000, top_k = 10000, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 23
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
Below is an instruction that describes a task. Write a response that appropriately completes the request.
>
> # (process exited automatically)
Running the MSVC/CMake build:
> .\build\bin\Release\main.exe -m .\models\ggml-alpaca-7b-q4.bin -f .\prompts\alpaca.txt --color --ctx_size 2048 -n -1 -ins -b 256 --top_k 10000 --temp 0.2 --repeat_penalty 1 -t 7
main: seed = 1681616683
llama.cpp: loading model from .\models\ggml-alpaca-7b-q4.bin
llama.cpp: can't use mmap because tensors are not aligned; convert to new format to avoid this
llama_model_load_internal: format = 'ggml' (old version with low tokenizer quality and no mmap support)
llama_model_load_internal: n_vocab = 32000
llama_model_load_internal: n_ctx = 2048
llama_model_load_internal: n_embd = 4096
llama_model_load_internal: n_mult = 256
llama_model_load_internal: n_head = 32
llama_model_load_internal: n_layer = 32
llama_model_load_internal: n_rot = 128
llama_model_load_internal: ftype = 2 (mostly Q4_0)
llama_model_load_internal: n_ff = 11008
llama_model_load_internal: n_parts = 1
llama_model_load_internal: model size = 7B
llama_model_load_internal: ggml ctx size = 4113739.11 KB
llama_model_load_internal: mem required = 5809.32 MB (+ 1026.00 MB per state)
...................................................................................................
.
llama_init_from_file: kv self size = 1024.00 MB
system_info: n_threads = 7 / 16 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | FMA = 1 | NEON = 0 | ARM_FMA = 0 | F16C = 1 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 1 | VSX = 0 |
main: interactive mode on.
Reverse prompt: '### Instruction:
'
sampling: temp = 0.200000, top_k = 10000, top_p = 0.950000, repeat_last_n = 64, repeat_penalty = 1.000000
generate: n_ctx = 2048, n_batch = 256, n_predict = -1, n_keep = 23
== Running in interactive mode. ==
- Press Ctrl+C to interject at any time.
- Press Return to return control to LLaMa.
- If you want to submit another line, end your input in '\'.
Below is an instruction that describes a task. Write a response that appropriately completes the request.
> Tell me something I don't know.
The longest river in the world is the Nile River, which stretches 6,650 km (4,130 miles) across the continent of Africa.
>
> # (Ctrl+C)
About this issue
- State: closed
- Created a year ago
- Comments: 16 (6 by maintainers)
Whoa, I think I found the root cause. It's this issue. The problem is in `ReadFile()` when applied to console handles, so it's not something that can be fixed in either the C or the C++ library (they still have to call `ReadFile()` at some point).

As I understand the explanation: the console host represents the text in UTF-16 internally and has to convert it to an OEM encoding (UTF-8 in our case) before returning it to the `ReadFile()` caller. However, it assumes that each UTF-16 character corresponds to one OEM character, so of course this breaks for pretty much anything outside ASCII.

I think this has been fixed only in February this year. So unfortunately we can't use UTF-8 input here.
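A sketch of how one might observe that `ReadFile()` behavior directly (illustrative and untested here; it assumes an affected, pre-fix console host and a UTF-8 input code page):

```cpp
// Request UTF-8 console input and read from the console handle directly,
// then dump the raw bytes. On an affected conhost, non-ASCII input comes
// back mangled (NUL/garbage bytes) because the internal UTF-16 -> "OEM"
// conversion assumes one output byte per UTF-16 character.
#include <windows.h>
#include <cstdio>

int main() {
    SetConsoleCP(CP_UTF8);                        // UTF-8 as the console input code page
    HANDLE in = GetStdHandle(STD_INPUT_HANDLE);
    char buf[64];
    DWORD n = 0;
    if (ReadFile(in, buf, sizeof(buf), &n, nullptr)) {
        for (DWORD i = 0; i < n; ++i)
            std::printf("%02x ", static_cast<unsigned char>(buf[i]));
        std::printf("\n");
    }
    return 0;
}
```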
@karashiiro
I think `--libc` changes the C library (e.g., `ucrt`, which contains `getc()` and `getwc()`), not the C++ one (e.g., `libc++`, which contains `std::getline()`)? We are having problems with the latter, not the former.

Judging from this issue, I'd say that building C++ binaries for the MSVC ABI is generally not supported yet (although the build errors there are very different from your linking error).

This should theoretically be possible, but I don't believe replacing `std::getline()` just for the sake of making the Zig build work is worth it. Might be a good idea for unrelated reasons (see below).

@anzz1
This `setmode(...)` usage does strike me as weird, but apparently it's what people do in the wild. More importantly, there's an LLVM review request (updated today no less), which I think should fix the `std::getline()` implementation in `libc++` after `setmode(..., _O_WTEXT)` (I haven't tested it, but it does replace `getc()` with `getwc()` when appropriate).
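To make the `getc()`/`getwc()` asymmetry concrete, a minimal sketch (behavior as documented for UCRT streams in `_O_WTEXT` mode; not tested against the LLVM patch):

```cpp
// After _setmode(..., _O_WTEXT), only the wide-character read functions
// are valid on that stream; the narrow ones trip the CRT parameter check.
#include <fcntl.h>
#include <io.h>
#include <cstdio>
#include <cwchar>

int main() {
    _setmode(_fileno(stdin), _O_WTEXT);
    // getc(stdin);                  // now an invalid-parameter error in the UCRT
    wint_t wc = getwc(stdin);        // the wide-character equivalent still works
    std::wprintf(L"first character: %lc\n", wc);
    return 0;
}
```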
That said, I absolutely agree that introducing an additional UTF-16->UTF-8 translation in #840 makes no sense, even though it fixes the immediate problem with the input. It does look like `std::getline()` might be broken, since I can see in the debugger that reverting #840 results in `std::getline()` returning 3 zero bytes for a string with 3 non-ASCII symbols. I will try to find out why this happens (thankfully, the implementation of `std::getline()` is open source).

Perhaps a possible solution: Zig to msvc (x86, x86_64, aarch64) targets. https://github.com/kassane/xwin-zig-test
Yep, I just built and ran Windows Terminal from the commit fixing the issue and the previous one. As expected, the former allows me to properly enter non-ASCII symbols with #840 reverted, the latter doesn’t.
Of course, we can't rely on everyone having a recent version of Windows Terminal, so I guess that `_setmode(_fileno(stdin), _O_WTEXT)` has to stay, even though it might interfere with the mode set in `_initterm()`. Or is there a better solution that might work?

@karashiiro
I took a stab at debugging this, and I think I know the reason why it exits. The immediate problem is that `getc()` prints `Invalid parameter passed to C runtime function.` (you can only see this in a debugger, since it is logged via `OutputDebugStringA`) and returns an error when called indirectly from `std::getline()`. The culprit is apparently this line, which forces the standard input to use UTF-16. If you comment it out, `main` starts working again, even though you lose the ability to enter non-ASCII text.

More precisely, I think this is what happens:
- after `_setmode(..., _O_WTEXT)` you can no longer use `getc()` on the file stream and have to use `getwc()` instead
- `zig` bundles a copy of `libc++` on Windows by default
- the `std::getline()` implementation from `libc++` eventually calls `__stdinbuf<_CharT>::__getchar()` to read a symbol from the standard input
- `__stdinbuf<_CharT>::__getchar` calls `getc()` unconditionally and fails
- which results in `std::getline()` also failing

A small self-contained example to illustrate the problem would be this:
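(Sketch reconstructed from the description below; the exact original snippet may have differed.)

```cpp
// Force stdin into UTF-16 mode the same way main.cpp does, then try
// std::getline() on the narrow stream.
#include <fcntl.h>
#include <io.h>
#include <cstdio>
#include <iostream>
#include <string>

int main() {
    _setmode(_fileno(stdin), _O_WTEXT);   // comment this line out and getline() works
    std::string line;
    if (!std::getline(std::cin, line)) {  // libc++ ends up calling getc(), which now fails
        std::puts("ERROR");
        return 1;
    }
    std::cout << line << "\n";            // echo the line that was read
    return 0;
}
```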
If you compile this program with `zig c++` and run it, it will print `ERROR` and exit immediately. If you comment out the line with `_setmode(...)`, it will read a line of text from stdin, as expected.

I have absolutely no idea how to fix it, though. `_setmode(..., _O_WTEXT)` is there for a very good reason and shouldn't be removed. As far as I can see, there is no way to make `std::getline()` from `libc++` use `getwc()` instead of `getc()`. And I don't see a way to get rid of the bundled `libc++` when building with Zig, either.
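For context, the #840-style bypass the thread debates is roughly of this shape (an illustrative sketch, not the actual patch): skip the narrow C++ streams entirely, read wide characters with the C library (which is valid after `_setmode(..., _O_WTEXT)`), and convert the UTF-16 input to UTF-8 by hand.

```cpp
#include <fcntl.h>
#include <io.h>
#include <windows.h>
#include <cstdio>
#include <cwchar>
#include <string>

// Read one line as UTF-16 via getwchar() and convert it to UTF-8.
static std::string read_line_utf8() {
    std::wstring wline;
    for (wint_t c; (c = getwchar()) != WEOF && c != L'\n'; )
        wline.push_back(static_cast<wchar_t>(c));  // getwc()/getwchar() works in _O_WTEXT mode
    if (wline.empty())
        return {};
    int n = WideCharToMultiByte(CP_UTF8, 0, wline.c_str(), (int)wline.size(),
                                nullptr, 0, nullptr, nullptr);
    std::string utf8(static_cast<size_t>(n), '\0');
    WideCharToMultiByte(CP_UTF8, 0, wline.c_str(), (int)wline.size(),
                        &utf8[0], n, nullptr, nullptr);
    return utf8;
}

int main() {
    _setmode(_fileno(stdin), _O_WTEXT);
    std::printf("%s\n", read_line_utf8().c_str());
    return 0;
}
```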
That's weird. I would recommend not using Zig to build this on Windows then.