mlc-llm: [Bug] - MLCChat Llama Not Able to Initialize On Pixel 7 Phone
🐛 Bug
The MLCChat app was not able to initialize on a Pixel 7 Android phone. I am using the Llama-2-7b-chat-hf model with q4f32_1 quantization. I compiled the model and built the app successfully, but the app fails to initialize, and I am not sure why the error references llm_chat.cc at my development machine's path "/home/bajiezi/projects/mlc-llm/cpp/llm_chat.cc".
Error message:
MLCChat failed
Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (fload_exec.defined()) is false: TVM runtime cannot find vm_load_executable
Stack trace:
File "/home/bajiezi/projects/mlc-llm/cpp/llm_chat.cc", line 169
at org.apache.tvm.Base.checkCall(Base.java:173)
at org.apache.tvm.Function.invoke(Function.java:130)
at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:642)
at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:640)
at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:543)
at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:640)
at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
at java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
at java.lang.Thread.run(Thread.java:1012)
Error message:
InternalError: Check failed: (fload_exec.defined()) is false: TVM runtime cannot find vm_load_executable
Stack trace:
File "/home/bajiezi/projects/mlc-llm/cpp/llm_chat.cc", line 169
I basically followed the instructions here: https://llm.mlc.ai/docs/deploy/android.html
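For reference, the model-packaging half of that guide comes down to three CLI steps. The sketch below is hedged: subcommand and flag names follow the current docs and may differ across nightlies (the CLI has also shipped under the name mlc_chat), and all paths are examples rather than the reporter's actual layout.

```bash
# Hedged sketch of the model-packaging steps from the Android guide.
# Subcommand and flag names may vary across nightlies; paths are examples.

# 1. Convert and quantize the weights.
mlc_llm convert_weight ./dist/models/Llama-2-7b-chat-hf/ \
    --quantization q4f32_1 \
    -o ./dist/Llama-2-7b-chat-hf-q4f32_1-MLC

# 2. Generate mlc-chat-config.json for the quantized model.
mlc_llm gen_config ./dist/models/Llama-2-7b-chat-hf/ \
    --quantization q4f32_1 --conv-template llama-2 \
    -o ./dist/Llama-2-7b-chat-hf-q4f32_1-MLC

# 3. Compile the model library that the MLCChat APK bundles.
mlc_llm compile ./dist/Llama-2-7b-chat-hf-q4f32_1-MLC/mlc-chat-config.json \
    --device android \
    -o ./dist/libs/Llama-2-7b-chat-hf-q4f32_1-android.tar
```

The vm_load_executable function that the error above cannot find is expected to come from the library produced in step 3, so a stale or mismatched .tar bundled into the app is one plausible cause.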
Environment

- Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): Android
- Operating system (e.g. Ubuntu/Windows/macOS/…): Ubuntu
- Device (e.g. iPhone 12 Pro, PC+RTX 3090, …): Pixel 7
- How you installed MLC-LLM (conda, source): python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
- How you installed TVM-Unity (pip, source): python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
- GPU driver version (if applicable): 535.129.03
- CUDA/cuDNN version (if applicable): 12.2
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 32 (10 by maintainers)
I'm getting the same paged KV cache error with TinyLlama on Android. I built the model lib using the following command:
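(The command is not quoted above; what follows is a purely hypothetical reconstruction, with the model path, quantization, and output name assumed rather than taken from the commenter.)

```bash
# Hypothetical reconstruction, not the commenter's actual command.
# Model path, quantization, and output names are assumptions.
mlc_llm compile ./dist/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/mlc-chat-config.json \
    --device android \
    -o ./dist/libs/TinyLlama-1.1B-Chat-v0.4-q4f16_1-android.tar
```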
It seems that for some reason the `else` branch is executed here: https://github.com/mlc-ai/mlc-llm/blob/0e7ee203228334301423867ac54f53a1575ed9da/cpp/llm_chat.cc#L614-L624

You're welcome. I also tested
`llama`, `gpt_neox`, `mistral`, `phi_msft`, and `gemma`, which all worked on my Pixel 7. Here's the relevant script:

I can confirm that TinyLlama v0.4 now works when building with the script that I provided in https://github.com/mlc-ai/mlc-llm/issues/1741#issuecomment-1944566496.
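(The script itself is only available via the links above. Purely as a hedged sketch, a loop over one example model per tested architecture might look like the following; the model choices, paths, and quantization are assumptions.)

```bash
# Hedged sketch of a multi-architecture Android build loop, not the
# commenter's actual script. Assumes convert_weight and gen_config have
# already produced the *-MLC directories; the model picks are illustrative
# stand-ins for the llama, gpt_neox, mistral, phi_msft, and gemma families.
QUANT=q4f16_1
for MODEL in Llama-2-7b-chat-hf \
             RedPajama-INCITE-Chat-3B-v1 \
             Mistral-7B-Instruct-v0.2 \
             phi-2 \
             gemma-2b-it; do
  mlc_llm compile ./dist/${MODEL}-${QUANT}-MLC/mlc-chat-config.json \
      --device android \
      -o ./dist/libs/${MODEL}-${QUANT}-android.tar
done
```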
@phicolzhang Good day, try clearing the app cache, restarting the phone, and re-downloading Llama.
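(If doing that by hand is fiddly, one shortcut, assuming adb is set up and using the package id visible in the stack trace above:)

```bash
# Clears MLCChat's data and cache in one step. Note that this also deletes
# any downloaded model weights, forcing a fresh download on next launch.
adb shell pm clear ai.mlc.mlcchat
```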