mlc-llm: [Bug] - MLCChat Llama Not Able to Initialize On Pixel 7 Phone

๐Ÿ› Bug

The MLCChat app fails to initialize on a Pixel 7 Android phone. I am using the Llama-2-7b-chat-hf model with q4f32_1 quantization. I compiled the model and built the app successfully, but I am not sure why the error references llm_chat.cc at my development machine's path, "/home/bajiezi/projects/mlc-llm/cpp/llm_chat.cc" (this path is baked into the binary at compile time and shown in stack traces; the phone is not reading files from that machine). The app fails to initialize.

Error message:

MLCChat failed

Stack trace:
org.apache.tvm.Base$TVMError: InternalError: Check failed: (fload_exec.defined()) is false: TVM runtime cannot find vm_load_executable
Stack trace:
  File "/home/bajiezi/projects/mlc-llm/cpp/llm_chat.cc", line 169

	at org.apache.tvm.Base.checkCall(Base.java:173)
	at org.apache.tvm.Function.invoke(Function.java:130)
	at ai.mlc.mlcllm.ChatModule.reload(ChatModule.java:43)
	at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:642)
	at ai.mlc.mlcchat.AppViewModel$ChatState$mainReloadChat$1$2.invoke(AppViewModel.kt:640)
	at ai.mlc.mlcchat.AppViewModel$ChatState.callBackend(AppViewModel.kt:543)
	at ai.mlc.mlcchat.AppViewModel$ChatState.mainReloadChat$lambda$3(AppViewModel.kt:640)
	at ai.mlc.mlcchat.AppViewModel$ChatState.$r8$lambda$JJKpoRMMpp77FzXKA0o00i8lgRA(Unknown Source:0)
	at ai.mlc.mlcchat.AppViewModel$ChatState$$ExternalSyntheticLambda3.run(Unknown Source:8)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:487)
	at java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:644)
	at java.lang.Thread.run(Thread.java:1012)

I basically followed the instructions here: https://llm.mlc.ai/docs/deploy/android.html
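
For reference, the failing check means the model library the app loaded does not expose the vm_load_executable function. In practice this usually comes down to a mismatch: a model_lib name in app-config.json that matches no bundled library, a wrong --system-lib-prefix at compile time, or model libraries and app runtime built from different versions. As a rough sanity check (my own sketch, not a documented step; the tar path and the file names inside the tar are illustrative and vary between releases), one can look for the expected system-lib prefix in the compiled library's object files:

# Unpack the compiled model library and grep its symbols for the expected
# prefix (e.g. llama_q4f32_1_). If the prefix never appears, the library
# was likely built with a different prefix than the one the app expects.
mkdir -p /tmp/libcheck
tar -xf ./dist/libs/Llama-2-7b-chat-hf-q4f32_1-android.tar -C /tmp/libcheck
for obj in /tmp/libcheck/*.o; do
  echo "== $obj =="
  nm --defined-only "$obj" | grep -i 'q4f32_1' | head
done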

Environment

  • Platform (e.g. WebGPU/Vulkan/iOS/Android/CUDA): Android

  • Operating system (e.g. Ubuntu/Windows/MacOS/…): Ubuntu

  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, …): Pixel 7

  • How you installed MLC-LLM (conda, source):
    python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly

  • How you installed TVM-Unity (pip, source): python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly (see the version check after this list)

  • GPU driver version (if applicable): 535.129.03

  • CUDA/cuDNN version (if applicable): 12.2
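
Because the compiled model libraries and the TVM runtime linked into the app must come from matching builds, it is worth confirming which nightlies are actually installed before compiling. A quick check, assuming the package names of that era (mlc-ai-nightly, plus mlc-chat-nightly if the compiler wheel was installed separately); adjust the names to whatever your setup uses:

# Print installed nightly versions
python -m pip show mlc-ai-nightly mlc-chat-nightly | grep -E '^(Name|Version)'
# Confirm the CLI and the Python package resolve to the same environment
which mlc_chat
python -c "import tvm; print(tvm.__file__)"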

About this issue

  • State: closed
  • Created 5 months ago
  • Comments: 32 (10 by maintainers)

Most upvoted comments

I'm getting the same paged KV cache error with TinyLlama on Android. I built the model lib using the following commands:

curl --create-dirs -o dist/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/raw/main/mlc-chat-config.json
mlc_chat compile ./dist/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/mlc-chat-config.json --device android --system-lib-prefix tinyllama_q4f16_1_ -o ./dist/libs/TinyLlama-1.1B-Chat-v0.4-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"

It seems that, for some reason, the else branch is executed here: https://github.com/mlc-ai/mlc-llm/blob/0e7ee203228334301423867ac54f53a1575ed9da/cpp/llm_chat.cc#L614-L624
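
If I read the linked code right, that branch chooses how the KV cache is set up based on what the compiled model library provides, so landing in the wrong branch again points at a mismatch between the model lib and the runtime built into the app. One way to rule out a version mismatch is to rebuild everything from a single synchronized tree, roughly (paths illustrative):

# Keep the compiler and the Android runtime in sync: update the checkout
# the app is built from, then reinstall the matching nightly wheels.
cd ~/projects/mlc-llm            # illustrative path to your mlc-llm checkout
git pull
git submodule update --init --recursive
python -m pip install --pre -U -f https://mlc.ai/wheels mlc-ai-nightly
# ...then re-run mlc_chat compile and rebuild the Android app from this tree.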

Thanks @Suall1969! Yes, I followed your instructions and confirmed that TinyLlama 0.4 works. I will try other models.

You're welcome. I also tested llama, gpt_neox, mistral, phi_msft, and gemma, all of which worked on my Pixel 7. Here's the relevant script:

# Compile model libraries
curl --create-dirs -o dist/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/raw/main/mlc-chat-config.json
curl --create-dirs -o dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/raw/main/mlc-chat-config.json
curl --create-dirs -o dist/Mistral-7B-Instruct-v0.2-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC/raw/main/mlc-chat-config.json
curl --create-dirs -o dist/phi-2-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/phi-2-q4f16_1-MLC/raw/main/mlc-chat-config.json
curl --create-dirs -o dist/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/raw/main/mlc-chat-config.json
curl --create-dirs -o dist/gemma-2b-it-q4f16_1-MLC/mlc-chat-config.json https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC/raw/main/mlc-chat-config.json
mkdir dist/libs
mlc_chat compile ./dist/Llama-2-7b-chat-hf-q4f16_1-MLC/mlc-chat-config.json --device android -o ./dist/libs/Llama-2-7b-chat-hf-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"
mlc_chat compile ./dist/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/mlc-chat-config.json --device android -o ./dist/libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"
mlc_chat compile ./dist/Mistral-7B-Instruct-v0.2-q4f16_1-MLC/mlc-chat-config.json --device android -o ./dist/libs/Mistral-7B-Instruct-v0.2-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"
mlc_chat compile ./dist/phi-2-q4f16_1-MLC/mlc-chat-config.json --device android -o ./dist/libs/phi-2-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"
mlc_chat compile ./dist/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC/mlc-chat-config.json --device android --system-lib-prefix tinyllama_q4f16_1_ -o ./dist/libs/TinyLlama-1.1B-Chat-v0.4-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"
mlc_chat compile ./dist/gemma-2b-it-q4f16_1-MLC/mlc-chat-config.json --device android -o ./dist/libs/gemma-2b-it-q4f16_1-android.tar --overrides="context_window_size=768;sliding_window_size=768"

# Configure list of models
cd ./android/library
cat > ./src/main/assets/app-config.json <<'EOF'
{
  "model_list": [
    {
      "model_url": "https://huggingface.co/mlc-ai/Llama-2-7b-chat-hf-q4f16_1-MLC/",
      "model_lib": "llama_q4f16_1",
      "estimated_vram_bytes": 4348727787,
      "model_id": "Llama-2-7b-chat-hf-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/RedPajama-INCITE-Chat-3B-v1-q4f16_1-MLC/",
      "model_lib": "gpt_neox_q4f16_1",
      "estimated_vram_bytes": 1948348579,
      "model_id": "RedPajama-INCITE-Chat-3B-v1-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/Mistral-7B-Instruct-v0.2-q4f16_1-MLC",
      "model_lib": "mistral_q4f16_1",
      "estimated_vram_bytes": 4275453296,
      "model_id": "Mistral-7B-Instruct-v0.2-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/phi-2-q4f16_1-MLC",
      "model_lib": "phi-msft_q4f16_1",
      "estimated_vram_bytes": 2036816936,
      "model_id": "phi-2-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/TinyLlama-1.1B-Chat-v0.4-q4f16_1-MLC",
      "model_lib": "tinyllama_q4f16_1",
      "estimated_vram_bytes": 709733022,
      "model_id": "TinyLlama-1.1B-Chat-v0.4-q4f16_1"
    },
    {
      "model_url": "https://huggingface.co/mlc-ai/gemma-2b-it-q4f16_1-MLC",
      "model_lib": "gemma_q4f16_1",
      "estimated_vram_bytes": 3000000000,
      "model_id": "gemma-2b-it-q4f16_1"
    }
  ],
  "model_lib_path_for_prepare_libs": {
    "llama_q4f16_1": "libs/Llama-2-7b-chat-hf-q4f16_1-android.tar",
    "gpt_neox_q4f16_1": "libs/RedPajama-INCITE-Chat-3B-v1-q4f16_1-android.tar",
    "mistral_q4f16_1": "libs/Mistral-7B-Instruct-v0.2-q4f16_1-android.tar",
    "phi-msft_q4f16_1": "libs/phi-2-q4f16_1-android.tar",
    "tinyllama_q4f16_1": "libs/TinyLlama-1.1B-Chat-v0.4-q4f16_1-android.tar",
    "gemma_q4f16_1": "libs/gemma-2b-it-q4f16_1-android.tar"
  }
}
EOF
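
One pitfall in this file: every model_lib value in model_list needs a matching key in model_lib_path_for_prepare_libs, otherwise that library never gets bundled and reload can later fail with exactly the kind of "cannot find vm_load_executable" error reported above. A quick consistency check, assuming jq is installed (my own sketch):

# Report model_lib names that have no mapping in model_lib_path_for_prepare_libs
jq -r '(.model_lib_path_for_prepare_libs | keys) as $libs
       | .model_list[]
       | select(.model_lib as $m | $libs | index($m) | not)
       | "missing lib mapping: " + .model_lib' ./src/main/assets/app-config.json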

I can confirm that TinyLlama v0.4 now works when building with the script that I provided in https://github.com/mlc-ai/mlc-llm/issues/1741#issuecomment-1944566496.

@phicolzhang Good day! Try clearing the app cache, restarting the phone, and redownloading Llama.
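
For anyone doing this from the development machine rather than the phone's settings UI, an adb equivalent (assuming the default application id ai.mlc.mlcchat; check your Gradle config if you changed it):

# Wipe MLCChat's data and cache (note: this also deletes downloaded weights)
adb shell pm clear ai.mlc.mlcchat
# Restart the phone
adb reboot
# After the reboot, reopen the app and redownload the model.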