mlc-llm: ROCM HIP Error: shared object initialization failed

🐛 Bug

sample_mlc_chat.py errors out after a while with: TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed

To Reproduce

Steps to reproduce the behavior:

  1. Install ROCm 5.6
  2. Follow get started instructions
  3. Download models and amend the sample code (sketched below)
  4. Run python sample_mlc_chat.py
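A minimal sketch of the amended sample_mlc_chat.py, reconstructed from the traceback below (the prompt text here is illustrative, not from the original script):

from mlc_chat import ChatModule

# Model folder, chat config, and library are resolved from dist/prebuilt,
# as the log output below shows.
cm = ChatModule(model="Llama-2-13b-chat-hf-q4f16_1")
output = cm.generate(prompt="What is the meaning of life?")
print(output)
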
System automatically detected device: rocm
Using model folder: /home/corv/Downloads/mlc/dist/prebuilt/mlc-chat-Llama-2-13b-chat-hf-q4f16_1
Using mlc chat config: /home/corv/Downloads/mlc/dist/prebuilt/mlc-chat-Llama-2-13b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /home/corv/Downloads/mlc/dist/prebuilt/lib/Llama-2-13b-chat-hf-q4f16_1-rocm.so

Traceback (most recent call last):
  File "/home/corv/Downloads/mlc/sample_mlc_chat.py", line 8, in <module>
    cm = ChatModule(model="Llama-2-13b-chat-hf-q4f16_1")
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Traceback (most recent call last):
  File "/home/corv/Downloads/mlc/sample_mlc_chat.py", line 12, in <module>
    output = cm.generate(
             ^^^^^^^^^^^^
  File "/home/corv/.pyenv/versions/3.11.4/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 641, in generate
    self._prefill(prompt)
  File "/home/corv/.pyenv/versions/3.11.4/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 810, in _prefill
    self._prefill_func(input, decode_next_token, place_in_prompt.value)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: TVMFuncCall
  9: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1083
  8: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt)
        at /workspace/mlc-llm/cpp/llm_chat.cc:611
  7: mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)
        at /workspace/mlc-llm/cpp/llm_chat.cc:836
  6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: _ZN3tvm7runtime6deta
  4: TVMFuncCall
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::runtime::ROCMWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void*, unsigned long) const [clone .isra.0]
  1: tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/rocm/rocm_module.cc", line 105
  File "/workspace/tvm/src/runtime/library_module.cc", line 87
TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed

Expected behavior

Expected it to work, as mlc_chat_cli does.

Environment

  • Platform (e.g. WebGPU/Vulkan/IOS/Android/CUDA): ROCm
  • Operating system (e.g. Ubuntu/Windows/MacOS/…): Arch Linux
  • Device (e.g. iPhone 12 Pro, PC+RTX 3090, …): Radeon 6800 XT
  • How you installed MLC-LLM (conda, source): conda
  • How you installed TVM-Unity (pip, source): pip
  • Python version (e.g. 3.10): 3.11.4
  • GPU driver version (if applicable):
  • CUDA/cuDNN version (if applicable):
  • TVM Unity Hash Tag (python -c "import tvm; print('\n'.join(f'{k}: {v}' for k, v in tvm.support.libinfo().items()))", applicable if you compile models):
  • Any other relevant information: using pyenv / venv

About this issue

  • State: closed
  • Created a year ago
  • Comments: 21 (8 by maintainers)

Most upvoted comments

I tried to use -mcpu=gfx1030, and it turned out that the compiled lib is not runnable on 7900 XTX, reporting the same ROCM HIP Error: hipModuleLoadData(...) error.
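For reference, the ISA is selected through the -mcpu attribute of TVM's rocm target; a minimal sketch, assuming a TVM-Unity Python install (the full model rebuild involves more steps than this):

import tvm

# gfx1030 is RDNA 2 (RX 6800/6900 series); gfx1100 is RDNA 3 (RX 7900 series).
# A code object compiled for one ISA will not hipModuleLoadData on the other,
# which surfaces as the "shared object initialization failed" error above.
target_rdna2 = tvm.target.Target("rocm -mcpu=gfx1030")
target_rdna3 = tvm.target.Target("rocm -mcpu=gfx1100")
print(target_rdna2)
print(target_rdna3)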

@MasterJH5574 could you share a quick command line/process you used to re-compile the TVM module with -mcpu=gfx1030?

I’m about to do that myself so I can run MLC models on my 6800 XT. Would love a pointer or two if you were successfully able to re-compile the module needed.

Thanks in advance!

This is good news! If I understand you correctly, the model files are built to be GPU device-family dependent, and we simply need to “recompile” them?

yes that’s right.

OK, I was able to reproduce this issue. I think this happens because Llama-2-7b-chat-hf-q4f16_1-rocm.so that comes from https://github.com/mlc-ai/binary-mlc-llm-libs.git was built for RDNA 3 (gfx1100), so it only works for cards from that generation.
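You can check which ISA your own card reports; a quick sketch, assuming the ROCm rocminfo tool is on PATH (an RX 6800 XT reports gfx1030, a 7900 XTX reports gfx1100):

import subprocess

# Print the gfx* ISA names rocminfo reports for the installed GPUs.
out = subprocess.run(["rocminfo"], capture_output=True, text=True).stdout
for line in sorted({ln.strip() for ln in out.splitlines() if "gfx" in ln}):
    print(line)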

ROCm is not supported on the 6800 XT, I guess.

It does work on the RX 6000 series, although they have never been officially supported. Like others said, it is likely a TVM-specific issue (e.g. we might need a bitcode update).
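On the bitcode remark: TVM's ROCm codegen links against the device bitcode libraries that ship with ROCm; a quick way to see what your install provides (the /opt/rocm path is an assumption and varies by distro and ROCm version):

import glob

# List the ROCm device bitcode libraries (ocml, ockl, etc.) on this machine.
for path in sorted(glob.glob("/opt/rocm/amdgcn/bitcode/*.bc")):
    print(path)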

I can try to reproduce this on my rx 6600xt.

How come exllama and co work on this older card? This is the first time I’ve run into issues with it.

Also, someone mentioned a workaround through Vulkan; how do I switch bindings?

./mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1 --device vulkan

or

cm = ChatModule(model="Llama-2-7b-chat-hf-q4f16_1", device="vulkan")

HTH

I tried to follow the steps on another machine with 7900 XTX, and unfortunately was not able to reproduce the issue 😦

I noticed that the devices above are all not the latest generation, and I am not sure if this is the reason behind it. I don’t have an available device to test right now.

On the ROCm installation, I tried both

sudo amdgpu-install --usecase=rocm

and

sudo amdgpu-install --usecase=hiplibsdk,rocm

as listed in https://docs.amd.com/en/docs-5.6.0/deploy/linux/installer/install.html, and both of them work on my side.

Same error with both sample_mlc_chat.py and mlc_chat_cli; Vulkan works well though.

Debian 13, x86_64, kernel 6.4
ROCm 5.6
GPU: RX 6800 XT
CPU: AMD 5950X

$ python sample_mlc_chat.py 
System automatically detected device: rocm
Using model folder: /home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1
Using mlc chat config: /home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json
Using library model: /home/user/src/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-rocm.so

Traceback (most recent call last):
  File "/home/user/src/mlc/sample_mlc_chat.py", line 12, in <module>
    output = cm.generate(
             ^^^^^^^^^^^^
  File "/home/user/.local/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 650, in generate
    self._prefill(prompt)
  File "/home/user/.local/lib/python3.11/site-packages/mlc_chat/chat_module.py", line 819, in _prefill
    self._prefill_func(input, decode_next_token, place_in_prompt.value)
  File "tvm/_ffi/_cython/./packed_func.pxi", line 331, in tvm._ffi._cy3.core.PackedFuncBase.__call__
  File "tvm/_ffi/_cython/./packed_func.pxi", line 262, in tvm._ffi._cy3.core.FuncCall
  File "tvm/_ffi/_cython/./packed_func.pxi", line 251, in tvm._ffi._cy3.core.FuncCall3
  File "tvm/_ffi/_cython/./base.pxi", line 181, in tvm._ffi._cy3.core.CHECK_CALL
tvm._ffi.base.TVMError: Traceback (most recent call last):
  10: TVMFuncCall
  9: mlc::llm::LLMChatModule::GetFunction(tvm::runtime::String const&, tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#5}::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /workspace/mlc-llm/cpp/llm_chat.cc:1083
  8: mlc::llm::LLMChat::PrefillStep(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool, bool, mlc::llm::PlaceInPrompt)
        at /workspace/mlc-llm/cpp/llm_chat.cc:611
  7: mlc::llm::LLMChat::ForwardTokens(std::vector<int, std::allocator<int> >, long)
        at /workspace/mlc-llm/cpp/llm_chat.cc:836
  6: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  5: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::relax_vm::VirtualMachineImpl::GetClosureInternal(tvm::runtime::String const&, bool)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  4: tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)
  3: tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()
  2: tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)
  1: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::WrapPackedFunc(int (*)(TVMValue*, int*, int, TVMValue*, int*, void*), tvm::runtime::ObjectPtr<tvm::runtime::Object> const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  0: _ZN3tvm7runtime6deta
  4: TVMFuncCall
  3: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)
  2: tvm::runtime::ROCMWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*, void*, unsigned long) const [clone .isra.0]
  1: tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)
  0: _ZN3tvm7runtime6deta
  File "/workspace/tvm/src/runtime/rocm/rocm_module.cc", line 105
  File "/workspace/tvm/src/runtime/library_module.cc", line 87
TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
$ ./mlc_chat_cli --local-id Llama-2-7b-chat-hf-q4f16_1 --device rocm
Use MLC config: "/home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/mlc-chat-config.json"
Use model weights: "/home/user/src/mlc/dist/prebuilt/mlc-chat-Llama-2-7b-chat-hf-q4f16_1/ndarray-cache.json"
Use model library: "/home/user/src/mlc/dist/prebuilt/lib/Llama-2-7b-chat-hf-q4f16_1-rocm.so"
You can use the following special commands:
  /help               print the special commands
  /exit               quit the cli
  /stats              print out the latest stats (token/sec)
  /reset              restart a fresh chat
  /reload [local_id]  reload model `local_id` from disk, or reload the current model if `local_id` is not specified

Loading model...
Loading finished
Running system prompts...
[19:46:45] /home/user/src/mlc/mlc-llm/3rdparty/tvm/src/runtime/library_module.cc:87: TVMError: ROCM HIP Error: hipModuleLoadData(&(module_[device_id]), data_.c_str()) failed with error: shared object initialization failed
Stack trace:
  File "/home/user/src/mlc/mlc-llm/3rdparty/tvm/src/runtime/rocm/rocm_module.cc", line 105
  [bt] (0) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x13) [0x7fe81b712b83]
  [bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x24) [0x564de84d6ae4]
  [bt] (2) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x216cb4) [0x7fe81b816cb4]
  [bt] (3) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::ROCMModuleNode::GetFunc(int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&)+0x13e) [0x7fe81b8199be]
  [bt] (4) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x216e36) [0x7fe81b816e36]
  [bt] (5) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::detail::PackFuncPackedArg_<0, tvm::runtime::ROCMWrappedFunc>(tvm::runtime::ROCMWrappedFunc, std::vector<tvm::runtime::detail::ArgConvertCode, std::allocator<tvm::runtime::detail::ArgConvertCode> > const&)::{lambda(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0xda) [0x7fe81b819b9a]
  [bt] (6) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(TVMFuncCall+0x46) [0x7fe81b6df156]

Stack trace:
  [bt] (0) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::Backtrace[abi:cxx11]()+0x13) [0x7fe81b712b83]
  [bt] (1) ./mlc_chat_cli(tvm::runtime::detail::LogFatal::Entry::Finalize()+0x24) [0x564de84d6ae4]
  [bt] (2) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x10f404) [0x7fe81b70f404]
  [bt] (3) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x10f5a0) [0x7fe81b70f5a0]
  [bt] (4) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunInstrCall(tvm::runtime::relax_vm::VMFrame*, tvm::runtime::relax_vm::Instruction)+0x8c0) [0x7fe81b78ff30]
  [bt] (5) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::RunLoop()+0x2c7) [0x7fe81b78cbd7]
  [bt] (6) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeBytecode(long, std::vector<tvm::runtime::TVMRetValue, std::allocator<tvm::runtime::TVMRetValue> > const&)+0x24d) [0x7fe81b78d06d]
  [bt] (7) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(+0x18d455) [0x7fe81b78d455]
  [bt] (8) /home/user/src/mlc/mlc-llm/build/tvm/libtvm_runtime.so(tvm::runtime::relax_vm::VirtualMachineImpl::InvokeClosurePacked(tvm::runtime::ObjectRef const&, tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x277) [0x7fe81b78b787]