jetson-containers: AssertionError in TVM CUDA Initialization

Hi,

I’m just trying to test out Live LLaVA using the following command:

./run.sh \
  -e SSL_KEY=/data/key.pem -e SSL_CERT=/data/cert.pem \
  $(./autotag local_llm) \
    python3 -m local_llm.agents.video_query --api=mlc --verbose \
      --model liuhaotian/llava-v1.5-7b \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output display://0 \
      --prompt "How many fingers am I holding up?"

However, the local_llm.agents.video_query module throws the following AssertionError:

/usr/local/lib/python3.8/dist-packages/transformers/utils/hub.py:124: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
09:41:03 | DEBUG | Namespace(api='mlc', chat_template=None, debug=True, do_sample=False, log_level='debug', max_new_tokens=32, min_new_tokens=-1, model='liuhaotian/llava-v1.5-7b', prompt=['How many fingers am I holding up?'], quant=None, repetition_penalty=1.0, save_mermaid=None, system_prompt=None, temperature=0.7, top_p=0.95, video_input='v4l2:///dev/video0', video_input_codec=None, video_input_framerate=None, video_input_height=None, video_input_save=None, video_input_width=None, video_output='display://0', video_output_bitrate=None, video_output_codec=None, video_output_save=None, vision_model=None)
09:41:03 | DEBUG | subprocess 694 started
09:41:03 | DEBUG | RUN_PROCESS GIRDI...
09:41:03 | DEBUG | Starting new HTTPS connection (1): huggingface.co:443
09:41:03 | DEBUG | https://huggingface.co:443 "GET /api/models/liuhaotian/llava-v1.5-7b/revision/main HTTP/1.1" 200 2276
Fetching 11 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 71089.90it/s]
09:41:03 | INFO | loading /data/models/huggingface/models--liuhaotian--llava-v1.5-7b/snapshots/12e054b30e8e061f423c7264bc97d4248232e965 with MLC
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 66, in run_process
    raise error
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 63, in run_process
    plugin = factory(**kwargs)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in <lambda>
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/chat_query.py", line 63, in __init__
    self.model = LocalLM.from_pretrained(model, **kwargs)
  File "/opt/local_llm/local_llm/local_llm.py", line 72, in from_pretrained
    model = MLCModel(model_path, **kwargs)
  File "/opt/local_llm/local_llm/models/mlc.py", line 58, in __init__
    assert(self.device.exist) # this is needed to initialize CUDA?
AssertionError

Traceback (most recent call last):
  File "/usr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/opt/local_llm/local_llm/agents/video_query.py", line 115, in <module>
    agent = VideoQuery(**vars(args)).run()
  File "/opt/local_llm/local_llm/agents/video_query.py", line 22, in __init__
    self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)
  File "/opt/local_llm/local_llm/plugins/process_proxy.py", line 34, in __init__
    raise RuntimeError(f"subprocess has an invalid initialization status ({init_msg['status']})")
RuntimeError: subprocess has an invalid initialization status (<class 'AssertionError'>)
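
For reference, the assertion that fails is just TVM's device-existence check on line 58 of mlc.py above. A minimal sketch for testing the same condition by hand inside the container (assuming the container's TVM build imports as tvm):

import tvm

# mlc.py builds a CUDA device and asserts dev.exist before loading the model;
# if this prints False, the TVM runtime cannot see the GPU inside the container
dev = tvm.cuda(0)
print("CUDA device exists:", dev.exist)

If that prints False, the problem would be GPU visibility inside the container rather than anything specific to LLaVA.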

What could be causing this error? Thanks.

About this issue

  • State: open
  • Created 5 months ago
  • Comments: 15 (6 by maintainers)

Most upvoted comments

@leon-seidel @doruksonmez try changing this line to the following:

https://github.com/dusty-nv/jetson-containers/blob/2d6187b00eaad34a4a51bf1e088baf4a600faa09/packages/llm/local_llm/agents/video_query.py#L22

self.llm = ChatQuery(model, drop_inputs=True, **kwargs)
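
(For context, the line it replaces is the ProcessProxy wrapper from the traceback:

self.llm = ProcessProxy((lambda **kwargs: ChatQuery(model, drop_inputs=True, **kwargs)), **kwargs)

so the change simply constructs the ChatQuery plugin directly in the same process instead of through the proxy subprocess.)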

And when you start the container, mount your local copy of the code into the container like so:

./run.sh \
  -v /mnt/NVME/jetson-containers/packages/llm/local_llm:/opt/local_llm/local_llm \
  $(./autotag local_llm)

(then any code changes you make to the local_llm package will be reflected inside the container without needing to rebuild it)
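
For example, combined with the original command from the top of the issue (adjust the /mnt/NVME/jetson-containers path to wherever your checkout lives):

./run.sh \
  -v /mnt/NVME/jetson-containers/packages/llm/local_llm:/opt/local_llm/local_llm \
  -e SSL_KEY=/data/key.pem -e SSL_CERT=/data/cert.pem \
  $(./autotag local_llm) \
    python3 -m local_llm.agents.video_query --api=mlc --verbose \
      --model liuhaotian/llava-v1.5-7b \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output display://0 \
      --prompt "How many fingers am I holding up?"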

I’m having the same issue (cannot see the video via WebRTC or X). However, I was able to work around it using video-viewer.

Container video output:

--video-output rtsp://@:1234/output

On the host:

video-viewer.py rtsp://localhost:1234/output display://0
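
For anyone else trying this, a rough sketch of the full workaround, assuming the same model and camera as the original command (the port and /output path are arbitrary):

Container side:

./run.sh \
  $(./autotag local_llm) \
    python3 -m local_llm.agents.video_query --api=mlc --verbose \
      --model liuhaotian/llava-v1.5-7b \
      --max-new-tokens 32 \
      --video-input /dev/video0 \
      --video-output rtsp://@:1234/output \
      --prompt "How many fingers am I holding up?"

Host side:

video-viewer.py rtsp://localhost:1234/output display://0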