LLaVA: Error running with --num-gpus 2

I’m trying to run LLaVA on two RTX 4090 GPUs for inference. The model loads onto both GPUs without any issues, but an error occurs at inference time when I run the sample example from the Gradio web interface.

Here is the error:

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)

The failing op is a torch.cat call (wrapper_CUDA_cat), so it looks like tensors that should be concatenated end up on different GPUs once the model is split across the two cards. See the sketch below.
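For what it’s worth, the usual remedy for this class of error is to move everything onto a common device before the concatenation. A minimal sketch of that pattern (a hypothetical helper, not LLaVA’s actual code):

```python
import torch

def cat_on_one_device(tensors, dim=0):
    # torch.cat requires all inputs on one device, so move every tensor
    # to the first tensor's device before concatenating.
    device = tensors[0].device
    return torch.cat([t.to(device) for t in tensors], dim=dim)
```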

Environment

OS: Ubuntu
Python version: 3.10
CUDA version: 11.8
GPUs: 2× RTX 4090

Steps to reproduce:

1. Start the controller:
   python -m llava.serve.controller --host 0.0.0.0 --port 10000
2. Start the model worker on both GPUs (see the loading sketch below):
   python3 -m llava.serve.model_worker --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /home/lukas/Desktop/models/llava --num-gpus 2 --multi-modal
3. Start the Gradio web server:
   python -m llava.serve.gradio_web_server --controller http://localhost:10000
4. Run the sample example from the Gradio web interface.
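In case it helps with debugging: when a checkpoint is sharded across both cards, the inputs have to be placed on the device that holds the first layers. A minimal sketch of that pattern using the Hugging Face device_map API; this illustrates the general mechanism and is an assumption about how the worker shards the model, not LLaVA’s actual worker code (the model path is the one from my repro steps):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "/home/lukas/Desktop/models/llava"
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",  # Accelerate shards the layers across both GPUs
)

inputs = tokenizer("Describe the image.", return_tensors="pt")
# The input ids must live where the first shard landed (usually cuda:0);
# a mismatch here is one way to hit the torch.cat device error above.
first_device = next(model.parameters()).device
inputs = {k: v.to(first_device) for k, v in inputs.items()}
output_ids = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```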

About this issue

  • State: closed
  • Created a year ago
  • Comments: 20 (8 by maintainers)

Most upvoted comments

Hi, thank you for your interest in our work. We are working on multi-GPU support and will try to provide a solution today. Thanks.

As I understand it, there’s nothing different about the weights themselves between the versions; I’m not sure “v1” and “v2” are even the proper terms for them. There have been multiple versions of the Hugging Face weights floating around with different config files. The weights I originally applied the deltas to were downloaded back in early March, and they seem incompatible with your deltas for whatever reason.
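For anyone following along, “applying the deltas” means adding LLaVA’s released delta weights to the base LLaMA weights, parameter by parameter. A simplified sketch of the idea; the repo ships its own apply-delta script, so this is just an illustration, and the handling of delta-only keys is my assumption:

```python
import torch

def apply_delta(base_state: dict, delta_state: dict) -> dict:
    # Merge delta weights into base weights: final = base + delta.
    merged = {}
    for name, delta in delta_state.items():
        if name in base_state:
            merged[name] = base_state[name] + delta
        else:
            # Keys present only in the delta (e.g., newly added projector
            # layers) are taken as-is -- an assumption, not the repo's code.
            merged[name] = delta
    return merged
```

If the base checkpoint you start from doesn’t match the one the deltas were computed against (e.g., a different early-March Hugging Face export), the merged weights come out wrong even though every step appears to succeed, which would explain the incompatibility.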