LLaVA: Error running with --num-gpus 2
I’m trying to run LLaVA on two RTX 4090 GPUs for inference. The model loads onto both GPUs without any issues, but an error occurs at inference time when I run the sample example from the Gradio web interface.
Here is the error:
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! (when checking argument for argument tensors in method wrapper_CUDA_cat)
The error seems to come from a torch.cat call (wrapper_CUDA_cat in the traceback) receiving tensors that live on different GPUs after the model is split across devices.
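For context, this class of error usually surfaces when a model sharded across GPUs concatenates an intermediate tensor (for example, image features) with a tensor that lives on another device. Below is a minimal sketch of the usual fix pattern; the function and variable names are illustrative, not LLaVA’s actual code:

import torch

def merge_features(image_features: torch.Tensor,
                   inputs_embeds: torch.Tensor) -> torch.Tensor:
    # With the model split across GPUs, image_features may sit on
    # cuda:1 while inputs_embeds sits on cuda:0. torch.cat requires
    # all inputs on one device, so move one operand over first.
    image_features = image_features.to(inputs_embeds.device)
    return torch.cat([image_features, inputs_embeds], dim=1)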
Environment
OS: Ubuntu
Python version: 3.10
CUDA version: 11.8
GPU model: Dual RTX 4090s
Steps to reproduce:
python -m llava.serve.controller --host 0.0.0.0 --port 10000
python3 -m llava.serve.model_worker --controller http://localhost:10000 --port 40000 --worker http://localhost:40000 --model-path /home/lukas/Desktop/models/llava --num-gpus 2 --multi-modal
python -m llava.serve.gradio_web_server --controller http://localhost:10000
Run the sample example from the Gradio web interface
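For what it’s worth, the failure mode is easy to reproduce outside LLaVA. This standalone snippet (assuming a machine with at least two CUDA devices) triggers the same RuntimeError:

import torch

a = torch.randn(1, 8, device="cuda:0")
b = torch.randn(1, 8, device="cuda:1")

# Raises: RuntimeError: Expected all tensors to be on the same device,
# but found at least two devices, cuda:0 and cuda:1!
torch.cat([a, b], dim=1)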
About this issue
- State: closed
- Created a year ago
- Comments: 20 (8 by maintainers)
Commits related to this issue
- Fix #20,#81. — committed to haotian-liu/LLaVA by haotian-liu a year ago
- Fix #20,#81. — committed to hyj1991/LLaVA by haotian-liu a year ago
Hi, thank you for your interest in our work. We are working on multi-GPU support and will try to provide a solution today. Thanks.
There’s nothing different about the weights between the versions as I understand it; I’m not sure “v1” and “v2” are even the proper terms. There have been multiple versions of the Hugging Face weights floating around with different config files. The weights I originally applied the deltas to were downloaded back in early March, and they seem incompatible with your deltas for whatever reason.
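For reference, applying deltas to a base model follows this pattern from the LLaVA repo; the paths here are placeholders for your local setup, and the flags are as I recall them, so double-check against the README version you used:

python3 -m llava.model.apply_delta --base /path/to/llama-7b --target /path/to/output/LLaVA-7B --delta /path/to/downloaded/delta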