localGPT: AssertionError: Torch not compiled with CUDA enabled
I have PyTorch with CUDA enabled:
# Name Version Build Channel
pytorch 2.0.1 py3.11_cuda11.8_cudnn8_0 pytorch
pytorch-cuda 11.8 h24eeafa_5 pytorch
pytorch-mutex 1.0 cuda pytorch
This error message needs improvement. What is the actual problem?
requirements.txt needs to be updated to include the correct pytorch version?
OS Name Microsoft Windows 10 Pro Version 10.0.19045 Build 19045
About this issue
- Original URL
- State: open
- Created a year ago
- Comments: 29 (5 by maintainers)
What worked for me: Windows 10, RTX 3080, CUDA 12.1, Python 3.10, pip 23.1.2, running inside a virtual env in PowerShell. I installed everything as described in the tutorial: pip install -r requirements.txt
– check if CUDA is installed; the version should be visible in the top right corner
– check if torch is installed
– clean up
– go to this site and get the proper command for your system and CUDA installation: https://pytorch.org/get-started/locally/
– check again
– try to ingest: python ingest.py
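The "check if torch is installed" and "check again" steps above can be sketched in Python. This is only an illustration, not part of localGPT: the helper name describe_torch_build is made up here, and it relies only on torch.__version__, torch.version.cuda, and torch.cuda.is_available(). A CPU-only build reports no CUDA version, which is typically the situation behind "Torch not compiled with CUDA enabled".

```python
import importlib.util


def describe_torch_build():
    """Summarize the PyTorch build visible to the current interpreter.

    Hypothetical helper for illustration; not part of localGPT.
    """
    info = {
        "installed": False,
        "version": None,
        "cuda_build": None,       # CUDA version the wheel was built with
        "cuda_available": False,  # whether a usable GPU is visible right now
    }
    # Avoid an ImportError if torch is missing from this environment.
    if importlib.util.find_spec("torch") is None:
        return info

    import torch

    info["installed"] = True
    info["version"] = torch.__version__
    # torch.version.cuda is None for CPU-only builds.
    info["cuda_build"] = torch.version.cuda
    info["cuda_available"] = torch.cuda.is_available()
    return info


if __name__ == "__main__":
    for key, value in describe_torch_build().items():
        print(f"{key}: {value}")
```

It is worth running this with the same interpreter that runs ingest.py, since a conda environment and a pip virtual env on the same machine can hold different PyTorch builds.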
I’m rather new to understanding the intricacies of machine learning models and embeddings, but here I get to learn a few things 😃
In the end I was able to run the project by using the base instructor (on my GPU) for ingest.py, but running run_localGPT.py on the CPU.
This brings me to the next problem of having too little RAM, as Vicuna-7B takes up 30 GB of my 32 GB of RAM (system RAM, not GPU VRAM, btw). It works, but answering questions is really slow. I haven’t tried PromtEngineer’s comment about setting the RAM there (I’m not sure yet what the argument actually does), because Vicuna-7B is, afaik, 30 GB, and if it’s limited to something smaller, like 5 GB, it would probably not work as intended.
A colleague of mine helped me by using his machine with dual RTX 2080 Ti cards, with 12 GB VRAM each, and he was able to run ingest.py and run_localGPT.py with no issues, though he did show me that while run_localGPT.py was running, both of his GPUs were holding a 9 GB load.
tl;dr: Vicuna 7B and the large instructor don’t work without at least 20 GB of GPU VRAM in total. The embedding step (ingest.py) would still work if I were to use the base instructor, but the actual model execution doesn’t.
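As a rough sanity check on the memory figures above, the size of the weights alone can be estimated from the parameter count and the bytes per parameter. This is a back-of-the-envelope sketch under assumptions not stated in the thread: a parameter count of about 7 billion (taken from the model name), 4 bytes per parameter for float32 and 2 for float16, and no allowance for activations, caches, or the embedding model. The function name estimate_weight_gb is made up for illustration.

```python
def estimate_weight_gb(n_params, bytes_per_param):
    """Approximate memory for model weights only, in decimal gigabytes."""
    return n_params * bytes_per_param / 1e9


# Assumed parameter count, inferred from the "7B" in the model name.
N_PARAMS = 7_000_000_000

fp32_gb = estimate_weight_gb(N_PARAMS, 4)  # float32: 4 bytes per parameter
fp16_gb = estimate_weight_gb(N_PARAMS, 2)  # float16: 2 bytes per parameter

print(f"float32 weights: about {fp32_gb:.0f} GB")
print(f"float16 weights: about {fp16_gb:.0f} GB")
```

Under these assumptions, full-precision weights land close to the roughly 30 GB of system RAM reported above, and half precision needs about half of that, which is still more than a single 8 GB card offers.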
yeah, same here. conda on windows cpu.
@endolith I will have a look at the code and see what is causing this.
I tried this and it worked, thanks for defining it step by step!
But now I have the next problem…I was able to ingest the document, but I couldn’t run it due to
I tried changing the EMBEDDING_MODEL_NAME to instructor-base, but it seems it’s not yet small enough.
I have an RTX3070, so only 8 GB VRAM unfortunately…anyone know what instructor model would work? or something else that might be done?
BTW @PromtEngineer: the current code checks for CUDA explicitly for full models, which makes it unusable for MPS: https://github.com/PromtEngineer/localGPT/blob/main/run_localGPT.py#L68
@ChristianWeyer I finally got an M2 and just tested it, and that is the case. Need to figure out if there is another way.
@ChristianWeyer this seems to be a bug, thanks for highlighting it. I am not sure if auto_gptq supports M1/M2. Will need to test that.
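One way the explicit CUDA check discussed above could be relaxed is to fall back through the available backends. This is a minimal sketch, not the project's actual code: pick_device is a hypothetical helper, and it only uses torch.cuda.is_available() and torch.backends.mps.is_available(). It says nothing about whether a given model loader (auto_gptq, for example) actually supports the chosen device.

```python
def pick_device():
    """Return "cuda", "mps", or "cpu", in that order of preference.

    Hypothetical helper for illustration; not part of localGPT.
    """
    try:
        import torch
    except ImportError:
        # Without PyTorch there is nothing to query; report CPU.
        return "cpu"

    if torch.cuda.is_available():
        return "cuda"

    # torch.backends.mps is missing in older PyTorch releases, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"

    return "cpu"


if __name__ == "__main__":
    print(f"Selected device: {pick_device()}")
```

On an Apple Silicon machine with a recent PyTorch build this would be expected to select mps instead of failing on a CUDA assertion, though that still needs testing against the full model path.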
I fixed this issue by installing requirements.txt through conda. I’ll summarize the steps I followed below:
Install conda for your platform here: https://docs.conda.io/projects/conda/en/latest/user-guide/install/index.html
Then I ran the following commands from the YouTube video:
If it worked, the output of torch.cuda.is_available() should be True.