gpt4all: Run on GPU - can't import GPT4AllGPU
Your instructions on how to run it on GPU are not working for me:
# rungptforallongpu.py
import torch
from transformers import LlamaTokenizer
from nomic.gpt4all import GPT4AllGPU # this fails, copy/pasted that class into this script
LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"
tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)
m = GPT4AllGPU(LLAMA_PATH)
config = {'num_beams': 2,
          'min_new_tokens': 10,
          'max_length': 100,
          'repetition_penalty': 2.0}
out = m.generate('write me a story about a lonely computer', config)
print(out)
The line from nomic.gpt4all import GPT4AllGPU fails with:
ImportError: cannot import name 'GPT4AllGPU' from 'nomic.gpt4all' (F:\GPT4ALLGPU\nomic\nomic\gpt4all\__init__.py)
(I can import the GPT4All class from that file OK, so I know my path is correct.) If I copy/paste the GPT4AllGPU class into my own Python script file, that seems to fix the import.
Could you suggest a compatible Llama 7B model and a compatible pretrained Llama tokenizer? It seems to expect both, but I think the random ones I'm using may not be working. Is this like Stable Diffusion, where a textual inversion has to be trained on that exact model for them to work together, or should any Llama 7B model work with any Llama 7B pretrained tokenizer, like I assumed here?
I tried cloning https://huggingface.co/decapoda-research/llama-7b-hf as my LLAMA_PATH (and tweaked two JSON files that had it spelled "LLaMA" instead of "Llama"). I also cloned https://huggingface.co/HuggingFaceM4/llama-7b-tokenizer/tree/main and set that as my LLAMA_TOKENIZER_PATH.
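(A minimal sanity-check sketch, using the paths above, for confirming that a model/tokenizer pair at least agree on vocabulary size; 32000 is the standard Llama 7B vocab size:)

from transformers import AutoConfig, LlamaTokenizer

LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"

# Load only the model config (cheap, no weights) and the tokenizer,
# then compare vocabulary sizes; both should report 32000 for Llama 7B.
config = AutoConfig.from_pretrained(LLAMA_PATH)
tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)
print("model vocab_size:    ", config.vocab_size)
print("tokenizer vocab_size:", tokenizer.vocab_size)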
When I run my rungptforallongpu.py script it says "Loading checkpoint shards", which completes successfully (100%, 33/33), but then it fails with a ZeroDivisionError: integer division or modulo by zero on m = GPT4AllGPU(LLAMA_PATH). The traceback points at Python310\lib\site-packages\peft\peft_model.py:167 in from_pretrained and then at get_balanced_memory in site-packages\accelerate\utils\modeling.py, around this (truncated) source comment:
We can't just set the memory to model_size // num_devices as it will end being too
slightly less layers and some layers will end up offload at the end. So this funct
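(One workaround sketch, not verified against this exact error: if you load the base model yourself, or edit the copied GPT4AllGPU class, you can pass an explicit max_memory map so accelerate is handed the split instead of balancing it itself; the values below are guesses for an 8 GB GPU / 32 GB RAM machine:)

import torch
from transformers import LlamaForCausalLM

# Workaround sketch (unverified): supply an explicit per-device memory budget
# instead of letting accelerate compute a balanced split across devices.
model = LlamaForCausalLM.from_pretrained(
    "F:\\GPT4ALLGPU\\llama\\llama-7b-hf",
    torch_dtype=torch.float16,  # 7B in fp16 is ~13 GB, so some CPU offload is unavoidable on 8 GB
    device_map="auto",
    max_memory={0: "7GiB", "cpu": "24GiB"},  # leave headroom on GPU 0 and in system RAM
)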
So I'm probably doing something wrong; I hope someone can tell me what…
Could you make this part of the README instructions a bit clearer in general, please?
The CPU version runs fine via >gpt4all-lora-quantized-win64.exe (but it's a little slow and the PC fan is going nuts), so I'd like to use my GPU if I can, and then figure out how to custom-train this thing 😃.
- Win11
- Torch 2.0.0
- CUDA 11.7 (I confirmed that torch can see CUDA)
- Python 3.10.10
- GeForce RTX 3070 (8 GB)
- 32GB RAM
About this issue
- State: closed
- Created a year ago
- Reactions: 5
- Comments: 24
Try
from nomic.gpt4all.gpt4all import GPT4AllGPU
The information in the readme is incorrect, I believe.

Bless your heart @benninkcorien, explaining to three eight-year-olds in a trench coat that just because a response is structured like constructive feedback doesn't actually make it sound advice. Fine-tune and validate your results with a third-party resource, boys and girls.
@loanmaster, I express appreciation of you for your suggestion to use this line that HTTPSConnectionPool(host='github.com/nomic-ai//gpt4all/issues/159', port=443): Post interrupted.
Ok, just out of curiosity: is your response AI-generated? No offense intended! I mean it as a compliment; it's very long and very polite. Almost too polite… 😄
There is another bug, about offload_folder, where it says you need to specify it.
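(For that one: offload_folder is a real from_pretrained argument; the error means some weights fit in neither VRAM nor RAM and need a disk directory to spill into. A minimal sketch; the folder name is arbitrary:)

from transformers import LlamaForCausalLM

# The complaint about offload_folder means accelerate wants a disk directory
# for the weights that fit in neither GPU memory nor system RAM.
model = LlamaForCausalLM.from_pretrained(
    "F:\\GPT4ALLGPU\\llama\\llama-7b-hf",
    device_map="auto",
    offload_folder="offload",  # any writable directory works
)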
Very buggy code
The tokenizer is not used anywhere, and self.lora_path points to nowhere. This is depressing, man.
It's in benninkcorien's first post:
To replicate, copy the Python code included in benninkcorien's post and paste it into a file named rungptforallongpu.py.

On a related matter, I'm not sure how far we can get, given the unresolvable, hard-coded LoRA path in the nomic code, which, by the looks of it, could be a reference to a subsequently-removed nomic-ai section on Hugging Face.
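(One possible way around the dead LoRA path, assuming you have some working LoRA checkpoint locally and have already copied the class into your own script; the LoRA path below is a placeholder:)

import torch
from peft import PeftModel
from transformers import LlamaForCausalLM

# Load the base model yourself, then attach a LoRA you actually have locally,
# instead of relying on the class's hard-coded self.lora_path.
base = LlamaForCausalLM.from_pretrained(
    "F:\\GPT4ALLGPU\\llama\\llama-7b-hf",
    torch_dtype=torch.float16,
    device_map="auto",
)
model = PeftModel.from_pretrained(base, "path/to/your/gpt4all-lora")  # placeholder path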
I don't think we can run GPT4All on GPU on Windows at this point. It seems to require the deepspeed package, and you cannot install deepspeed on Windows: while they say it is partially supported, building the wheel in the dist folder as per their instructions errors out with
fatal error LNK1181: cannot open input file 'aio.lib'
and that seems to be caused by the aio library, which is not supported on Windows.
I'm giving up. CPU it is.
Thank you for your answer!
I have all the required pip packages installed.
(I think torch 2.0.0 is actually the latest version; see also https://pytorch.org/.) And that's the version of PyTorch that works with the CUDA version my GPU supports, so I'd rather not change it.
PyTorch version: 2.0.0+cu117
Can Torch see CUDA? True
CUDA version: 11.7
CUDNN version: 8500
Available GPU devices: 1
Device Name: NVIDIA GeForce RTX 3070
I'll try redownloading the model/tokenizer with the transformers-cli (but I'll wait for someone to confirm that the model/tokenizer pair should actually work).
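(Output like the above can be reproduced with a few torch calls; a minimal sketch:)

import torch

print("PyTorch version:", torch.__version__)
print("Can Torch see CUDA?", torch.cuda.is_available())
print("CUDA version:", torch.version.cuda)
print("CUDNN version:", torch.backends.cudnn.version())
print("Available GPU devices:", torch.cuda.device_count())
print("Device Name:", torch.cuda.get_device_name(0))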