gpt4all: Run on GPU - can't import GPT4AllGPU

Your instructions on how to run it on GPU are not working for me:

# rungptforallongpu.py

import torch
from transformers import LlamaTokenizer

from nomic.gpt4all import GPT4AllGPU # this fails, copy/pasted that class into this script

LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"

tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)

m = GPT4AllGPU(LLAMA_PATH)
config = {'num_beams': 2,
          'min_new_tokens': 10,
          'max_length': 100,
          'repetition_penalty': 2.0}
out = m.generate('write me a story about a lonely computer', config)
print(out)

The import from nomic.gpt4all import GPT4AllGPU fails:

ImportError: cannot import name 'GPT4AllGPU' from 'nomic.gpt4all' (F:\GPT4ALLGPU\nomic\nomic\gpt4all\__init__.py)

(I can import the GPT4All class from that file OK, so I know my path is correct.) If I copy/paste the GPT4AllGPU class into my own Python script file, that seems to fix the import error.

Could you suggest a compatible Llama 7B model and a compatible Llama tokenizer pretrained file? It seems to expect both, but I think the random ones I’m using may not be working. Is this like with Stable Diffusion, where a textual inversion has to be trained on that exact model for them to work together, or should any Llama 7B model work with any Llama 7B pretrained tokenizer like I did here?

I tried cloning https://huggingface.co/decapoda-research/llama-7b-hf as my LLAMA_PATH (and tweaking 2 JSON files that had it spelled as “LLaMA” and not “Llama”).

And I cloned https://huggingface.co/HuggingFaceM4/llama-7b-tokenizer/tree/main and set that as my LLAMA_TOKENIZER_PATH.
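A quick way to sanity-check the pairing (a minimal sketch of mine, reusing the same local clone paths as above; it only reads config.json, no weights) is to compare vocabulary sizes:

    # Sanity check (sketch): compare vocab sizes of the cloned model and tokenizer.
    from transformers import LlamaConfig, LlamaTokenizer

    LLAMA_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
    LLAMA_TOKENIZER_PATH = "F:\\GPT4ALLGPU\\llama\\llama-tokenizer"

    config = LlamaConfig.from_pretrained(LLAMA_PATH)  # reads config.json only, no weights
    tokenizer = LlamaTokenizer.from_pretrained(LLAMA_TOKENIZER_PATH)

    print("model vocab_size:    ", config.vocab_size)
    print("tokenizer vocab size:", len(tokenizer))
    # A standard Llama 7B checkpoint and tokenizer should both report 32000;
    # a mismatch here would explain broken output even if loading succeeds.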

When I run my rungptforallongpu.py script, it says “Loading checkpoint shards”, which completes successfully (100% 33/33), but then m = GPT4AllGPU(LLAMA_PATH) fails with ZeroDivisionError: integer division or modulo by zero. The traceback points at Python310\lib\site-packages\peft\peft_model.py:167 in from_pretrained and then at get_balanced_memory in site-packages\accelerate\utils\modeling.py:

    We can't just set the memory to model_size // num_devices as it will end being too small: each GPU device will get
    slightly less of the layers and some layers will end up offload at the end. So this function…

So I’m probably doing something wrong, hope someone can tell me what…
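My best guess at the division by zero (unconfirmed): get_balanced_memory divides by the number of usable GPU devices, so it can blow up if accelerate ends up seeing none of them, or if a "balanced_low_0" device map is used with a single GPU. Here is a sketch that checks the device count and loads the base checkpoint directly with an explicit memory budget - this is not the GPT4AllGPU code path, and the memory sizes are guesses for an 8 GB card:

    # Hypothetical workaround sketch (not from the repo): confirm the GPU is
    # visible, then give transformers an explicit memory budget so accelerate
    # does not have to balance memory for us.
    import torch
    from transformers import LlamaForCausalLM

    assert torch.cuda.device_count() > 0, "no CUDA device visible to this process"

    model = LlamaForCausalLM.from_pretrained(
        "F:\\GPT4ALLGPU\\llama\\llama-7b-hf",
        device_map="auto",
        max_memory={0: "7GiB", "cpu": "30GiB"},  # cap GPU 0 below the 8 GB card, spill the rest to RAM
        torch_dtype=torch.float16,
        offload_folder="offload",  # also sidesteps the offload_folder complaint mentioned further down
    )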

Could you make this part of the README instructions a bit clearer in general, please?

The CPU version is running fine via >gpt4all-lora-quantized-win64.exe (but it’s a little slow and the PC fan is going nuts), so I’d like to use my GPU if I can - and then figure out how I can custom-train this thing 😃.

  • Win11
  • Torch 2.0.0
  • CUDA 11.7 (I confirmed that torch can see CUDA)
  • Python 3.10.10
  • 8GB GeForce 3070
  • 32GB RAM

About this issue

  • State: closed
  • Created a year ago
  • Reactions: 5
  • Comments: 24

Most upvoted comments

Try from nomic.gpt4all.gpt4all import GPT4AllGPU. The information in the readme is incorrect, I believe.
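A small snippet covering both import paths (my own guess at being robust, not something from the readme):

    # Fall back to the nested module path if the readme's import fails.
    try:
        from nomic.gpt4all import GPT4AllGPU          # path shown in the readme
    except ImportError:
        from nomic.gpt4all.gpt4all import GPT4AllGPU  # path that worked per this thread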

Bless your heart @benninkcorien for explaining to three eight-year-olds in a trench coat that just because a response is structured like constructive feedback doesn’t actually make it sound advice. Fine-tune and validate your results with a third-party resource, boys and girls.

@loanmaster, I express appreciation of you for your suggestion to use this line that HTTPSConnectionPool(host=‘github.com/nomic-ai//gpt4all/issues/159’, port=443): Post interrupted.

I’m sorry to hear that you’re having trouble running the GPT4AllGPU script on your GPU. Here are some suggestions that may help … If you continue to experience issues, please let me know and I’ll do my best to assist you further.

Ok, just out of curiosity: is your response AI-generated? No offense intended! I mean it as a compliment - it’s very long and very polite. Almost too polite… 😄

There is another bug: an error about offload_folder says you need to specify it.

Very buggy code

The tokenizer is not used anywhere, and self.lora_path points to nowhere. This is depressing, man.

@benninkcorien Looks like you’re way ahead of me at trying to run this on a GPU. Where did you find this rungptforallongpu.py file? It’s not in either the gpt4all or nomic repos.

It’s in benninkcorien’s first post:

Your instructions on how to run it on GPU are not working for me:

    # rungptforallongpu.py

To replicate, copy the Python code included in benninkcorien’s post and paste it into a file named rungptforallongpu.py.

On a related matter, I’m not sure how far we can get, given the unresolvable, hard-coded LoRA path in the nomic code - which, by the looks of it, could be a reference to a subsequently-removed nomic-ai section on Hugging Face:

        self.lora_path = 'nomic-ai/vicuna-lora-multi-turn_epoch_2'
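If that path really is gone, one workaround (a sketch, not how the nomic class does it) is to bypass GPT4AllGPU and apply whatever LoRA adapter you do have on disk with peft directly; ADAPTER_PATH below is a placeholder, not a real repo:

    # Workaround sketch: load the base Llama checkpoint and attach a local LoRA
    # adapter yourself, instead of relying on the hard-coded self.lora_path.
    import torch
    from transformers import LlamaForCausalLM
    from peft import PeftModel

    BASE_PATH = "F:\\GPT4ALLGPU\\llama\\llama-7b-hf"
    ADAPTER_PATH = "F:\\GPT4ALLGPU\\lora-adapter"  # placeholder: a folder with adapter_config.json + adapter weights

    base = LlamaForCausalLM.from_pretrained(BASE_PATH, torch_dtype=torch.float16, device_map="auto")
    model = PeftModel.from_pretrained(base, ADAPTER_PATH)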

I don’t think we can run GPT4all on GPU on Windows at this point. It seems to require the deepspeed package.

And you can’t install deepspeed on Windows: although they say it is partially supported, building the wheel in the dist folder as per their instructions errors out with fatal error LNK1181: cannot open input file 'aio.lib', which seems to be caused by the aio library not being supported on Windows.

I’m giving up. CPU it is.

Thank you for your answer!

I have all the required pip packages installed.

(I think torch 2.0.0 is actually the latest version, see also https://pytorch.org/ ?) And that’s the version of PyTorch that works with the CUDA version my GPU supports, so I’d rather not change it.

PyTorch version: 2.0.0+cu117
Can Torch see CUDA? True
CUDA version: 11.7
CUDNN version: 8500
Available GPU devices: 1
Device Name: NVIDIA GeForce RTX 3070
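(For reference, those numbers come from plain PyTorch calls; a minimal script to reproduce the check:)

    # cudacheck.py - print CUDA/cuDNN diagnostics with standard PyTorch calls
    import torch

    print("PyTorch version:", torch.__version__)
    print("Can Torch see CUDA?", torch.cuda.is_available())
    print("CUDA version:", torch.version.cuda)
    print("CUDNN version:", torch.backends.cudnn.version())
    print("Available GPU devices:", torch.cuda.device_count())
    if torch.cuda.is_available():
        print("Device Name:", torch.cuda.get_device_name(0))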

I’ll try redownloading models/tokenizer with the transformers-cli (but I’ll wait for someone to confirm that model/tokenizer should actually work).