AutoGPTQ: [BUG] FileNotFoundError: Could not find model in TheBloke/WizardLM-*-uncensored-GPTQ
Describe the bug
Unable to load the model directly from the repository using the example in README.md.
Software version
Operating System: macOS 13.3.1
CUDA Toolkit: None
Python: 3.10.11
AutoGPTQ: 0.2.1
PyTorch: 2.1.0.dev20230520
Transformers: 4.30.0.dev0
Accelerate: 0.20.0.dev0
To Reproduce
Running this script causes the error:
from transformers import AutoTokenizer, TextGenerationPipeline
from auto_gptq import AutoGPTQForCausalLM
MODEL = "TheBloke/WizardLM-7B-uncensored-GPTQ"
import logging
logging.basicConfig(
    format="%(asctime)s %(levelname)s [%(name)s] %(message)s", level=logging.INFO, datefmt="%Y-%m-%d %H:%M:%S"
)
# device = "cuda:0"
device = "mps"
tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)
# download quantized model from Hugging Face Hub and load to the first GPU
model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           device=device,
                                           use_safetensors=True,
                                           use_triton=False)
# inference with model.generate
print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to(model.device))[0]))
Expected behavior
I expect the model to be downloaded from the Hugging Face Hub and run as specified in the README.
Screenshots
Error:
python scripts/auto-gptq-test.py
Downloading (…)lve/main/config.json: 100%|███████████████████████████| 552/552 [00:00<00:00, 1.08MB/s]
Downloading (…)quantize_config.json: 100%|██████████████████████████| 57.0/57.0 [00:00<00:00, 175kB/s]
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/luke/dev/tg-app/scripts/auto-gptq-test.py:19 in <module> │
│ │
│ 16 │
│ 17 tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True) │
│ 18 # download quantized model from Hugging Face Hub and load to the first GPU │
│ ❱ 19 model = AutoGPTQForCausalLM.from_quantized(MODEL, │
│ 20 │ │ # model_name_or_path="WizardLM-13B-Uncensored-GPTQ-4bit.act-order", │
│ 21 │ │ device=device, │
│ 22 │ │ use_safetensors=True, │
│ │
│ /opt/homebrew/lib/python3.10/site-packages/auto_gptq/modeling/auto.py:82 in from_quantized │
│ │
│ 79 │ │ model_type = check_and_get_model_type(save_dir or model_name_or_path, trust_remo │
│ 80 │ │ quant_func = GPTQ_CAUSAL_LM_MODEL_MAP[model_type].from_quantized │
│ 81 │ │ keywords = {key: kwargs[key] for key in signature(quant_func).parameters if key │
│ ❱ 82 │ │ return quant_func( │
│ 83 │ │ │ model_name_or_path=model_name_or_path, │
│ 84 │ │ │ save_dir=save_dir, │
│ 85 │ │ │ device_map=device_map, │
│ │
│ /opt/homebrew/lib/python3.10/site-packages/auto_gptq/modeling/_base.py:698 in from_quantized │
│ │
│ 695 │ │ │ │ │ break │
│ 696 │ │ │
│ 697 │ │ if resolved_archive_file is None: # Could not find a model file to use │
│ ❱ 698 │ │ │ raise FileNotFoundError(f"Could not find model in {model_name_or_path}") │
│ 699 │ │ │
│ 700 │ │ model_save_name = resolved_archive_file │
│ 701 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
FileNotFoundError: Could not find model in TheBloke/WizardLM-7B-uncensored-GPTQ
Additional context
I’ve also tried providing model_name_or_path, as noted in https://github.com/PanQiWei/AutoGPTQ/pull/91:
MODEL_FILE = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order"
model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           model_name_or_path=MODEL_FILE,
                                           device=device,
                                           use_safetensors=True,
                                           use_triton=False)
But then I get the following:
python scripts/auto-gptq-test.py
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/luke/dev/tg-app/scripts/auto-gptq-test.py:19 in <module> │
│ │
│ 16 │
│ 17 tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True) │
│ 18 # download quantized model from Hugging Face Hub and load to the first GPU │
│ ❱ 19 model = AutoGPTQForCausalLM.from_quantized(MODEL, │
│ 20 │ │ model_name_or_path=MODEL_FILE, │
│ 21 │ │ device=device, │
│ 22 │ │ use_safetensors=True, │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: AutoGPTQForCausalLM.from_quantized() got multiple values for argument 'model_name_or_path'
Perhaps @TheBloke you could chime in 😃
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 25 (5 by maintainers)
Yeah, you need model_basename. Most of my models (all except the recent Falcon ones, which were made with AutoGPTQ) use a custom model name. You need to tell AutoGPTQ what this is. This can be specified with e.g.:
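The original snippet isn't preserved here, but presumably it was along these lines (a sketch; the basename is the one used later in this thread, not necessarily the exact code from the comment):

from auto_gptq import AutoGPTQForCausalLM

MODEL = "TheBloke/WizardLM-7B-uncensored-GPTQ"
# Basename of the quantized weights file in the repo, without the extension
MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order"

model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           model_basename=MODEL_BASENAME,
                                           use_safetensors=True,
                                           device="cuda:0",
                                           use_triton=False)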
I was going to extend quantize_config.json to list this name so that HF Hub download could handle it automatically, but I've not had time to look at it yet; I've been so busy with models and support. This code will work:
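The code block from the original comment isn't preserved, but a working script consistent with the rest of this thread would look roughly like this (a sketch; the basename is the .safetensors file in the repo at the time of the issue, and a CUDA device is assumed):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

MODEL = "TheBloke/WizardLM-7B-uncensored-GPTQ"
MODEL_BASENAME = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order"

tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)

# Pass the repo id as the model path and the weights file basename separately
model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           model_basename=MODEL_BASENAME,
                                           device="cuda:0",  # CUDA only; see the note below about mps
                                           use_safetensors=True,
                                           use_triton=False)

print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to(model.device))[0]))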
Output:
However, you can't use AutoGPTQ with device='mps'. Only NVIDIA CUDA GPUs are supported. It may work to run on CPU only, but it will be very, very slow.
Ahh… it's not a bug, my friend. Just pass the repo id to model_name_or_path and MODEL_FILE to the model_basename param. That should solve it. Feel free to close this issue once it's solved.
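To spell out how this differs from the call that raised the TypeError (a sketch, reusing the values from the script earlier in this issue):

from auto_gptq import AutoGPTQForCausalLM

MODEL = "TheBloke/WizardLM-7B-uncensored-GPTQ"
MODEL_FILE = "WizardLM-7B-uncensored-GPTQ-4bit-128g.compat.no-act-order"

# MODEL already fills model_name_or_path as the first positional argument; passing
# model_name_or_path=MODEL_FILE on top of that is what triggered the TypeError above.
# The file basename belongs in model_basename instead:
model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           model_basename=MODEL_FILE,
                                           device="cuda:0",  # CUDA only, per the comment above
                                           use_safetensors=True,
                                           use_triton=False)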
Remove model_basename or set its value to model. The safetensors file is now called model.safetensors, and 'model' is now set as model_basename in quantize_config.json, so you don't need to pass model_basename to .from_quantized() any more.
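So with the updated repo files, the original README-style call should presumably work as-is (a sketch, assuming the repo's quantize_config.json now carries model_basename and a CUDA device is available):

from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

MODEL = "TheBloke/WizardLM-7B-uncensored-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(MODEL, use_fast=True)

# No model_basename needed any more: it is read from quantize_config.json
model = AutoGPTQForCausalLM.from_quantized(MODEL,
                                           device="cuda:0",
                                           use_safetensors=True,
                                           use_triton=False)

print(tokenizer.decode(model.generate(**tokenizer("auto_gptq is", return_tensors="pt").to(model.device))[0]))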