LLaMA-Factory: ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`
Reminder
- I have read the README and searched the existing issues.
Reproduction
Issue #2420 is still not fixed as of commit c901aa6.
Full error output
[2024-03-12 08:50:54,244] torch.distributed.elastic.multiprocessing.redirects: [WARNING] NOTE: Redirects are currently not supported in Windows or MacOs.
[INFO|tokenization_utils_base.py:2044] 2024-03-12 08:52:37,843 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2044] 2024-03-12 08:52:37,843 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2044] 2024-03-12 08:52:37,844 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2044] 2024-03-12 08:52:37,844 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2044] 2024-03-12 08:52:37,844 >> loading file tokenizer.json
[INFO|configuration_utils.py:726] 2024-03-12 08:52:37,905 >> loading configuration file C:\LLaMA-Factory\checkpoints\Llama-2-13b-chat-hf\config.json
[INFO|configuration_utils.py:791] 2024-03-12 08:52:37,906 >> Model config LlamaConfig {
"_name_or_path": "C:\\LLaMA-Factory\\checkpoints\\Llama-2-13b-chat-hf",
"architectures": [
"LlamaForCausalLM"
],
"attention_bias": false,
"attention_dropout": 0.0,
"bos_token_id": 1,
"eos_token_id": 2,
"hidden_act": "silu",
"hidden_size": 5120,
"initializer_range": 0.02,
"intermediate_size": 13824,
"max_position_embeddings": 4096,
"model_type": "llama",
"num_attention_heads": 40,
"num_hidden_layers": 40,
"num_key_value_heads": 40,
"pretraining_tp": 1,
"rms_norm_eps": 1e-05,
"rope_scaling": null,
"rope_theta": 10000.0,
"tie_word_embeddings": false,
"torch_dtype": "float16",
"transformers_version": "4.38.1",
"use_cache": true,
"vocab_size": 32000
}
[INFO|modeling_utils.py:3254] 2024-03-12 08:52:37,931 >> loading weights file C:\LLaMA-Factory\checkpoints\Llama-2-13b-chat-hf\model.safetensors.index.json
[INFO|modeling_utils.py:1400] 2024-03-12 08:52:37,931 >> Instantiating LlamaForCausalLM model under default dtype torch.float16.
[INFO|configuration_utils.py:845] 2024-03-12 08:52:37,932 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2
}
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 3/3 [00:09<00:00, 3.01s/it]
[INFO|modeling_utils.py:3992] 2024-03-12 08:52:47,320 >> All model checkpoint weights were used when initializing LlamaForCausalLM.
[INFO|modeling_utils.py:4000] 2024-03-12 08:52:47,320 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at C:\LLaMA-Factory\checkpoints\Llama-2-13b-chat-hf.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:798] 2024-03-12 08:52:47,321 >> loading configuration file C:\LLaMA-Factory\checkpoints\Llama-2-13b-chat-hf\generation_config.json
C:\LLaMA-Factory\venv\lib\site-packages\transformers\generation\configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
C:\LLaMA-Factory\venv\lib\site-packages\transformers\generation\configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
C:\LLaMA-Factory\venv\lib\site-packages\transformers\generation\configuration_utils.py:410: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.9` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
C:\LLaMA-Factory\venv\lib\site-packages\transformers\generation\configuration_utils.py:415: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
[INFO|configuration_utils.py:845] 2024-03-12 08:52:47,322 >> Generate config GenerationConfig {
"bos_token_id": 1,
"eos_token_id": 2,
"max_length": 4096,
"pad_token_id": 0,
"temperature": 0.9,
"top_p": 0.6
}
Traceback (most recent call last):
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\queueing.py", line 407, in call_prediction
output = await route_utils.call_process_api(
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\route_utils.py", line 226, in call_process_api
output = await app.get_blocks().process_api(
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\blocks.py", line 1550, in process_api
result = await self.call_function(
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\blocks.py", line 1199, in call_function
prediction = await utils.async_iteration(iterator)
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\utils.py", line 519, in async_iteration
return await iterator.__anext__()
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\utils.py", line 512, in __anext__
return await anyio.to_thread.run_sync(
File "C:\LLaMA-Factory\venv\lib\site-packages\anyio\to_thread.py", line 33, in run_sync
return await get_async_backend().run_sync_in_worker_thread(
File "C:\LLaMA-Factory\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 2106, in run_sync_in_worker_thread
return await future
File "C:\LLaMA-Factory\venv\lib\site-packages\anyio\_backends\_asyncio.py", line 833, in run
result = context.run(func, *args)
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\utils.py", line 495, in run_sync_iterator_async
return next(iterator)
File "C:\LLaMA-Factory\venv\lib\site-packages\gradio\utils.py", line 649, in gen_wrapper
yield from f(*args, **kwargs)
File "C:\LLaMA-Factory\src\llmtuner\webui\components\export.py", line 71, in save_model
export_model(args)
File "C:\LLaMA-Factory\src\llmtuner\train\tuner.py", line 52, in export_model
model, tokenizer = load_model_and_tokenizer(model_args, finetuning_args)
File "C:\LLaMA-Factory\src\llmtuner\model\loader.py", line 150, in load_model_and_tokenizer
model = load_model(tokenizer, model_args, finetuning_args, is_trainable, add_valuehead)
File "C:\LLaMA-Factory\src\llmtuner\model\loader.py", line 94, in load_model
model = init_adapter(model, model_args, finetuning_args, is_trainable)
File "C:\LLaMA-Factory\src\llmtuner\model\adapter.py", line 110, in init_adapter
model: "LoraModel" = PeftModel.from_pretrained(model, adapter)
File "C:\LLaMA-Factory\venv\lib\site-packages\peft\peft_model.py", line 353, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "C:\LLaMA-Factory\venv\lib\site-packages\peft\peft_model.py", line 727, in load_adapter
dispatch_model(
File "C:\LLaMA-Factory\venv\lib\site-packages\accelerate\big_modeling.py", line 374, in dispatch_model
raise ValueError(
ValueError: We need an `offload_dir` to dispatch this model according to this `device_map`, the following submodules need to be offloaded: base_model.model.model.layers.33, base_model.model.model.layers.34, base_model.model.model.layers.35, base_model.model.model.layers.36, base_model.model.model.layers.37, base_model.model.model.layers.38, base_model.model.model.layers.39, base_model.model.model.norm, base_model.model.lm_head.
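For context, accelerate raises this error when the automatically inferred `device_map` spills some submodules (here, the last layers, the final norm and the LM head) to disk, but no offload directory was passed down to `dispatch_model`. Outside the WebUI, a workaround along these lines is possible; this is only a minimal sketch under assumed placeholder paths and the standard transformers/PEFT loading APIs, not the code path LLaMA-Factory itself uses:

```python
# Minimal sketch: load the fp16 base model plus the LoRA adapter while
# allowing disk offload on a 24 GB GPU. Paths are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = r"C:\LLaMA-Factory\checkpoints\Llama-2-13b-chat-hf"  # placeholder
adapter_path = r"C:\LLaMA-Factory\saves\llama2-13b\lora\sft"     # placeholder

tokenizer = AutoTokenizer.from_pretrained(base_path)
model = AutoModelForCausalLM.from_pretrained(
    base_path,
    torch_dtype=torch.float16,
    device_map="auto",          # lets accelerate spill layers to CPU/disk
    offload_folder="offload",   # directory accelerate may offload weights into
)
# PEFT forwards offload_folder to accelerate's dispatch_model, which is
# where the ValueError above is raised when no offload directory is given.
model = PeftModel.from_pretrained(model, adapter_path, offload_folder="offload")
```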
Expected behavior
- Start the WebUI and open the `Export` tab.
- Select the adapter (QLoRA 4-bit, trained on Llama2-13B), with the base model set via `Model path`.
- Set `Export dir` to an empty folder and click [Export]. (A script-level sketch of what this export step does is shown below.)
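For reference, the export step essentially merges the LoRA adapter into the fp16 base model and writes the merged weights to the export directory. The sketch below illustrates that with the public PEFT API; the paths and the CPU-only loading are assumptions for illustration, not the exact code path of llmtuner's `export_model`:

```python
# Rough equivalent of the Export tab: merge the LoRA adapter into the base
# model and save the merged weights to an empty export directory.
# Loading on CPU sidesteps the device_map/offload issue entirely, at the
# cost of enough system RAM for the fp16 13B weights (~26 GB).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = r"C:\LLaMA-Factory\checkpoints\Llama-2-13b-chat-hf"   # placeholder
adapter_path = r"C:\LLaMA-Factory\saves\llama2-13b\lora\sft"      # placeholder
export_dir = r"C:\LLaMA-Factory\export\llama2-13b-merged"         # placeholder

model = AutoModelForCausalLM.from_pretrained(base_path, torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_path)
model = model.merge_and_unload()   # bake the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained(base_path)
model.save_pretrained(export_dir, safe_serialization=True)
tokenizer.save_pretrained(export_dir)
```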
System Info
- `transformers` version: 4.38.1
- Platform: Windows-11
- Python version: 3.10.8
- Huggingface_hub version: 0.20.1
- Safetensors version: 0.4.2
- Accelerate version: 0.27.2
- Accelerate config: not found
- PyTorch version (GPU?): 2.1.2+cu121 (True)
- Single GPU 24GB VRAM
Others
Hi hiyouga!
About this issue
- Original URL
- State: closed
- Created 4 months ago
- Comments: 16 (16 by maintainers)
Commits related to this issue
- fix #2802 — committed to leixy76/LLaMA-Factory by hiyouga 4 months ago
- fix #2802 — committed to hiyouga/LLaMA-Factory by hiyouga 4 months ago
- fix #2802 — committed to sanjay920/LLaMA-Factory by hiyouga 4 months ago
- fix #2802 — committed to sanjay920/LLaMA-Factory by hiyouga 4 months ago
We have made a fix, please try again.