autotrain-advanced: Finetuned Model does not have a config.json
When we finetune an LLM using autotrain-advanced, it does not store a config.json, which makes the model difficult to load.
The checkpoint directory contains the following files:
README.md
adapter_config.json
adapter_model.bin
optimizer.pt
pytorch_model.bin
rng_state.pth
scheduler.pt
special_tokens_map.json
tokenizer.json
tokenizer_config.json
trainer_state.json
training_args.bin
So when I load it using pipeline, or with the default Auto classes, it fails, e.g.:
# use pipeline to check
import torch
from transformers import pipeline
dolly_llm = pipeline(model="/content/dolly_proj/dolly_v2/checkpoint-150", torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto")
Error:
OSError: dolly_v2/checkpoint-225 does not appear to have a file named config.json. Checkout 'https://huggingface.co/dolly_v2/checkpoint-225/None' for available files.
Or
import torch
from instruct_pipeline import InstructionTextGenerationPipeline
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("dolly_v2/checkpoint-150", padding_side="left")
model = AutoModelForCausalLM.from_pretrained("dolly_v2/checkpoint-150", device_map="auto", torch_dtype=torch.bfloat16)
generate_text = InstructionTextGenerationPipeline(model=model, tokenizer=tokenizer)
Error:
OSError: dolly_v2/checkpoint-225 does not appear to have a file named config.json. Checkout 'https://huggingface.co/dolly_v2/checkpoint-225/None' for available files.
Training Command:
!autotrain llm --train --project_name dolly_v2 --model databricks/dolly-v2-3b --data_path . --use_peft --use_int4 --learning_rate 2e-4 --train_batch_size 6 --num_train_epochs 3 --trainer sft
The checkpoint should include a config.json by default.
Manually adding config.json and instruct_pipeline.py from the dolly-v2 repo and then loading gives the following warning:
Some weights of GPTNeoXForCausalLM were not initialized from the model checkpoint at /dolly_v2/checkpoint-225 and are newly initialized: ['layers.12.mlp.dense_4h_to_h.bias', 'layers.0.attention.query_key_value.weight', 'layers.5.post_attention_layernorm.bias', 'layers.16.attention.dense.bias', ..., 'embed_in.weight', 'embed_out.weight', 'final_layer_norm.weight', 'final_layer_norm.bias'] (list truncated: it spans essentially every parameter of the model)
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
As a result, the model generates random text.
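The warning is expected: with --use_peft, the checkpoint stores only the LoRA adapter (adapter_config.json and adapter_model.bin), so loading it as a full model leaves the base weights randomly initialized. A minimal sketch of loading the adapter on top of the base model with peft (the checkpoint path and base model name are taken from the examples above):
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load the full base model first; the adapter checkpoint has no base weights.
base_model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# Attach the finetuned LoRA adapter from the AutoTrain checkpoint.
model = PeftModel.from_pretrained(base_model, "/content/dolly_proj/dolly_v2/checkpoint-150")

# The tokenizer can come from the base model or the checkpoint directory.
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b", padding_side="left")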
About this issue
- Original URL
- State: closed
- Created a year ago
- Comments: 15 (5 by maintainers)
If you merge the adapter weights into the base model, the config is available. If not, you can use the config from the base model.
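A minimal sketch of the merge route with peft (paths taken from the examples above); after merge_and_unload(), save_pretrained writes a full model directory including config.json:
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("databricks/dolly-v2-3b", torch_dtype=torch.bfloat16)
model = PeftModel.from_pretrained(base_model, "/content/dolly_proj/dolly_v2/checkpoint-150")

# Fold the LoRA weights into the base weights and drop the adapter wrappers.
merged = model.merge_and_unload()

# save_pretrained on the merged model writes config.json alongside the weights.
merged.save_pretrained("/content/dolly_proj/dolly_v2_merged")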
@TapendraBaduwal I won't advise using autotrain with free Colab. Instead, go for finetuning using bitsandbytes and SFT with code. See this notebook: https://colab.research.google.com/drive/134o_cXcMe_lsvl15ZE_4Y75Kstepsntu?usp=sharing
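A rough sketch of that approach with trl, peft, and bitsandbytes; the dataset file, LoRA hyperparameters, and training arguments below are illustrative placeholders, not taken from the linked notebook:
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, TrainingArguments
from trl import SFTTrainer

# Load the base model 4-bit quantized via bitsandbytes.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "databricks/dolly-v2-3b",
    quantization_config=bnb_config,
    device_map="auto",
)
tokenizer = AutoTokenizer.from_pretrained("databricks/dolly-v2-3b")

# LoRA config; "query_key_value" matches the GPT-NeoX attention projection.
peft_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    task_type="CAUSAL_LM",
)

# "train.jsonl" and its "text" field are placeholders for your own data.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=peft_config,
    dataset_text_field="text",
    tokenizer=tokenizer,
    args=TrainingArguments(
        output_dir="dolly_v2_sft",
        per_device_train_batch_size=1,
        num_train_epochs=3,
        learning_rate=2e-4,
    ),
)
trainer.train()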
What can be done to avoid it crashing at this stage on Google Colab? It appears to be running out of memory while loading the checkpoint shards.
The issue turned out to be the limited capacity of free Colab, where the script was getting killed. The script ran successfully on a high-RAM V100 with Colab Pro.