autotrain-advanced: [BUG]size mismatch for base_model.model.model.embed_tokens.weight
Prerequisites
- I have read the documentation.
- I have checked other issues for similar problems.
Backend
Colab
Interface Used
CLI
CLI Command
!autotrain llm \
--train \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--data-path data/ \
--text-column text \
--lr ${LEARNING_RATE} \
--batch-size ${BATCH_SIZE} \
--epochs ${NUM_EPOCHS} \
--block-size ${BLOCK_SIZE} \
--warmup-ratio ${WARMUP_RATIO} \
--lora-r ${LORA_R} \
--lora-alpha ${LORA_ALPHA} \
--lora-dropout ${LORA_DROPOUT} \
--weight-decay ${WEIGHT_DECAY} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--quantization ${QUANTIZATION} \
--mixed-precision ${MIXED_PRECISION} \
$( [[ "$MERGE_ADAPTER" == "True" ]] && echo "--merge_adapter" ) \
$( [[ "$PEFT" == "True" ]] && echo "--peft" ) \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token ${HF_TOKEN} --repo-id ${REPO_ID}" )
UI Screenshots & Parameters
Error Logs
{'train_runtime': 138.1913, 'train_samples_per_second': 1.65, 'train_steps_per_second': 0.203, 'train_loss': 1.3923424993242537, 'epoch': 0.98}
100% 28/28 [02:18<00:00, 4.94s/it]
🚀 INFO | 2024-02-03 05:42:13 | __main__:train:477 - Finished training, saving model...
🚀 INFO | 2024-02-03 05:42:16 | __main__:train:488 - Merging adapter weights...
🚀 INFO | 2024-02-03 05:42:16 | autotrain.trainers.clm.utils:merge_adapter:192 - Loading adapter…
Loading checkpoint shards: 100% 10/10 [00:30<00:00, 3.10s/it]
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:392: UserWarning: do_sample is set to False. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:397: UserWarning: do_sample is set to False. However, top_p is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:500 - Failed to merge adapter weights: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:501 - Skipping adapter merge. Only adapter weights will be saved.
🚀 INFO | 2024-02-03 05:42:49 | __main__:train:510 - Pushing model to hub...
adapter_model.safetensors: 0% 0.00/1.08G [00:00<?, ?B/s]
adapter_model.safetensors: 0% 0.00/1.08G [00:00<?, ?B/s]
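The 32001 vs. 32000 pattern above is what appears when a pad token is added to the tokenizer and the embeddings are resized during training, while the merge step reloads the base model at its original vocabulary size. Below is a minimal illustration of where the extra embedding row comes from (the Llama-2 model name is an assumption, since the actual ${MODEL_NAME} is not shown; this is not autotrain's internal code):

```python
# Illustration only: how an extra embedding row appears during training.
# The model name is an assumption; this is not autotrain's internal code.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

print(model.get_input_embeddings().weight.shape)  # torch.Size([32000, 4096])

# Llama-2 ships without a pad token, so trainers commonly add one for
# batched training, which grows the vocabulary by one row.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

print(model.get_input_embeddings().weight.shape)  # torch.Size([32001, 4096])

# An adapter saved from this model expects 32001 rows, but a freshly loaded
# base model still has 32000 -- hence the size mismatch when merging.
```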
Additional Information
When running the example Colab notebook for autotrain LLM with --merge_adapter, training completes but the adapter merge fails because of a size mismatch; a possible manual-merge workaround is sketched after the excerpt below.
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:500 - Failed to merge adapter weights: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:501 - Skipping adapter merge. Only adapter weights will be saved.
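Until the merge step accounts for the resized vocabulary, a possible workaround is to merge manually: resize the base model's embeddings to match the tokenizer saved with the adapter, then load and merge the adapter. A minimal sketch, assuming the project output directory contains both the adapter and the tokenizer from training (the base model name and directory names are placeholders):

```python
# Manual merge sketch; base model name and directories are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "meta-llama/Llama-2-7b-hf"  # whatever ${MODEL_NAME} was
adapter_dir = "my-autotrain-llm"              # the ${PROJECT_NAME} output dir

# The tokenizer saved alongside the adapter includes the added pad token.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="auto")

# Grow the embedding and lm_head matrices to 32001 rows so the adapter's
# state_dict shapes match before loading it.
model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(model, adapter_dir)
model = model.merge_and_unload()

model.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```

This is just the standard PEFT merge flow with an explicit resize_token_embeddings call in front; if the training run did not actually add any tokens, the resize is a no-op.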
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 17 (4 by maintainers)
This issue is being investigated. Thank you for reporting it with version details, and for your patience while we resolve it.