autotrain-advanced: [BUG]size mismatch for base_model.model.model.embed_tokens.weight
Prerequisites
- I have read the documentation.
- I have checked other issues for similar problems.
Backend
Colab
Interface Used
CLI
CLI Command
!autotrain llm \
--train \
--model ${MODEL_NAME} \
--project-name ${PROJECT_NAME} \
--data-path data/ \
--text-column text \
--lr ${LEARNING_RATE} \
--batch-size ${BATCH_SIZE} \
--epochs ${NUM_EPOCHS} \
--block-size ${BLOCK_SIZE} \
--warmup-ratio ${WARMUP_RATIO} \
--lora-r ${LORA_R} \
--lora-alpha ${LORA_ALPHA} \
--lora-dropout ${LORA_DROPOUT} \
--weight-decay ${WEIGHT_DECAY} \
--gradient-accumulation ${GRADIENT_ACCUMULATION} \
--quantization ${QUANTIZATION} \
--mixed-precision ${MIXED_PRECISION} \
$( [[ "$MERGE_ADAPTER" == "True" ]] && echo "--merge_adapter" ) \
$( [[ "$PEFT" == "True" ]] && echo "--peft" ) \
$( [[ "$PUSH_TO_HUB" == "True" ]] && echo "--push-to-hub --token ${HF_TOKEN} --repo-id ${REPO_ID}" )
UI Screenshots & Parameters
Error Logs
{'train_runtime': 138.1913, 'train_samples_per_second': 1.65, 'train_steps_per_second': 0.203, 'train_loss': 1.3923424993242537, 'epoch': 0.98}
100% 28/28 [02:18<00:00, 4.94s/it]
🚀 INFO | 2024-02-03 05:42:13 | __main__:train:477 - Finished training, saving model...
🚀 INFO | 2024-02-03 05:42:16 | __main__:train:488 - Merging adapter weights...
🚀 INFO | 2024-02-03 05:42:16 | autotrain.trainers.clm.utils:merge_adapter:192 - Loading adapter…
Loading checkpoint shards: 100% 10/10 [00:30<00:00, 3.10s/it]
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:392: UserWarning: do_sample is set to False. However, temperature is set to 0.9 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset temperature. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
/usr/local/lib/python3.10/dist-packages/transformers/generation/configuration_utils.py:397: UserWarning: do_sample is set to False. However, top_p is set to 0.6 -- this flag is only used in sample-based generation modes. You should set do_sample=True or unset top_p. This was detected when initializing the generation config instance, which means the corresponding file may hold incorrect parameterization and should be fixed.
warnings.warn(
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:500 - Failed to merge adapter weights: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:501 - Skipping adapter merge. Only adapter weights will be saved.
🚀 INFO | 2024-02-03 05:42:49 | __main__:train:510 - Pushing model to hub...
adapter_model.safetensors: 0% 0.00/1.08G [00:00<?, ?B/s]
adapter_model.safetensors: 0% 0.00/1.08G [00:00<?, ?B/s]
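The 32001 vs. 32000 pattern above is what appears when a pad token is added to the tokenizer and the embeddings are resized during training, while the merge step reloads the base model at its original vocabulary size. Below is a minimal illustration of where the extra embedding row comes from (the Llama-2 model name is an assumption, since the actual ${MODEL_NAME} is not shown; this is not autotrain's internal code):

```python
# Illustration only: how an extra embedding row appears during training.
# The model name is an assumption; this is not autotrain's internal code.
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "meta-llama/Llama-2-7b-hf"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

print(model.get_input_embeddings().weight.shape)  # torch.Size([32000, 4096])

# Llama-2 ships without a pad token, so trainers commonly add one for
# batched training, which grows the vocabulary by one row.
tokenizer.add_special_tokens({"pad_token": "[PAD]"})
model.resize_token_embeddings(len(tokenizer))

print(model.get_input_embeddings().weight.shape)  # torch.Size([32001, 4096])

# An adapter saved from this model expects 32001 rows, but a freshly loaded
# base model still has 32000 -- hence the size mismatch when merging.
```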
Additional Information
When running the example Colab notebook for autotrain LLM with --merge_adapter, training completes but the adapter merge fails because of a size mismatch; a possible manual-merge workaround is sketched after the excerpt below.
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:500 - Failed to merge adapter weights: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.embed_tokens.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
size mismatch for base_model.model.lm_head.weight: copying a param with shape torch.Size([32001, 4096]) from checkpoint, the shape in current model is torch.Size([32000, 4096]).
⚠️ WARNING | 2024-02-03 05:42:48 | __main__:train:501 - Skipping adapter merge. Only adapter weights will be saved.
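Until the merge step accounts for the resized vocabulary, a possible workaround is to merge manually: resize the base model's embeddings to match the tokenizer saved with the adapter, then load and merge the adapter. A minimal sketch, assuming the project output directory contains both the adapter and the tokenizer from training (the base model name and directory names are placeholders):

```python
# Manual merge sketch; base model name and directories are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_model_name = "meta-llama/Llama-2-7b-hf"  # whatever ${MODEL_NAME} was
adapter_dir = "my-autotrain-llm"              # the ${PROJECT_NAME} output dir

# The tokenizer saved alongside the adapter includes the added pad token.
tokenizer = AutoTokenizer.from_pretrained(adapter_dir)

model = AutoModelForCausalLM.from_pretrained(base_model_name, torch_dtype="auto")

# Grow the embedding and lm_head matrices to 32001 rows so the adapter's
# state_dict shapes match before loading it.
model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(model, adapter_dir)
model = model.merge_and_unload()

model.save_pretrained("merged-model")
tokenizer.save_pretrained("merged-model")
```

This is just the standard PEFT merge flow with an explicit resize_token_embeddings call in front; if the training run did not actually add any tokens, the resize is a no-op.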
About this issue
- Original URL
- State: closed
- Created 5 months ago
- Comments: 17 (4 by maintainers)
This issue is being investigated. Thank you for reporting it with version details, and for your patience while we resolve it.