diffusers: train_dreambooth_lora_sdxl_advanced.py and train_dreambooth_lora_sdxl.py do not load previously saved checkpoints correctly

Describe the bug

Both examples/dreambooth/train_dreambooth_lora_sdxl.py and examples/advanced_diffusion_training/train_dreambooth_lora_sdxl_advanced.py seem to have an issue when resuming training from a previously saved checkpoint.

Training and saving checkpoints seem to work correctly; however, when resuming from a previously saved checkpoint, the following messages are produced at script startup:

Resuming from checkpoint checkpoint-10
12/27/2023 16:29:22 - INFO - accelerate.accelerator - Loading states from xqc/checkpoint-10
Loading unet.
12/27/2023 16:29:22 - INFO - peft.tuners.tuners_utils - Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
Loading text_encoder.
12/27/2023 16:29:23 - INFO - peft.tuners.tuners_utils - Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!

Training appears to continue normally; however, every checkpoint saved after the resume is significantly larger than the ones saved before it:

(xl) localhost /media/nvme/xl/diffusers/examples/dreambooth # du -sch xqc/*
87M     xqc/checkpoint-10
110M    xqc/checkpoint-15
110M    xqc/checkpoint-20
110M    xqc/checkpoint-25
87M     xqc/checkpoint-5
88K     xqc/logs
494M    total

Once the resumed training run finishes and the resulting LoRA is loaded for inference, a large dump of layer names is printed, with a message saying the state dict contains keys that do not match the model (full error message below).

To me, this looks like the checkpoint is loaded but effectively ignored: a second adapter is created and trained from scratch, and both adapters, old and new, end up in the final LoRA.
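
One way to sanity-check this is to compare the LoRA tensors saved before and after the resume. Below is a minimal sketch, assuming each checkpoint directory contains a pytorch_lora_weights.safetensors file and that the duplicate adapter shows up as keys carrying an extra `_1` suffix (adjust the paths to your own --output_dir):

```python
# Sketch: compare LoRA tensors in a checkpoint saved before vs. after the resume.
# Paths assume the --output_dir and --checkpointing_steps from the run shown above.
from safetensors.torch import load_file

before = load_file("xqc/checkpoint-10/pytorch_lora_weights.safetensors")
after = load_file("xqc/checkpoint-15/pytorch_lora_weights.safetensors")

print(f"{len(before)} tensors before resume, {len(after)} tensors after resume")

# Keys with an extra "_1" suffix would belong to a second, freshly created adapter.
dupes = [k for k in after if "lora_A_1" in k or "lora_B_1" in k]
print(f"{len(dupes)} keys that look like a second adapter")
```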

Reproduction

To reproduce this issue, follow these steps:

  1. Run either of the train_dreambooth_lora_sdxl*.py scripts with appropriate parameters, including --checkpointing_steps (preferably set to a low value so the issue reproduces quickly).
  2. After at least one or two checkpoints have been saved, either stop the script or wait for it to complete.
  3. Rerun the same script, this time also passing --resume_from_checkpoint latest or --resume_from_checkpoint checkpoint-x.
  4. Observe the effects listed above (the PEFT warning on startup, the larger size of checkpoints saved after the resume).
  5. After the resumed training completes, attempt to load the finished LoRA for inference (see the sketch after this list); inference succeeds, but the LoRA does not appear to behave correctly.
  6. Observe the error message produced.
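
For step 5, a minimal inference sketch that reproduces the dump below might look like this. It assumes a GPU and uses the SDXL base model as a stand-in for the fine-tuned model I trained against; "xqc" is the --output_dir from my training run:

```python
# Sketch for step 5: load the finished LoRA and run inference.
# Model path, LoRA directory, and prompt are from my run; adjust as needed.
import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

# This call is where the "unexpected keys not found in the model" dump is printed.
pipe.load_lora_weights("xqc")

image = pipe("a photo of hxq", num_inference_steps=25).images[0]
image.save("hxq.png")
```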

Logs

My full command line with all arguments looks like this:

python train_dreambooth_lora_sdxl.py --pretrained_model_name_or_path ../../../models/colossus_v5.3 --instance_data_dir /media/nvme/datasets/combined/ --output_dir xqc --resolution 1024 --instance_prompt 'a photo of hxq' --train_text_encoder --num_train_epochs 1 --train_batch_size 1 --gradient_checkpointing --checkpointing_steps 5 --gradient_accumulation_steps 1 --learning_rate 0.0001 --resume_from_checkpoint latest

Error produced during inference with the affected LoRA (truncated for length):

Loading adapter weights from state_dict led to unexpected keys not found in the model:  ['down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.lora_B_1.default_0.weight', 
'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.lora_A_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.lora_B_1.default_0.weight', 'down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.lora_A_1.default_0.weight', 

*** TRUNCATED HERE ***

'mid_block.attentions.0.transformer_blocks.8.attn1.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_q.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_q.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_q.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_q.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_q.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_q.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_k.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_k.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_v.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_v.lora_B_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.lora_A_1.default_0.weight', 'mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.lora_B_1.default_0.weight'].
Loading adapter weights from None led to unexpected keys not found in the model:  ['text_model.encoder.layers.0.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.0.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.1.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.2.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.3.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.4.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.q_proj.lora_B_1.default_0.weight', 
'text_model.encoder.layers.5.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.5.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.6.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.7.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.8.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.9.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.q_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.10.self_attn.out_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.k_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.k_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.v_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.v_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.q_proj.lora_A_1.default_0.weight', 
'text_model.encoder.layers.11.self_attn.q_proj.lora_B_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.out_proj.lora_A_1.default_0.weight', 'text_model.encoder.layers.11.self_attn.out_proj.lora_B_1.default_0.weight'].


### System Info

Latest diffusers - master branch pulled on 2023/12/27.

OS - Linux 6.1.9

(xl) localhost /media/nvme/xl # uname -a
Linux localhost 6.1.9-noinitramfs #4 SMP PREEMPT_DYNAMIC Fri Feb 10 03:01:14 -00 2023 x86_64 Intel® Core™ i5-9500T CPU @ 2.20GHz GenuineIntel GNU/Linux


Python - 3.10.9


(xl) localhost /media/nvme/xl # diffusers-cli env

  • diffusers version: 0.25.0.dev0
  • Platform: Linux-6.1.9-noinitramfs-x86_64-Intel-R-_Core-TM-i5-9500T_CPU@_2.20GHz-with-glibc2.36
  • Python version: 3.10.9
  • PyTorch version (GPU?): 2.1.2+cu121 (True)
  • Huggingface_hub version: 0.20.1
  • Transformers version: 4.36.2
  • Accelerate version: 0.23.0
  • xFormers version: 0.0.23.post1
  • Using GPU in script?: No (however, I believe it will occur on GPU as well)
  • Using distributed or parallel set-up in script?: No

### Who can help?

_No response_

About this issue

  • State: closed
  • Created 6 months ago
  • Comments: 15 (11 by maintainers)

Most upvoted comments

Going to close then. But feel free to re-open.

I can confirm this issue. I was wondering why no one else was having this problem, so I set up a test in a Colab, since I normally use custom code. I was finally able to run it, and it happens there too. You don't need any custom code: just do the training as stated in the documentation and then resume:

!accelerate launch train_dreambooth_lora_sdxl.py --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" --pretrained_vae_model_name_or_path="madebyollin/sdxl-vae-fp16-fix" --instance_data_dir="/content/dataset" --instance_prompt="anime screencap of sks girl" --validation_prompt="anime screencap of sks girl" --train_batch_size=1 --gradient_checkpointing --gradient_accumulation_steps=1 --learning_rate=1e-4 --lr_scheduler="constant" --max_train_steps=800 --validation_epochs=25 --output_dir="/content/output_lora" --train_text_encoder --checkpointing_steps=25 --optimizer="AdamW" --use_8bit_adam --resume_from_checkpoint="checkpoint-400" --mixed_precision="fp16"

I got the same errors as @prushik, and I can also see the problem in the images (training with just one image):


First run, first validation: [image]
First run at 400 steps: [image]
Resume from 400 steps in a second run: [image]

It's just obvious that it started as a clean training run, even though it loads the checkpoint and the state without errors.