diffusers: DreamBooth Diffusers tutorial failed

Describe the bug

There is an issue with the DreamBooth Diffusers tutorial. After executing the accelerate launch command, the script fails with the following error: RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead.

I’ve not been able to locate the source of the problem, and I haven’t seen it reported anywhere else, so I’d appreciate any guidance on what steps I could take to debug it.
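As a starting point for debugging, here is a minimal sketch of how the error message itself can be read. It uses only the standard library. The helper names `parse_conv_channel_error` and `expected_latent_shape` are made up for illustration, and the latent shape assumes the usual Stable Diffusion v1 layout (4 latent channels, 8x spatial downsampling by the VAE), which is an assumption and not something taken from the training script.

```python
import re


def parse_conv_channel_error(message):
    """Pull the weight shape and the input shape out of a conv2d channel-mismatch message."""
    shapes = re.findall(r"\[([\d,\s]+)\]", message)
    if len(shapes) < 2:
        raise ValueError("message does not contain two bracketed shapes")
    weight_shape = [int(n) for n in shapes[0].split(",")]
    input_shape = [int(n) for n in shapes[1].split(",")]
    return weight_shape, input_shape


def expected_latent_shape(batch, resolution, latent_channels=4, vae_downscale=8):
    # Assumption: Stable Diffusion v1 latents have 4 channels and are 8x smaller per side.
    side = resolution // vae_downscale
    return [batch, latent_channels, side, side]


MESSAGE = (
    "Given groups=1, weight of size [320, 4, 3, 3], "
    "expected input[1, 3, 512, 512] to have 4 channels, "
    "but got 3 channels instead"
)

weight_shape, input_shape = parse_conv_channel_error(MESSAGE)

# weight_shape is [out_channels, in_channels, kernel_h, kernel_w]
print("channels the first UNet conv expects:", weight_shape[1])
print("input the UNet actually received:", input_shape)
print("latent shape one would expect at resolution 512:", expected_latent_shape(1, 512))
```

Read this way, the input has the shape of a raw RGB image at `--resolution=512` and not of a latent, which suggests (but does not prove) that the image-to-latent encoding step was skipped before the UNet was called.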

Here’s the initial screenshot where the script successfully downloaded necessary files such as the safetensors.

I’m going to attach a log of the result once I try to execute the script again (without traces of the downloads for clearer visualization of the warnings/errors).

The command I’m trying to execute is the following:

accelerate launch train_dreambooth.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir="./dog" --output_dir="./test" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400
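For reference, a rough sketch of how these flags relate to the schedule reported in the log (5 examples, 5 batches per epoch, 80 epochs, 400 optimization steps). The function `training_schedule` is a hypothetical helper and the arithmetic is the conventional one; it is not copied from the training script.

```python
import math


def training_schedule(num_examples, train_batch_size,
                      gradient_accumulation_steps, max_train_steps):
    """Derive batches per epoch and epoch count from the command-line values."""
    batches_per_epoch = math.ceil(num_examples / train_batch_size)
    # One optimizer update happens every `gradient_accumulation_steps` batches.
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    num_epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "batches_per_epoch": batches_per_epoch,
        "updates_per_epoch": updates_per_epoch,
        "num_epochs": num_epochs,
    }


# 5 is the number of instance images reported in the log ("Num examples = 5").
schedule = training_schedule(
    num_examples=5,
    train_batch_size=1,
    gradient_accumulation_steps=1,
    max_train_steps=400,
)
print(schedule)
```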

Reproduction

To reproduce this error, all you need is to follow the steps here:

https://huggingface.co/docs/diffusers/training/dreambooth

Logs

D:\Anaconda\lib\site-packages\accelerate\accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
05/27/2023 17:07:31 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'variance_type', 'dynamic_thresholding_ratio', 'prediction_type', 'sample_max_value', 'clip_sample_range', 'thresholding'} was not found in config. Values will be initialized to default values.
{'addition_embed_type_num_heads', 'class_embeddings_concat', 'resnet_skip_time_act', 'encoder_hid_dim', 'time_cond_proj_dim', 'time_embedding_dim', 'encoder_hid_dim_type', 'upcast_attention', 'only_cross_attention', 'dual_cross_attention', 'use_linear_projection', 'class_embed_type', 'projection_class_embeddings_input_dim', 'resnet_out_scale_factor', 'mid_block_only_cross_attention', 'cross_attention_norm', 'conv_in_kernel', 'addition_embed_type', 'timestep_post_act', 'conv_out_kernel', 'time_embedding_type', 'time_embedding_act_fn', 'mid_block_type', 'num_class_embeds', 'resnet_time_scale_shift'} was not found in config. Values will be initialized to default values.
05/27/2023 17:07:37 - INFO - __main__ - ***** Running training *****
05/27/2023 17:07:37 - INFO - __main__ -   Num examples = 5
05/27/2023 17:07:37 - INFO - __main__ -   Num batches each epoch = 5
05/27/2023 17:07:37 - INFO - __main__ -   Num Epochs = 80
05/27/2023 17:07:37 - INFO - __main__ -   Instantaneous batch size per device = 1
05/27/2023 17:07:37 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
05/27/2023 17:07:37 - INFO - __main__ -   Gradient Accumulation steps = 1
05/27/2023 17:07:37 - INFO - __main__ -   Total optimization steps = 400
Steps:   0%|                                                                                                                                                                                                | 0/400 [00:00<?, ?it/s]Traceback (most recent call last):
  File "D:\stableDiffusionFolder\modelTraining\diffusers\examples\dreambooth\train_dreambooth.py", line 1323, in <module>
    main(args)
  File "D:\stableDiffusionFolder\modelTraining\diffusers\examples\dreambooth\train_dreambooth.py", line 1194, in main
    model_pred = unet(noisy_model_input, timesteps, encoder_hidden_states).sample
  File "D:\Anaconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Anaconda\lib\site-packages\diffusers\models\unet_2d_condition.py", line 807, in forward
    sample = self.conv_in(sample)
  File "D:\Anaconda\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\Anaconda\lib\site-packages\torch\nn\modules\conv.py", line 463, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "D:\Anaconda\lib\site-packages\torch\nn\modules\conv.py", line 459, in _conv_forward
    return F.conv2d(input, weight, bias, self.stride,
RuntimeError: Given groups=1, weight of size [320, 4, 3, 3], expected input[1, 3, 512, 512] to have 4 channels, but got 3 channels instead
Steps:   0%|                                                                                                                                                                                                | 0/400 [00:01<?, ?it/s] 
Traceback (most recent call last):
  File "D:\Anaconda\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Anaconda\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\Anaconda\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\Anaconda\lib\site-packages\accelerate\commands\accelerate_cli.py", line 45, in main
    args.func(args)
  File "D:\Anaconda\lib\site-packages\accelerate\commands\launch.py", line 918, in launch_command
    simple_launcher(args)
  File "D:\Anaconda\lib\site-packages\accelerate\commands\launch.py", line 580, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\Anaconda\\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=CompVis/stable-diffusion-v1-4', '--instance_data_dir=./dog', '--output_dir=./test', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--learning_rate=5e-6', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=400']' returned non-zero exit status 1.

System Info

Windows, Python 3.9.13

All of the libraries were installed from source, from this repository, via the build.py and the requirements.txt. If needed, I can provide the versions of any libraries you think might be causing this issue.
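Since this run is on Windows, one hypothesis that may be worth ruling out is path-separator handling. This is an assumption and is not confirmed anywhere in this report. File names inside a Hub repository use forward slashes (the download lines in the logs show `vae/config.json`), while `os.path.join` on Windows produces backslashes, so any comparison between the two would silently fail. The sketch below uses the standard library's `ntpath` and `posixpath` modules so that it behaves the same on any OS; `repo_files` and `has_file` are hypothetical stand-ins, not part of any library.

```python
import ntpath
import posixpath

# Hypothetical listing of file names in a model repository (forward slashes).
repo_files = ["model_index.json", "unet/config.json", "vae/config.json"]


def has_file(files, name):
    """Exact string match, as a naive existence check would do."""
    return name in files


# ntpath.join is what os.path.join resolves to on Windows.
windows_style = ntpath.join("vae", "config.json")
# posixpath.join always uses forward slashes.
posix_style = posixpath.join("vae", "config.json")

print(repr(windows_style), "found:", has_file(repo_files, windows_style))
print(repr(posix_style), "found:", has_file(repo_files, posix_style))
```

If a check of this kind decided whether the VAE gets loaded, a false negative would leave the images unencoded, which would be consistent with the 3-channel input in the traceback.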

About this issue

State: closed
Created a year ago
Reactions: 2
Comments: 19 (9 by maintainers)

Most upvoted comments

@sayakpaul @CamooCodee I was able to run the training successfully on the Windows setup, but with the LoRA variant.

accelerate launch train_dreambooth_lora.py --pretrained_model_name_or_path="CompVis/stable-diffusion-v1-4" --instance_data_dir="./dog" --output_dir="./test" --instance_prompt="a photo of sks dog" --resolution=256 --train_batch_size=1 --gradient_accumulation_steps=1 --learning_rate=5e-6 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=400
D:\Anaconda\lib\site-packages\accelerate\accelerator.py:258: FutureWarning: `logging_dir` is deprecated and will be removed in version 0.18.0 of 🤗 Accelerate. Use `project_dir` instead.
  warnings.warn(
06/05/2023 19:32:06 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cpu

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'thresholding', 'variance_type', 'prediction_type', 'sample_max_value', 'clip_sample_range', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
Downloading (…)main/vae/config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 551/551 [00:00<00:00, 276kB/s]
D:\Anaconda\lib\site-packages\huggingface_hub\file_download.py:133: UserWarning: `huggingface_hub` cache-system uses symlinks by default to efficiently store duplicated files but your machine does not support them in C:\Users\kiria\.cache\huggingface\hub. Caching files will still work but in a degraded version that might require more space on your disk. This warning can be disabled by setting the `HF_HUB_DISABLE_SYMLINKS_WARNING` environment variable. For more details, see https://huggingface.co/docs/huggingface_hub/how-to-cache#limitations.
To support symlinks on Windows, you either need to activate Developer Mode or to run Python as an administrator. In order to see activate developer mode, see this article: https://docs.microsoft.com/en-us/windows/apps/get-started/enable-your-device-for-development
  warnings.warn(message)
Downloading (…)ch_model.safetensors: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 335M/335M [00:03<00:00, 98.6MB/s]
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
{'only_cross_attention', 'resnet_time_scale_shift', 'projection_class_embeddings_input_dim', 'addition_embed_type_num_heads', 'cross_attention_norm', 'timestep_post_act', 'class_embeddings_concat', 'conv_in_kernel', 'class_embed_type', 'time_embedding_act_fn', 'time_embedding_type', 'resnet_out_scale_factor', 'num_class_embeds', 'addition_embed_type', 'mid_block_type', 'encoder_hid_dim_type', 'use_linear_projection', 'conv_out_kernel', 'encoder_hid_dim', 'dual_cross_attention', 'time_embedding_dim', 'time_cond_proj_dim', 'mid_block_only_cross_attention', 'resnet_skip_time_act', 'upcast_attention'} was not found in config. Values will be initialized to default values.
06/05/2023 19:32:13 - INFO - __main__ - ***** Running training *****
06/05/2023 19:32:13 - INFO - __main__ -   Num examples = 5
06/05/2023 19:32:13 - INFO - __main__ -   Num batches each epoch = 5
06/05/2023 19:32:13 - INFO - __main__ -   Num Epochs = 80
06/05/2023 19:32:13 - INFO - __main__ -   Instantaneous batch size per device = 1
06/05/2023 19:32:13 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
06/05/2023 19:32:13 - INFO - __main__ -   Gradient Accumulation steps = 1
06/05/2023 19:32:13 - INFO - __main__ -   Total optimization steps = 400
Steps: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [58:22<00:00,  8.89s/it, loss=0.124, lr=5e-6]Model weights saved in ./test\pytorch_lora_weights.bin
Downloading (…)ain/model_index.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 541/541 [00:00<00:00, 271kB/s]
safety_checker\model.safetensors not found                                                                                                                                | 0.00/541 [00:00<?, ?B/s]
Downloading (…)_checker/config.json: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.56k/4.56k [00:00<00:00, 2.28MB/s]
Downloading (…)rocessor_config.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 342/342 [00:00<00:00, 172kB/s] 
Downloading (…)nfig-checkpoint.json: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 209/209 [00:00<00:00, 105kB/s] 
Downloading (…)on_pytorch_model.bin: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 335M/335M [00:05<00:00, 59.9MB/s] 
Downloading pytorch_model.bin: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1.22G/1.22G [00:13<00:00, 87.1MB/s] 
Fetching 16 files: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 16/16 [00:14<00:00,  1.12it/s] 
{'requires_safety_checker'} was not found in config. Values will be initialized to default values.                                                                   | 4/16 [00:14<00:45,  3.83s/it] 
{'only_cross_attention', 'resnet_time_scale_shift', 'projection_class_embeddings_input_dim', 'addition_embed_type_num_heads', 'cross_attention_norm', 'timestep_post_act', 'class_embeddings_concat', 'conv_in_kernel', 'class_embed_type', 'time_embedding_act_fn', 'time_embedding_type', 'resnet_out_scale_factor', 'num_class_embeds', 'addition_embed_type', 'mid_block_type', 'encoder_hid_dim_type', 'use_linear_projection', 'conv_out_kernel', 'encoder_hid_dim', 'dual_cross_attention', 'time_embedding_dim', 'time_cond_proj_dim', 'mid_block_only_cross_attention', 'resnet_skip_time_act', 'upcast_attention'} was not found in config. Values will be initialized to default values.
{'norm_num_groups'} was not found in config. Values will be initialized to default values.
`text_config_dict` is provided which will be used to initialize `CLIPTextConfig`. The value `text_config["id2label"]` will be overriden.
{'prediction_type'} was not found in config. Values will be initialized to default values.
{'lambda_min_clipped', 'solver_order', 'solver_type', 'thresholding', 'variance_type', 'sample_max_value', 'algorithm_type', 'lower_order_final', 'use_karras_sigmas', 'dynamic_thresholding_ratio'} was not found in config. Values will be initialized to default values.
Loading unet.
Loading text_encoder.
Steps: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 400/400 [58:39<00:00,  8.80s/it, loss=0.124, lr=5e-6]

I also changed the PyTorch version to 1.13.1. I haven’t tried 2.1 because I’m too scared I’m going to break it all again. I also reduced the resolution to 256. Here are the versions:

diffusers version: 0.17.0.dev0
Platform: Windows-10-10.0.19045-SP0
Python version: 3.9.13
PyTorch version (GPU?): 1.13.1+cpu (False)
Huggingface_hub version: 0.14.1
Transformers version: 4.29.2
Accelerate version: 0.19.0
xFormers version: not installed
Using GPU in script?: <fill in>
Using distributed or parallel set-up in script?: <fill in>

kiriamcf on Jun 5, 2023