fast-stable-diffusion: Training on a custom (huggingface) model is broken
I tried several different base models based on 1.5. I pasted the following into Path_to_HuggingFace (no path or link), with 1.5 selected as the custom model version:
- darkstorm2150/Protogen_v5.3_Official_Release
- 22h/vintedois-diffusion-v0-1
- dreamlike-art/dreamlike-photoreal-2.0
- devilkkw/KKW_FANTAREAL_V1.0
All of them crash when training reaches the UNet stage; I get:
Training the UNet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 852, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 522, in main
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
  File "/usr/local/lib/python3.8/dist-packages/diffusers/modeling_utils.py", line 388, in from_pretrained
    raise EnvironmentError(
OSError: Error no file named diffusion_pytorch_model.bin found in directory /content/stable-diffusion-custom.
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--stop_text_encoder_training=300', '--image_captions_filename', '--train_only_unet', '--save_starting_step=1000', '--save_n_steps=1000', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jmilei-photoreal', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jmilei-photoreal/instance_images', '--output_dir=/content/models/jmilei-photoreal', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jmilei-photoreal/captions', '--instance_prompt=', '--seed=643601', '--resolution=768', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=3e-06', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=4999']' returned non-zero exit status 1.
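A quick way to see what actually landed in the local model folder before training starts (a minimal sketch; the path and subfolder names come from the traceback above, and some repos ship .safetensors rather than .bin weights):

```python
# Minimal check: does /content/stable-diffusion-custom contain the subfolders
# and weight files the DreamBooth script expects to load from?
import os

model_dir = "/content/stable-diffusion-custom"  # path from the error message
for sub in ("vae", "unet", "text_encoder"):
    folder = os.path.join(model_dir, sub)
    files = os.listdir(folder) if os.path.isdir(folder) else []
    print(f"{sub}: {files if files else 'MISSING'}")
```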
I tried to patch it by copying the files from /unet/ into the parent directory, as it seemed to expect. That only produced this other error, and I rage-quit:
Training the UNet...
Traceback (most recent call last):
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 852, in <module>
    main()
  File "/content/diffusers/examples/dreambooth/train_dreambooth.py", line 522, in main
    vae = AutoencoderKL.from_pretrained(args.pretrained_model_name_or_path, subfolder="vae")
  File "/usr/local/lib/python3.8/dist-packages/diffusers/modeling_utils.py", line 451, in from_pretrained
    model, unused_kwargs = cls.from_config(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/configuration_utils.py", line 202, in from_config
    model = cls(**init_dict)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/configuration_utils.py", line 516, in inner_init
    init(self, *args, **init_kwargs)
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py", line 544, in __init__
    self.encoder = Encoder(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/vae.py", line 94, in __init__
    down_block = get_down_block(
  File "/usr/local/lib/python3.8/dist-packages/diffusers/models/unet_2d_blocks.py", line 67, in get_down_block
    raise ValueError("cross_attention_dim must be specified for CrossAttnDownBlock2D")
ValueError: cross_attention_dim must be specified for CrossAttnDownBlock2D
Traceback (most recent call last):
  File "/usr/local/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/accelerate_cli.py", line 43, in main
    args.func(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 837, in launch_command
    simple_launcher(args)
  File "/usr/local/lib/python3.8/dist-packages/accelerate/commands/launch.py", line 354, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['/usr/bin/python3', '/content/diffusers/examples/dreambooth/train_dreambooth.py', '--stop_text_encoder_training=300', '--image_captions_filename', '--train_only_unet', '--save_starting_step=1000', '--save_n_steps=1000', '--Session_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jmilei-v3-protogen5.3', '--pretrained_model_name_or_path=/content/stable-diffusion-custom', '--instance_data_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jmilei-v3-protogen5.3/instance_images', '--output_dir=/content/models/jmilei-v3-protogen5.3', '--captions_dir=/content/gdrive/MyDrive/Fast-Dreambooth/Sessions/jmilei-v3-protogen5.3/captions', '--instance_prompt=', '--seed=425318', '--resolution=512', '--mixed_precision=fp16', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--gradient_checkpointing', '--use_8bit_adam', '--learning_rate=4e-06', '--lr_scheduler=polynomial', '--lr_warmup_steps=0', '--max_train_steps=4999']' returned non-zero exit status 1.
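For the record, my reading of this second traceback is that after copying the unet/ files into the root, AutoencoderKL ends up being built from the UNet's config, whose CrossAttnDownBlock2D entries a VAE can't instantiate, so this error looks like a consequence of my workaround rather than a separate bug. As a sanity check that the repo itself is fine, something like this sketch (using one of the repos listed above) should load the VAE straight from the Hub:

```python
# Sanity check: load the VAE directly from the Hugging Face repo instead of
# the patched local folder; a stock diffusers-layout repo should load fine.
from diffusers import AutoencoderKL

vae = AutoencoderKL.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0",  # one of the repos from this issue
    subfolder="vae",
)
print(vae.config.down_block_types)  # expected: DownEncoderBlock2D entries, no CrossAttn
```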
About this issue
- State: open
- Created a year ago
- Comments: 40 (11 by maintainers)
The fp32 UNet is 3.2 GB, not 1.6 GB; the majority of repos host the diffusers model in fp32. dreamlike is an exception, I don’t know why.
As for the VAE, I will change that soon for custom models, since most of them now include the improved VAE.
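If "the improved VAE" means stabilityai/sd-vae-ft-mse (my assumption), swapping it in on the diffusers side looks roughly like this sketch, though the notebook may do it differently:

```python
# Rough sketch: swap in the ft-mse VAE when loading a custom model; assumes
# "improved VAE" refers to stabilityai/sd-vae-ft-mse, which may not be exact.
import torch
from diffusers import AutoencoderKL, StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "dreamlike-art/dreamlike-photoreal-2.0", torch_dtype=torch.float16
)
pipe.vae = AutoencoderKL.from_pretrained(
    "stabilityai/sd-vae-ft-mse", torch_dtype=torch.float16
)
```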
I noticed that CivitAI merged models don’t work, but the Hugging Face version works.
https://colab.research.google.com/drive/1aQ5nXTfLWHhZi7GOfXteLKg5OT1X-aBZ
Don’t use CivitAI; use Hugging Face (https://huggingface.co/darkstorm2150), the Hugging Face models work fine.
BTW, I see the notebook is replacing the VAE with 1.5’s for custom models. Don’t these models sometimes ship their own custom VAE (and rarely none at all) and actually count on it not being changed?
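A quick way to check whether a given repo actually ships its own vae/ folder (sketch, assuming huggingface_hub is available):

```python
# Sketch: list the vae/ files each custom repo ships, to see whether it has
# its own VAE that replacing with 1.5's would throw away.
from huggingface_hub import list_repo_files

for repo in ("darkstorm2150/Protogen_v5.3_Official_Release",
             "dreamlike-art/dreamlike-photoreal-2.0"):
    vae_files = [f for f in list_repo_files(repo) if f.startswith("vae/")]
    print(repo, "->", vae_files or "no vae/ folder")
```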