zero123: Unable to train successfully

Hello and thank you for your very nice paper!

I am trying to train a view-conditional network using the code in zero123, but something is going wrong. I am wondering if my command is wrong, or if there is something else that I am missing.

I am using the command:

python main.py --base configs/sd-objaverse-finetune-c_concat-256.yaml --train --gpus=0,1,2,3 precision=16

I have trained for 10,000 steps and it is evident from the generations that something is going wrong. Do you know why this might be / should I be using a different command?

For context, the logged images look as follows:

  • inputs_gs-000000_e-000000_b-000000
  • conditioning_gs-000000_e-000000_b-000000
  • reconstruction_gs-000000_e-000000_b-000000
  • samples_gs-000000_e-000000_b-000000
  • samples_cfg_scale_3.00_gs-000000_e-000000_b-000000

Thank you so much for your help!

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Comments: 15 (5 by maintainers)

Most upvoted comments

Thanks! I will try it out and let you know how it works.

Hi all, sorry for the delay. I’ve updated the README file for the training script. Could you please try the commands there and let me know if they work?

Hello and thanks for the quick response!

Yes, the loss is diverging (as is evident from the images). I do not think it is a batch-size issue, as I also tried gradient accumulation and 8 GPUs; it must be the initialization.

For initialization, I’m slightly confused – are you saying that you used lambdalabs/stable-diffusion-image-conditioned or a different set of weights based on SDv2? Did you convert these yourself to the format required by the ldm code?

Also, what do you mean when you say you will release the training script after the dataset – I thought the training script was already released (in zero123/)?

Apologies for the confusion and thanks so much for the help!
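For anyone with the same question about the weight format: the ldm-style code expects a single .ckpt file whose state_dict uses the CompVis key layout (UNet weights under "model.diffusion_model.", the autoencoder under "first_stage_model."), rather than the diffusers folder layout. A minimal sketch for checking a downloaded checkpoint is below; the filename is an assumption, not an official release name.

    # Sketch: inspect a checkpoint to see whether it is already in the
    # ldm/CompVis layout expected by main.py. The filename is assumed.
    import torch

    ckpt = torch.load("sd-image-conditioned-v2.ckpt", map_location="cpu")
    sd = ckpt.get("state_dict", ckpt)

    # ldm-format checkpoints group weights under these prefixes.
    for prefix in ("model.diffusion_model.", "first_stage_model.", "cond_stage_model."):
        count = sum(k.startswith(prefix) for k in sd)
        print(f"{prefix:<25} {count} tensors")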

Hi @greeneggsandyaml, we initialized our model weights with the image-conditioned stable diffusion released by lambda-labs. Can you share the loss curve of your training as well? It looks to me like the loss has diverged due to instability. My guess is that you are using a smaller batch size (4 vs. 8 GPUs) and a randomly initialized stable diffusion, which causes the training instability. I couldn’t find the version of the model weights that I used for initialization online (it’s the one based on version 2). We are working on releasing the dataset as well, so we will release the training script after testing those. In the meantime, feel free to experiment with training randomly initialized SD with different batch sizes and learning rates.
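To make the initialization step concrete: a minimal sketch of loading an image-conditioned SD checkpoint into the model built from the c_concat config before fine-tuning is below. The checkpoint filename is an assumption, and the shape filter is there because the concat-conditioned UNet's first convolution takes more input channels than the original, so a strict load would fail on that key.

    # Sketch (assumed checkpoint name): initialize the view-conditioned model
    # from an image-conditioned SD checkpoint, keeping fresh initialization for
    # any weights whose shapes changed (e.g. the UNet's first conv, which gains
    # the concatenated conditioning channels) or that have no pretrained
    # counterpart (e.g. the pose-conditioning projection).
    import torch
    from omegaconf import OmegaConf
    from ldm.util import instantiate_from_config

    config = OmegaConf.load("configs/sd-objaverse-finetune-c_concat-256.yaml")
    model = instantiate_from_config(config.model)

    ckpt = torch.load("sd-image-conditioned-v2.ckpt", map_location="cpu")
    pretrained = ckpt.get("state_dict", ckpt)

    # Keep only tensors that exist in the new model with matching shapes.
    model_sd = model.state_dict()
    filtered = {k: v for k, v in pretrained.items()
                if k in model_sd and v.shape == model_sd[k].shape}

    result = model.load_state_dict(filtered, strict=False)
    print(f"loaded {len(filtered)} pretrained tensors; "
          f"{len(result.missing_keys)} stay at their fresh initialization")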