StyleAvatar: StyleAvatar takes too long to train

Thanks for the great repo.

I’m training the Full StyleAvatar, specifically with the command `python train.py --batch 3 path-to-dataset`, and from scratch since the checkpoints have not been shared yet.

On an A10 GPU it takes about a week with the default training parameters. Is that normal? I ask because the paper mentions:

The proposed network can converge within two hours while ensuring high image quality and a forward rendering time of only 20 milliseconds.

So maybe I’m missing something, can you help? 😃

About this issue

  • State: closed
  • Created a year ago
  • Comments: 29 (13 by maintainers)

Most upvoted comments

@LizhenWangT Could you please elaborate a bit on the cross-person re-enactment? I processed my driving video of the input talker with FaceVerse, so I have the renders and the UV masks, and I also trained a model for 20k iterations on my target person, but I’m getting pretty bad results. Could you help me on my way with these instructions:

“As shown in Lines 129-132 of FaceVerse/faceversev3_jittor/tracking_offline_cuda.py, you need to change the first 150 dims (shape parameters of another person) of the jittor tensor coeffs in Line 123 to the values (shape parameters of the source actor) in id.txt generated by FaceVerse/faceversev3_jittor/tracking_offline_cuda.py. If you are testing cross-person re-enactment cases, and only if the source actor’s expression in the first frame is a neutral expression, then, as shown in Lines 133-134, add the first expression in exp.txt to the exp param (id and exp are coupled with each other, so this operation can improve the cross-person results).”

It’s unclear to me what you mean by that.

If you have updated the code in FaceVerse, just add --first_frame_is_neutral and pass --id_folder (the driving video also needs to start with a neutral expression in its first frame).

That’s all? Because this part got me confused: “you need to change the first 150 dims (shape parameters of another person) of the jittor tensor coeffs in Line 123 to the values (shape parameters of the source actor) in id.txt generated by FaceVerse/faceversev3_jittor/tracking_offline_cuda.py.”

Yes, I also think it’s quite hard to understand, so I have moved this part into the FaceVerse code. But I’m too lazy, so I didn’t change the README here; I hope users can read the code and find this part by themselves lol.
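
For anyone else who gets stuck here before reading the code, here is a rough numpy sketch of my reading of those two steps. It is not the FaceVerse implementation: the file paths, the exact coefficient layout beyond “the first 150 dims are shape parameters”, and which exp.txt is meant are assumptions on my part, so treat the updated tracking_offline_cuda.py (--id_folder / --first_frame_is_neutral) as the ground truth.

```python
import numpy as np

# Illustrative sketch only, NOT the FaceVerse code. id.txt / exp.txt are assumed to
# be the files the tracking script writes for the avatar (id_folder) person, whose
# first frame should also be a neutral expression; paths are hypothetical.
avatar_id = np.loadtxt("avatar_person/id.txt").reshape(-1)            # (150,) shape params
avatar_exp0 = np.atleast_2d(np.loadtxt("avatar_person/exp.txt"))[0]   # first-frame expression

ID_DIMS = 150
# Assumption: the expression parameters sit right after the id parameters in the
# coefficient vector; the real layout is defined in tracking_offline_cuda.py.
EXP_SLICE = slice(ID_DIMS, ID_DIMS + avatar_exp0.shape[0])


def retarget_frame(coeffs: np.ndarray, first_frame_is_neutral: bool = True) -> np.ndarray:
    """Retarget one frame of driving-video coefficients onto the avatar identity."""
    out = coeffs.copy()
    # 1) Replace the driving actor's shape (id) parameters with the avatar person's.
    out[:ID_DIMS] = avatar_id
    if first_frame_is_neutral:
        # 2) id and exp are coupled, so (per the instructions above) add the first
        #    expression from exp.txt to the exp params. Only valid when the first
        #    frame really is neutral.
        out[EXP_SLICE] = out[EXP_SLICE] + avatar_exp0
    return out
```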

Actually, the training of styleunet is more stable than styleavatar. Sometimes the training of styleavatar may fail when the discriminator loss stays below 0.1 (e.g. d: 0.0075; g: 5.9484), which means the discriminator is almost useless. Usually, training on the dataset a second time from the first checkpoint may solve this problem (different random values for the latent, noise, etc.).

You mean just re-run it? Or train the styleUnet model first and then train styleAvatar from the new styleUnet checkpoint?

Yes, just re-run it from the checkpoint when you see that the discriminator loss stays below 0.1.
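
For what it’s worth, here is a small, repo-agnostic sketch of how you could watch for that failure mode automatically instead of eyeballing the logs. The 0.1 threshold comes from the comment above; the moving-average window and the class itself are made up for illustration:

```python
from collections import deque


class DiscriminatorCollapseMonitor:
    """Flags the failure mode described above: the d loss staying below a
    threshold for a long stretch, i.e. the discriminator has become useless."""

    def __init__(self, threshold: float = 0.1, window: int = 500):
        self.threshold = threshold
        self.d_losses = deque(maxlen=window)

    def update(self, d_loss: float) -> bool:
        """Call once per iteration; returns True when a restart from the last
        checkpoint (with fresh random latents/noise) is probably needed."""
        self.d_losses.append(d_loss)
        if len(self.d_losses) < self.d_losses.maxlen:
            return False
        return max(self.d_losses) < self.threshold


# Usage inside a training loop (hypothetical variable names):
# monitor = DiscriminatorCollapseMonitor()
# if monitor.update(d_loss_val):
#     print("Discriminator collapsed; stop and re-run from the last checkpoint.")
```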

Thank you for the message. This bug has been fixed now.

Fixed now. It was mainly caused by the flickering FaceVerse tracking. The smoothing term in preprocessing has been updated, and the uploaded pretrained model can perform like this:

https://github.com/LizhenWangT/StyleAvatar/assets/26791093/a4ac5f63-6b7b-47da-baca-97092a0d2025

Just stop training when the generated images meet your requirements (usually several hours); don’t worry about the default number of steps.
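
If you prefer a number over eyeballing the sample images, here is one possible plateau check (entirely optional and not part of the repo) on a held-out frame’s reconstruction error, evaluated every few thousand iterations; the metric and patience values are arbitrary:

```python
import numpy as np


def should_stop(val_errors: list, patience: int = 5, min_rel_improve: float = 0.01) -> bool:
    """Stop once the validation error (e.g. L1 between a generated frame and its
    ground-truth image) has not improved by more than `min_rel_improve`
    relative to the best earlier value for `patience` consecutive evaluations."""
    if len(val_errors) <= patience:
        return False
    best_before = min(val_errors[:-patience])
    recent_best = min(val_errors[-patience:])
    return recent_best > best_before * (1.0 - min_rel_improve)


# Example: the error curve flattens out, so training can be stopped early.
history = [0.30, 0.21, 0.15, 0.12, 0.11, 0.11, 0.11, 0.11, 0.11, 0.11]
print(should_stop(history))  # True: no meaningful improvement in the last 5 evaluations
```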