pytorch-lightning: GAN Manual optimization not working after 1.0.7

🐛 Bug

I have a GAN model trained with manual optimization, and as of 1.0.7 it no longer trains correctly. I've tried to troubleshoot the issue to the best of my ability, but I have no idea what's causing the problem.

Please reproduce using the BoringModel

https://colab.research.google.com/gist/import-antigravity/0730243bb11b56031110fd6aa7d58971/the-boringmodel.ipynb

To Reproduce

See the BoringModel notebook linked above.

Expected behavior

Using the Colab notebook, switch between versions 1.0.6 and 1.0.7 to see the bug. On 1.0.6, after training for a few epochs it's clear the GAN is beginning to converge; on 1.0.7 it just produces noise.

Environment

  • PyTorch Lightning Version: 1.0.7
  • OS (e.g., Linux): macOS, Linux
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source): n/a
  • Python version: 3.8
  • CUDA/cuDNN version: N/A
  • GPU models and configuration: N/A
  • Any other relevant information: N/A

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

@import-antigravity this seems to be fixed if you move the zero_grad calls to before the forward passes for the generator and discriminator. Not sure why the behaviour changed…
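The suggested fix boils down to an ordering change in the manual optimization loop: call `zero_grad()` before the forward pass rather than between `backward()` and `step()`. A minimal sketch of that ordering in plain PyTorch (the `G`/`D` modules and losses here are stand-ins, not the code from the linked notebook; inside a LightningModule `training_step` with manual optimization, the same ordering applies with `self.manual_backward`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Linear(4, 8)   # stand-in generator
D = nn.Linear(8, 1)   # stand-in discriminator
opt_g = torch.optim.SGD(G.parameters(), lr=0.1)
opt_d = torch.optim.SGD(D.parameters(), lr=0.1)
bce = nn.BCEWithLogitsLoss()

z = torch.randn(16, 4)      # latent batch
real = torch.randn(16, 8)   # stand-in "real" batch

# --- generator step: zero_grad FIRST, then forward/backward/step ---
opt_g.zero_grad()
fake = G(z)
g_loss = bce(D(fake), torch.ones(16, 1))
g_loss.backward()
opt_g.step()

# --- discriminator step: same ordering ---
opt_d.zero_grad()
d_loss = (bce(D(real), torch.ones(16, 1))
          + bce(D(fake.detach()), torch.zeros(16, 1)))
d_loss.backward()
opt_d.step()
```

With automatic optimization disabled, Lightning no longer manages when gradients are cleared, so stale gradients from the other optimizer's step can leak into the next update if `zero_grad()` comes too late.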

Oops, didn’t see this until now. Let me try

The error happens on every version up to and including the current stable release.

On Jan 17, 2021, at 3:49 AM, chaton notifications@github.com wrote:

Hey there,

Would you mind trying out master?

Best, T.C
