pytorch-lightning: GAN Manual optimization not working after 1.0.7

🐛 Bug

I have a GAN model trained with manual optimization, and as of 1.0.7 it no longer trains correctly. I've tried to troubleshoot the issue to the best of my ability, but I have no idea what's causing the problem.

Please reproduce using the BoringModel

https://colab.research.google.com/gist/import-antigravity/0730243bb11b56031110fd6aa7d58971/the-boringmodel.ipynb

To Reproduce

See the BoringModel notebook linked above.

Expected behavior

Using the Colab notebook, switch between versions 1.0.6 and 1.0.7 to see the bug. On 1.0.6, after training for a few epochs it's clear the GAN is beginning to converge; on 1.0.7 it just produces noise.

Environment

  • PyTorch Lightning Version: 1.0.7
  • OS (e.g., Linux): macOS, Linux
  • How you installed PyTorch (conda, pip, source): conda
  • Build command you used (if compiling from source): n/a
  • Python version: 3.8
  • CUDA/cuDNN version: N/A
  • GPU models and configuration: N/A
  • Any other relevant information: N/A

About this issue

  • Original URL
  • State: closed
  • Created 3 years ago
  • Comments: 17 (7 by maintainers)

Most upvoted comments

@import-antigravity this seems to be fixed if you move the zero_grad calls to before the forward passes for the generator and discriminator. Not sure why the behaviour changed…
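The suggested fix boils down to an ordering change in the manual optimization loop: call `zero_grad()` before the forward pass rather than between `backward()` and `step()`. A minimal sketch of that ordering in plain PyTorch (the `G`/`D` modules and losses here are stand-ins, not the code from the linked notebook; inside a LightningModule `training_step` with manual optimization, the same ordering applies with `self.manual_backward`):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

G = nn.Linear(4, 8)   # stand-in generator
D = nn.Linear(8, 1)   # stand-in discriminator
opt_g = torch.optim.SGD(G.parameters(), lr=0.1)
opt_d = torch.optim.SGD(D.parameters(), lr=0.1)
bce = nn.BCEWithLogitsLoss()

z = torch.randn(16, 4)      # latent batch
real = torch.randn(16, 8)   # stand-in "real" batch

# --- generator step: zero_grad FIRST, then forward/backward/step ---
opt_g.zero_grad()
fake = G(z)
g_loss = bce(D(fake), torch.ones(16, 1))
g_loss.backward()
opt_g.step()

# --- discriminator step: same ordering ---
opt_d.zero_grad()
d_loss = (bce(D(real), torch.ones(16, 1))
          + bce(D(fake.detach()), torch.zeros(16, 1)))
d_loss.backward()
opt_d.step()
```

With automatic optimization disabled, Lightning no longer manages when gradients are cleared, so stale gradients from the other optimizer's step can leak into the next update if `zero_grad()` comes too late.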

Oops, didn’t see this until now. Let me try

The error happens on every version up to and including the current stable release.

On Jan 17, 2021, at 3:49 AM, chaton notifications@github.com wrote:

Hey there,

Would you mind trying out master?

Best, T.C
