wandb: watch() sometimes does not log the gradients

Describe the bug I am training a generative adversarial network. Without adversarial training, the gradients of the generator are logged. With adversarial training, the gradients of the generator are not logged.

To Reproduce Steps to reproduce the behavior:

  1. Go to https://github.com/AliaksandrSiarohin/first-order-model
  2. Add the following at line 71 of run.py (see the fuller sketch after this list)
wandb.watch(generator, log='all')
wandb.watch(discriminator, log='all')
wandb.watch(kp_detector, log='all')
  3. Run python run.py --config config/vox-256.yaml and we can see the generator and kp_detector gradients are logged.
  4. Run python run.py --config config/vox-adv-256.yaml, which is adversarial training, and we can see that only discriminator gradients are logged, not those of generator and kp_detector.
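
For context, a minimal sketch of how those calls sit in the training script, assuming the three models and a wandb run already exist in run.py; the project name is hypothetical, and log_freq is written out only to make the logging interval visible (its default depends on the wandb version):

import wandb

wandb.init(project='first-order-model')  # hypothetical project name

# Gradients are captured via backward hooks, so histograms only appear for
# modules whose parameters actually receive gradients during loss.backward();
# log_freq controls how often (in batches) they are recorded.
wandb.watch(generator, log='all', log_freq=100)
wandb.watch(discriminator, log='all', log_freq=100)
wandb.watch(kp_detector, log='all', log_freq=100)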

Expected behavior The gradients of all three networks should be logged for adversarial training.

Screenshots

Operating System

  • OS: Linux

Additional context conda list output: https://pastebin.com/sV920TQP

About this issue

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 16 (6 by maintainers)

Most upvoted comments

It’s likely regular RAM that’s the issue.

It’s likely getting killed due to memory pressure. We have to load the gradients from the GPU, and if your model is really large your notebook may not have enough memory. Are you able to get a larger instance for your notebook?
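
If regular RAM is the limit, one way to cut the overhead of watch() (a sketch, reusing the three models from the report above) is to log only gradients and do so less often, so less data has to be pulled off the GPU and histogrammed per logging step:

# Log only gradients (no parameter histograms) and at a larger interval.
wandb.watch(generator, log='gradients', log_freq=1000)
wandb.watch(discriminator, log='gradients', log_freq=1000)
wandb.watch(kp_detector, log='gradients', log_freq=1000)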

Hi Steven, can you link us to any runs where all media / metrics failed to log? If you don’t want to post the link here you can reach me directly at aswanberg@wandb.com

For some reason, when I tried lowering log_freq to 10 or to 1, my gradient charts only show one step and not the entire distribution. The gradient charts used to work.

Please advise @cvphelps

Hi @kenmbkr thanks for writing in and including a reproducible script, that’s awesome! We are on Christmas holidays right now but will get back to this after vacation. One thing I’ve seen in the past is that some models don’t run for enough iterations: since the gradients aren’t logged on every step, the logging interval can be too large to ever pick up the gradients of the other networks.
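
One quick way to rule the interval in or out (a sketch, not a confirmed fix) is to lower log_freq so gradients are recorded on nearly every backward pass; if the generator and kp_detector histograms still never appear, the logging interval is not the cause:

# With log_freq=1, every optimization step that backpropagates through these
# modules should produce a gradient histogram, even in very short runs.
wandb.watch(generator, log='all', log_freq=1)
wandb.watch(kp_detector, log='all', log_freq=1)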