wandb: watch() sometimes does not log the gradients
Describe the bug I am training a generative adversarial network. Without adversarial training, the gradients of the generator are logged. With adversarial training, the gradients of the generator are not logged.
To Reproduce Steps to reproduce the behavior:
- Go to https://github.com/AliaksandrSiarohin/first-order-model
- Add the following at line 71 of `run.py`:
wandb.watch(generator, log='all')
wandb.watch(discriminator, log='all')
wandb.watch(kp_detector, log='all')
- Run `python run.py --config config/vox-256.yaml` and we can see that the `generator` and `kp_detector` gradients are logged.
- Run `python run.py --config config/vox-adv-256.yaml`, which is adversarial training, and we can see that only the `discriminator` gradients are logged, not those of `generator` and `kp_detector`. A minimal sketch of this setup follows these steps.
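For reference, here is a minimal, self-contained sketch of the same pattern, using tiny stand-in modules and a hypothetical project name rather than the actual first-order-model code. With alternating adversarial updates like these, gradient histograms would be expected for all three watched modules:

```python
import torch
import torch.nn as nn
import wandb

wandb.init(project="watch-gradients-repro")  # hypothetical project name

# Tiny stand-ins for generator, discriminator and kp_detector
generator = nn.Linear(16, 16)
discriminator = nn.Linear(16, 1)
kp_detector = nn.Linear(16, 4)

wandb.watch(generator, log='all')
wandb.watch(discriminator, log='all')
wandb.watch(kp_detector, log='all')

opt_g = torch.optim.Adam(list(generator.parameters()) + list(kp_detector.parameters()))
opt_d = torch.optim.Adam(discriminator.parameters())

for step in range(2000):
    x = torch.randn(8, 16)

    # Generator / keypoint-detector update
    fake = generator(x + kp_detector(x).mean())
    loss_g = -discriminator(fake).mean()
    opt_g.zero_grad()
    loss_g.backward()
    opt_g.step()

    # Discriminator update
    loss_d = discriminator(fake.detach()).mean() - discriminator(x).mean()
    opt_d.zero_grad()
    loss_d.backward()
    opt_d.step()

    wandb.log({"loss_g": loss_g.item(), "loss_d": loss_d.item()})
```

With the default `log_freq` of 1000, a run of 2000 steps like this should in principle produce gradient histograms for every watched module.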
Expected behavior The gradients of all three networks should be logged during adversarial training.
Screenshots
Operating System
- OS: Linux
Additional context
conda list output: https://pastebin.com/sV920TQP
About this issue
- Original URL
- State: open
- Created 4 years ago
- Reactions: 1
- Comments: 16 (6 by maintainers)
It’s likely regular RAM that’s the issue.
It’s likely getting killed due to memory pressure. We have to load the gradients from the GPU, and if your model is really large your notebook may not have enough memory. Are you able to get a larger instance for your notebook?
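If memory pressure is the culprit, one option (a sketch with a placeholder model, not the original training code) is to log only gradients rather than gradients and parameters, and to log them less often so histograms are pulled from the GPU less frequently:

```python
import torch.nn as nn
import wandb

wandb.init(project="watch-memory")  # hypothetical project name

# Placeholder model standing in for a large network
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 1))

# Skip parameter histograms and log less frequently than the default
wandb.watch(model, log='gradients', log_freq=5000)

# Alternatively, watch only a smaller submodule instead of the whole network:
# wandb.watch(model[-1], log='all', log_freq=5000)
```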
Hi Steven, can you link us to any runs where all media / metrics failed to log? If you don’t want to post the link here you can reach me directly at aswanberg@wandb.com
For some reason, when I tried lowering the log frequency to `log_freq=10` or `log_freq=1`, it only shows one step and not the entire distribution in my gradient charts. The gradient charts used to work. Please advise @cvphelps
Hi @kenmbkr thanks for writing in and including a reproducible script, that’s awesome! We are on Christmas holidays right now but will get back to this after vacation. One thing I’ve seen in the past is that there aren’t enough iterations on certain models, and since the gradients aren’t logged every step, the interval might be too large to pick up the other pieces of training.
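To illustrate the interval point: the number of gradient-histogram steps you see is roughly the number of backward passes divided by `log_freq` (assuming one logged backward pass per batch and the default `log_freq` of 1000). The snippet below just prints that rough estimate for a short run:

```python
# Approximate number of gradient-histogram points for a short run,
# assuming one backward pass per batch.
total_batches = 500

for log_freq in (1000, 100, 10, 1):
    points = total_batches // log_freq
    print(f"log_freq={log_freq}: ~{points} histogram steps")
```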