pytorch-lightning: Mixed precision: scheduler and optimizer are called in the wrong order
🐛 Bug
When using mixed-precision training, the scheduler and optimizer are called in the wrong order, and the following warning is generated:
UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.
Please reproduce using the BoringModel
https://colab.research.google.com/drive/1G7pk6E9XUYq-pS41DXKhqM9Srx8sikiP?usp=sharing
There are four tests. Three of them don't raise the warning:
- test_amp_scheduler(precision=16, configure_optimizers=configure_optimizers_1)
- test_amp_scheduler(precision=32, configure_optimizers=configure_optimizers_1)
- test_amp_scheduler(precision=32, configure_optimizers=configure_optimizers_2)
This test case raises the warning:
- test_amp_scheduler(precision=16, configure_optimizers=configure_optimizers_2)
To Reproduce
- Create a model with `configure_optimizers` in the following dictionary style:

```python
def configure_optimizers_2(model):
    optimizer = torch.optim.SGD(model.layer.parameters(), lr=0.1)
    scheduler = {'scheduler': torch.optim.lr_scheduler.StepLR(optimizer, step_size=1),
                 'name': 'learning_rate',
                 'interval': 'step',
                 'frequency': 1}
    return {"optimizer": optimizer, "lr_scheduler": scheduler}
```
- Enable mixed-precision training by setting `precision=16` in the `Trainer`
- Start training
Note
When the scheduler is defined in the following way, the issue does not seem to occur:

```python
def configure_optimizers_1(model):
    optimizer = torch.optim.SGD(model.layer.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
    return {"optimizer": optimizer, "lr_scheduler": scheduler}
```
Expected behavior
No warning
Environment
- CUDA:
- GPU:
- Tesla P100-PCIE-16GB
- available: True
- version: 10.1
- GPU:
- Packages:
- numpy: 1.19.5
- pyTorch_debug: True
- pyTorch_version: 1.7.0+cu101
- pytorch-lightning: 1.1.4
- tqdm: 4.41.1
- System:
- OS: Linux
- architecture:
- 64bit
- processor: x86_64
- python: 3.6.9
- version: #1 SMP Thu Jul 23 08:00:38 PDT 2020
cc @tchaton @rohitgr7 @carmocca @justusschock @awaelchli @akihironitta
About this issue
- Original URL
- State: open
- Created 3 years ago
- Reactions: 14
- Comments: 35 (11 by maintainers)
Commits related to this issue
- Bugfix: Mixed precision: scheduler and optimizer are called in the wrong order (https://github.com/PyTorchLightning/pytorch-lightning/issues/5558) — committed to javierlorenzod/pytorch-lightning by javierlorenzod 3 years ago
- Bugfix: Mixed precision: scheduler and optimizer are called in the wrong order (https://github.com/PyTorchLightning/pytorch-lightning/issues/5558) — committed to javierlorenzod/pytorch-lightning by javierlorenzod 3 years ago
Following and waiting.
Hi @BttMA, @aleSuglia. The fix is still WIP in #9923.
This issue only happens when `Trainer(precision=16)` is set AND `lr_scheduler.step()` runs every step (not every epoch), i.e. with `'interval': 'step'`.
What's happening is that `scaler.step(optimizer)` (called when using native AMP) is likely to skip `optimizer.step()` for the first few iterations, and thus `lr_scheduler.step()` gets called before any call of `optimizer.step()`. As a side note, you get the same behaviour in pure PyTorch, too, as reported in "`optimizer.step()` before `lr_scheduler.step()` error using GradScaler".
My 2 cents: users should never get a warning when they aren't doing anything wrong and/or there is no way for them to do something correctly. Specifically, unless this bug is fixed, there is no way to run `CyclicLR` or `OneCycleLR` correctly without getting this warning.
I'm using `pytorch-lightning==1.6.4` but still have the same issue.
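To make the skipping mechanism described above concrete, here is a plain-Python simulation of the ordering problem. `SkippingScaler` is a made-up stand-in for `torch.cuda.amp.GradScaler` (which really does skip `optimizer.step()` while the loss scale settles); nothing here uses the real torch or Lightning APIs.

```python
# Illustrative stub, NOT the real GradScaler: it skips the wrapped
# optimizer step for the first few batches, as native AMP does while
# the loss scale is still being calibrated.
class SkippingScaler:
    def __init__(self, skip_first=2):
        self.skip_first = skip_first
        self.batches_seen = 0

    def step(self, optimizer_step):
        self.batches_seen += 1
        if self.batches_seen > self.skip_first:
            optimizer_step()  # only runs once "inf/NaN gradients" stop

events = []
scaler = SkippingScaler(skip_first=2)
for batch in range(3):
    scaler.step(lambda: events.append("optimizer.step"))
    # With 'interval': 'step', the scheduler is stepped every batch,
    # regardless of whether the optimizer step above was skipped:
    events.append("lr_scheduler.step")

# The first scheduler steps happen before any optimizer step --
# exactly the ordering PyTorch warns about.
print(events)
```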
any update?
`pytorch==2.1.0`, `pytorch-lightning==2.1.0`
I think we should implement https://github.com/pytorch/pytorch/issues/67590 (PyTorch). Any additions in Lightning would always be workarounds.
I think this issue is with PyTorch rather than PyTorch Lightning.
Same issue.
Same issue here. I saw a workaround in the implementation of sentence-transformers or SBERT.
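For reference, the workaround pattern mentioned above compares the scaler's scale before and after `scaler.step()`: a skipped optimizer step lowers the scale, so the scheduler is stepped only when the scale did not decrease. The sketch below is hedged: `StubScaler` fakes `GradScaler`'s `get_scale()`/`step()`/`update()` behaviour so the guard logic can run without torch; with the real API the guard itself is the same few lines.

```python
# StubScaler fakes torch.cuda.amp.GradScaler just enough to exercise
# the guard: a "skipped" optimizer step halves the scale, mimicking
# what the real scaler does when it sees inf/NaN gradients.
class StubScaler:
    def __init__(self, skips=2):
        self._scale = 65536.0
        self._skips_left = skips

    def get_scale(self):
        return self._scale

    def step(self, on_optimizer_step):
        if self._skips_left > 0:
            self._skips_left -= 1
            self._scale /= 2  # scale drops on a skipped step
        else:
            on_optimizer_step()

    def update(self):
        pass  # the real scaler re-adjusts the scale here

scaler = StubScaler(skips=2)
log = []
for batch in range(4):
    scale_before = scaler.get_scale()
    scaler.step(lambda: log.append("optimizer.step"))
    scaler.update()
    if scaler.get_scale() >= scale_before:  # optimizer.step() really ran
        log.append("lr_scheduler.step")

# Every scheduler step is now preceded by an optimizer step,
# so the warning condition never arises.
print(log)
```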
@BttMA I’m sorry for your inconvenience. I’m not sure if there’s a workaround for this issue at the moment… I’ll try to have this issue resolved asap within this week and keep you updated.
Any updates on this?
Same issue with `pytorch-lightning==1.4.1`.
@javierlorenzod Thanks a lot for your report! Let me look into it.