pytorch-lightning: Mixed precision: scheduler and optimizer are called in the wrong order

🐛 Bug

When using mixed-precision training, the scheduler and optimizer are called in the wrong order, and the following warning is generated:

UserWarning: Detected call of `lr_scheduler.step()` before `optimizer.step()`. In PyTorch 1.1.0 and later, you should call them in the opposite order: `optimizer.step()` before `lr_scheduler.step()`.

Please reproduce using the BoringModel

https://colab.research.google.com/drive/1G7pk6E9XUYq-pS41DXKhqM9Srx8sikiP?usp=sharing

There are four tests. Three of them don’t raise the warning:

  1. test_amp_scheduler(precision=16, configure_optimizers=configure_optimizers_1)
  2. test_amp_scheduler(precision=32, configure_optimizers=configure_optimizers_1)
  3. test_amp_scheduler(precision=32, configure_optimizers=configure_optimizers_2)

This test case raises the warning:

  1. test_amp_scheduler(precision=16, configure_optimizers=configure_optimizers_2)

To Reproduce

  1. Create a model with configure_optimizers defined in the following dictionary style:
def configure_optimizers_2(model):
    optimizer = torch.optim.SGD(model.layer.parameters(), lr=0.1)
    scheduler = {
        'scheduler': torch.optim.lr_scheduler.StepLR(optimizer, step_size=1),
        'name': 'learning_rate',
        'interval': 'step',
        'frequency': 1,
    }

    return {"optimizer": optimizer, "lr_scheduler": scheduler}
  2. Enable mixed-precision training by setting precision=16 in the Trainer
  3. Start training (a minimal sketch of these steps follows)
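
A minimal, self-contained sketch of these steps; the model, data, and Trainer arguments below are placeholders standing in for the BoringModel setup from the Colab:

import torch
import pytorch_lightning as pl
from torch.utils.data import DataLoader, TensorDataset

class MinimalModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(32, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # dictionary-style scheduler with interval="step", as in configure_optimizers_2 above
        return configure_optimizers_2(self)

model = MinimalModel()
train_loader = DataLoader(
    TensorDataset(torch.randn(64, 32), torch.randint(0, 2, (64,))), batch_size=8
)

# precision=16 together with the per-step scheduler triggers the warning on GPU
trainer = pl.Trainer(gpus=1, precision=16, max_epochs=1)
trainer.fit(model, train_loader)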

Note

When the scheduler is defined directly, without the dictionary, the issue does not seem to occur:

def configure_optimizers_1(model):
    optimizer = torch.optim.SGD(model.layer.parameters(), lr=0.1)
    scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
    
    return {"optimizer": optimizer, "lr_scheduler": scheduler}

Expected behavior

No warning

Environment

  • CUDA:
    • GPU:
      • Tesla P100-PCIE-16GB
    • available: True
    • version: 10.1
  • Packages:
    • numpy: 1.19.5
    • pyTorch_debug: True
    • pyTorch_version: 1.7.0+cu101
    • pytorch-lightning: 1.1.4
    • tqdm: 4.41.1
  • System:
    • OS: Linux
    • architecture:
      • 64bit
    • processor: x86_64
    • python: 3.6.9
    • version: #1 SMP Thu Jul 23 08:00:38 PDT 2020

cc @tchaton @rohitgr7 @carmocca @justusschock @awaelchli @akihironitta

About this issue

  • Original URL
  • State: open
  • Created 3 years ago
  • Reactions: 14
  • Comments: 35 (11 by maintainers)

Most upvoted comments

Following and waiting.

Hi @BttMA @aleSuglia, the fix is still WIP in #9923.


This issue only happens when

  • Trainer(precision=16) AND
  • lr_scheduler.step() runs every few steps (not every epoch), i.e.
    def configure_optimizers(self):
        optimizer = ...
        scheduler = {
            "scheduler": ...,
            "interval": "step",
            "frequency": 1,  # other small numbers may also cause this issue.
        }
        return {"optimizer": optimizer, "lr_scheduler": scheduler}
    

What’s happening is that scaler.step(optimizer) (which is called when using native AMP) is likely to skip optimizer.step() for the first few iterations, so lr_scheduler.step() ends up being called before any call to optimizer.step().

As a side note, you’ll get the same behaviour in pure PyTorch, too, as reported in “optimizer.step() before lr_scheduler.step() error using GradScaler”.
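
For reference, here is a minimal pure-PyTorch sketch of that behaviour (the model, data, and hyperparameters are placeholders): when GradScaler skips optimizer.step() on an early iteration because of inf/NaN gradients at the initial scale, the unconditional scheduler.step() below produces the same warning.

import torch

model = torch.nn.Linear(32, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1)
scaler = torch.cuda.amp.GradScaler()
data = [(torch.randn(8, 32), torch.randint(0, 2, (8,))) for _ in range(10)]

for x, y in data:
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        loss = torch.nn.functional.cross_entropy(model(x.cuda()), y.cuda())
    scaler.scale(loss).backward()
    # scaler.step() silently skips optimizer.step() if the gradients contain inf/NaN
    scaler.step(optimizer)
    scaler.update()
    # stepping the scheduler unconditionally is what triggers the warning
    scheduler.step()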

My 2 cents: Users should never get a warning when they aren’t doing anything wrong and/or there is no way for them to do something correctly. Specifically, unless this bug is fixed there is no way to run CyclicLR or OneCycleLR correctly without getting this warning.
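
For context, this is the kind of configuration the comment above refers to; a sketch only, with placeholder max_lr and total_steps values. OneCycleLR has to be stepped once per batch, so interval "step" cannot be avoided here.

def configure_optimizers(self):
    optimizer = torch.optim.SGD(self.parameters(), lr=0.1)
    # OneCycleLR must be stepped every batch, hence interval="step"
    scheduler = {
        "scheduler": torch.optim.lr_scheduler.OneCycleLR(
            optimizer, max_lr=0.1, total_steps=1000
        ),
        "interval": "step",
        "frequency": 1,
    }
    return {"optimizer": optimizer, "lr_scheduler": scheduler}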

I’m using pytorch-lightning==1.6.4 but still see the same issue.

Any update?

pytorch==2.1.0

pytorch-lightning==2.1.0

I think we should implement https://github.com/pytorch/pytorch/issues/67590 (PyTorch). Any additions in Lightning would always be workarounds.

I think this issue is with PyTorch rather than PyTorch Lightning.

Same issue.

Same issue here. I saw a workaround in the implementation of sentence-transformers or SBERT.

[...]
# remember the scale before the optimizer step
scale_before_step = scaler.get_scale()
scaler.scale(loss_value).backward()
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(self.model.parameters(), max_grad_norm)
scaler.step(optimizer)
scaler.update()

# a changed scale indicates the scaler skipped optimizer.step()
skip_scheduler = scaler.get_scale() != scale_before_step

[...]

if not skip_scheduler:
    scheduler.step()
@BttMA I’m sorry for the inconvenience. I’m not sure if there’s a workaround for this issue at the moment… I’ll try to have it resolved ASAP within this week and keep you updated.

Any updates on this?

Same issue with pytorch-lightning==1.4.1.

@javierlorenzod Thanks a lot for your report! Let me look into it.