pytorch-lightning: `lr_finder` fails when called after training for 1 or more epochs
🐛 Bug
Calling `lr_finder` on the model after `trainer.fit()` has been called fails with:
LR finder stopped early due to diverging loss.
Failed to compute suggesting for `lr`. There might not be enough points.
This happens even when the default value of min_lr=1e-08 has been changed to 1e-30.
Please reproduce using the BoringModel and post here
- Reproduced using a callback: https://colab.research.google.com/drive/1sbOPs8edyFi_idJNnd6gr3etyv7V57YU?usp=sharing
- Reproduced with calling Trainer twice: https://colab.research.google.com/drive/1WxUvayBBg_163nu8fjv-jsvPk6pUrSrK?usp=sharing
To Reproduce
Add the following callback (as demonstrated with the BoringModel):
from pytorch_lightning.callbacks import Callback

# Call Learning Rate finder after X epochs
class LRFinderXEpoch(Callback):
    def __init__(self, epoch=1):
        super().__init__()
        self.epoch = epoch  # epoch index at which the finder is triggered

    def on_train_epoch_start(self, trainer, pl_module):
        if trainer.current_epoch == self.epoch:
            print("Calling learning rate finder!")
            trainer.tune(pl_module)
            # Alternative tried: call the finder directly with a lower min_lr
            # trainer.tuner.lr_find(pl_module, min_lr=1e-30)
Expected behavior
Find the best learning rate after a few epochs of training (e.g. when doing Transfer Learning).
Environment
* CUDA:
    - GPU:
        - Tesla T4
    - available: True
    - version: 10.1
* Packages:
    - numpy: 1.18.5
    - pyTorch_debug: True
    - pyTorch_version: 1.7.0+cu101
    - pytorch-lightning: 1.0.8
    - tqdm: 4.41.1
* System:
    - OS: Linux
    - architecture:
        - 64bit
        -
    - processor: x86_64
    - python: 3.6.9
    - version: #1 SMP Thu Jul 23 08:00:38 PDT 2020
Additional context
Issue came from the following discussion: https://forums.pytorchlightning.ai/t/train-2-epochs-head-unfreeze-learning-rate-finder-continue-training-fit-one-cycle/366/4
Potentially related issues:
About this issue
- State: open
- Created 4 years ago
- Comments: 15 (6 by maintainers)
For my project, I found that setting `early_stop_threshold=None` fixes this. Thanks to @NumesSanguis for the suggestion.
The early stopping threshold of the Learning Rate finder seems to be too small. I regularly observe the check at https://github.com/PyTorchLightning/pytorch-lightning/blob/f2fa3c82567ade0aa5bea3aa298542fc5040e4b7/pytorch_lightning/tuner/lr_finder.py#L419 trigger on a standard image classification task after the first batch.
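
To illustrate why disabling the threshold changes the outcome, here is a minimal pure-Python sketch of a divergence check of this general kind. It is not Lightning's code: the helper name `steps_before_stop`, the smoothing factor `beta`, the default threshold of 4.0, and the exact form of the comparison (`smoothed_loss > early_stop_threshold * best_loss`) are all assumptions made for illustration.

```python
from typing import Optional, Sequence


def steps_before_stop(
    losses: Sequence[float],
    early_stop_threshold: Optional[float] = 4.0,  # illustrative default (assumption)
    beta: float = 0.98,  # smoothing factor (assumption)
) -> int:
    """Return the number of batches processed before the sweep halts.

    Hypothetical helper that mimics a threshold-based divergence check;
    it is a sketch, not the library's implementation.
    """
    avg_loss = 0.0
    best_loss = float("inf")
    for step, loss in enumerate(losses, start=1):
        # Exponential moving average of the loss, with bias correction.
        avg_loss = beta * avg_loss + (1 - beta) * loss
        smoothed = avg_loss / (1 - beta ** step)
        # A threshold of None disables the divergence check entirely.
        if early_stop_threshold is not None and step > 1:
            if smoothed > early_stop_threshold * best_loss:
                return step  # sweep stops here, leaving few points to fit
        best_loss = min(best_loss, smoothed)
    return len(losses)


# One noisy batch early in the sweep is enough to stop it.
noisy = [1.0, 1.0, 500.0, 1.0, 1.0]
print(steps_before_stop(noisy))                             # stops at the spike
print(steps_before_stop(noisy, early_stop_threshold=None))  # runs all batches
```

With a check like this, a single large loss early in the sweep ends it after a handful of points, which would be consistent with the "There might not be enough points" message; passing `early_stop_threshold=None` lets the sweep run over the full learning rate range.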