pytorch-lightning: ModelCheckpoint is not saving top k models
🐛 Bug
ModelCheckpoint is not correctly monitoring metric values.
To Reproduce
https://colab.research.google.com/drive/1onBmED7dngP_VwFxcFBMsnQi82KbizSk?usp=sharing
Expected behavior
ModelCheckpoint should save the top k models based on the monitored metric x, but instead it prints `Epoch XXX, step XXX: x was not in top 2` on every epoch.
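For context, a minimal sketch of the intended wiring (the module and helper names here are illustrative, not from the linked notebook): the `monitor` argument of `ModelCheckpoint` must exactly match a key logged via `self.log` in the LightningModule, otherwise the callback has nothing to rank.

```python
import pytorch_lightning as pl
from pytorch_lightning.callbacks import ModelCheckpoint

class LitModel(pl.LightningModule):
    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        # The key "x" logged here is what ModelCheckpoint looks up.
        self.log("x", loss)

checkpoint_cb = ModelCheckpoint(
    monitor="x",    # must match the key passed to self.log above
    save_top_k=2,
    mode="min",     # lower "x" is better
)
trainer = pl.Trainer(callbacks=[checkpoint_cb])
```

If `monitor` names a key that is never logged, the callback cannot find a value to compare, which is the situation this report describes.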
Environment
- CUDA:
	- GPU: Tesla T4
	- available: True
	- version: 10.1
- Packages:
- numpy: 1.19.5
- pyTorch_debug: True
- pyTorch_version: 1.7.0+cu101
- pytorch-lightning: 1.2.0
- tqdm: 4.41.1
- System:
	- OS: Linux
	- architecture: 64bit
	- processor: x86_64
	- python: 3.6.9
	- version: #1 SMP Thu Jul 23 08:00:38 PDT 2020
Additional context
The documentation doesn’t explain how to specify the metric that ModelCheckpoint should monitor. I tried monitoring both x and the loss value, but ModelCheckpoint printed the same message in both cases. The message should also be clearer: when the monitored value doesn’t exist, it should say that ModelCheckpoint couldn’t find the chosen value, instead of reporting that it was not in the top k, since the same message is shown when I monitor a value that doesn’t exist at all.
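The ambiguity described above can be illustrated with a simplified sketch. This is not PyTorch Lightning's actual implementation; it only shows how a missing metric and a metric outside the top k can fall through to the same user-facing message, which is exactly what makes the warning hard to diagnose.

```python
def checkpoint_message(logged_metrics, monitor, best_k, k=2):
    """Mimic the per-epoch decision of a top-k checkpoint callback.

    logged_metrics: dict mapping metric name -> value for this epoch
    best_k: values of the k best checkpoints so far (mode="min")
    """
    current = logged_metrics.get(monitor)
    if current is None:
        # The metric was never logged; a clearer message would say
        # "could not find 'x' to monitor" instead of the line below.
        return f"{monitor} was not in top {k}"
    if len(best_k) < k or current < max(best_k):
        return f"saving checkpoint: {monitor}={current}"
    return f"{monitor} was not in top {k}"
```

Both the "never logged" branch and the "not good enough" branch return the identical string, so the user cannot tell a misconfigured `monitor` key from a genuinely non-improving metric.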
About this issue
- State: closed
- Created 3 years ago
- Comments: 15 (14 by maintainers)
Yes, I can send the PR for the doc. What do you think about