pytorch-lightning: Allow users to provide custom exception handling
🚀 Feature
Allow users to provide custom exception handling via a new callback hook, similar to on_keyboard_interrupt.
Motivation
Users should be able to implement their own error handling if they want.
Pitch
Create a new callback hook here: https://github.com/PyTorchLightning/pytorch-lightning/blob/master/pytorch_lightning/trainer/trainer.py#L507-L515
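A minimal sketch of what this pitch could look like, assuming a hook named `on_exception` that receives the trainer and the exception. `Callback`, `ToyTrainer`, and `RecordFailure` are hypothetical stand-ins written for illustration, not Lightning's classes, and the hook name and signature are assumptions rather than anything decided in this issue.

```python
class Callback:
    """Hypothetical base class with no-op hooks (not Lightning's Callback)."""

    def on_keyboard_interrupt(self, trainer):
        pass

    def on_exception(self, trainer, exception):
        # Assumed name and signature for the proposed hook.
        pass


class ToyTrainer:
    """Hypothetical trainer showing only where the hook would be dispatched."""

    def __init__(self, callbacks):
        self.callbacks = list(callbacks)

    def fit(self, train_fn):
        try:
            train_fn()
        except KeyboardInterrupt:
            # Existing-style hook: notify callbacks about the interrupt.
            for cb in self.callbacks:
                cb.on_keyboard_interrupt(self)
        except Exception as exc:
            # Proposed hook: give callbacks a chance to react, then re-raise
            # so the caller still sees the original error.
            for cb in self.callbacks:
                cb.on_exception(self, exc)
            raise


class RecordFailure(Callback):
    """Example user callback that remembers every exception it was given."""

    def __init__(self):
        self.seen = []

    def on_exception(self, trainer, exception):
        self.seen.append(exception)
```

A user would pass something like `RecordFailure()` in the callback list and put their own handling (logging, alerting, checkpoint cleanup) in `on_exception`. Re-raising after the hook keeps the failure visible to the calling script.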
About this issue
- State: closed
- Created 3 years ago
- Reactions: 3
- Comments: 17 (14 by maintainers)
I was in the middle of creating my own ticket for this when I saw yours, so I will add the complementary info I would have included on the subject.
Proposed refactoring or deprecation
feat: add a callback hook for whenever a crash happens
This could be implemented in the trainer in the same way as the keyboard interrupt callback.
The teardown callback does not appear to be called when training crashes, only when training completes. This seems to be because the error is re-raised, which halts the script before teardown is reached. Either way, a separate callback feels more appropriate than forcing the use of teardown.
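The control flow described above can be reproduced in toy form. This is a sketch under the assumption that teardown runs as a plain statement after the training call; `run_training` and its arguments are hypothetical names, not the trainer's actual code.

```python
def run_training(train_fn, teardown, on_exception=None):
    """Toy control flow: teardown is an ordinary statement after training."""
    try:
        train_fn()
    except Exception as exc:
        if on_exception is not None:
            # A dedicated hook still gets to run before the error propagates.
            on_exception(exc)
        raise
    # Only reached when train_fn returns normally.
    teardown()


def demo():
    """Record which steps run when training crashes."""
    calls = []

    def crash():
        raise ValueError("simulated crash")

    try:
        run_training(
            crash,
            teardown=lambda: calls.append("teardown"),
            on_exception=lambda exc: calls.append("on_exception"),
        )
    except ValueError:
        calls.append("re-raised")
    return calls
```

In this toy version, `demo()` records the hook and the re-raise but never `"teardown"`, which matches the behavior reported here and is the reason a dedicated hook is useful.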
Motivation
The motivation behind this is to allow running code on failure of training. Use cases:
@aurelien-clu: @daniellepintz has dibs because she opened the feature request; next in line is @yopknopixx. But y’all can collaborate on discussion and testing, whoever makes the PR ❤️ Thanks for your interest in this issue.
Hi @yopknopixx, I think this might be a good issue for you: https://github.com/PyTorchLightning/pytorch-lightning/issues/8313 LMK what you think! Feel free to start working on it even though it is assigned to me.
Hi @yopknopixx, I’ve already started, so I’d prefer to finish this one. But I’m sure we can find you another issue to work on! @ananthsub, do you happen to know of any good issues? Meanwhile I will look for one.
Hey @daniellepintz,
Assigned this ticket to you and added to the current sprint.
Best, T.C
that sounds good to me!