pytorch-lightning: AdvancedProfiler: ValueError: Attempting to stop recording an action (run_test_evaluation) which was never started.
🐛 Bug
If you have multiple Trainers and only one of them runs test, AdvancedProfiler crashes afterwards. This happens even if you free every Trainer except the one you want to run test on.
My code builds multiple Trainers for a grid search, and I attached an AdvancedProfiler to each one. I then find the Trainer with the best best_model_score and run test only on that best predictor (see the sketch below).
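Roughly, the setup looks like the sketch below. The names `grid`, `build_model`, and the dataloaders are hypothetical stand-ins; the real code lives in heareval/predictions/task_predictions.py.

```python
# Minimal sketch of the failing setup (hypothetical names), assuming one
# Trainer with its own AdvancedProfiler per grid-search configuration.
from pytorch_lightning import Trainer
from pytorch_lightning.profiler import AdvancedProfiler

trainers = []
for hparams in grid:  # hypothetical hyperparameter grid
    trainer = Trainer(
        profiler=AdvancedProfiler(dirpath=".", filename="perf_logs"),
        max_epochs=10,
    )
    trainer.fit(build_model(hparams), train_dataloader, val_dataloader)
    trainers.append(trainer)

# Pick the trainer whose checkpoint reports the best score
# (min for a loss-like monitored quantity).
best_trainer = min(
    trainers, key=lambda t: t.checkpoint_callback.best_model_score
)

# Run test only on the best trainer; this is the call that crashes.
test_scores = best_trainer.test(ckpt_path="best", test_dataloaders=test_dataloader)
```

The final test call produces this crash: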
DATALOADER:0 TEST RESULTS
{'test_chroma_acc': 0.05000000074505806,
'test_loss': 1763.811279296875,
'test_pitch_acc': 0.0}
--------------------------------------------------------------------------------
Testing: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00, 22.07it/s]
0%| | 0/3 [00:24<?, ?it/s]
Traceback (most recent call last):
File "/Users/joseph/dev/neuralaudio/hear-eval-kit/heareval/predictions/runner.py", line 75, in <module>
runner()
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1062, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.9/site-packages/click/core.py", line 763, in invoke
return __callback(*args, **kwargs)
File "/Users/joseph/dev/neuralaudio/hear-eval-kit/heareval/predictions/runner.py", line 69, in runner
task_predictions(
File "/Users/joseph/dev/neuralaudio/hear-eval-kit/heareval/predictions/task_predictions.py", line 772, in task_predictions
test_scores = best_trainer.test(ckpt_path="best", test_dataloaders=test_dataloader)
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 706, in test
results = self._run(model)
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in _run
self._dispatch()
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 982, in _dispatch
self.accelerator.start_evaluating(self)
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/accelerators/accelerator.py", line 95, in start_evaluating
self.training_type_plugin.start_evaluating(trainer)
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/plugins/training_type/training_type_plugin.py", line 165, in start_evaluating
self._results = trainer.run_stage()
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 993, in run_stage
return self._run_evaluate()
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/trainer/trainer.py", line 1079, in _run_evaluate
eval_loop_results = self._evaluation_loop.run()
File "/usr/local/Cellar/python@3.9/3.9.6/Frameworks/Python.framework/Versions/3.9/lib/python3.9/contextlib.py", line 124, in __exit__
next(self.gen)
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/profiler/base.py", line 97, in profile
self.stop(action_name)
File "/usr/local/lib/python3.9/site-packages/pytorch_lightning/profiler/advanced.py", line 70, in stop
raise ValueError(f"Attempting to stop recording an action ({action_name}) which was never started.")
ValueError: Attempting to stop recording an action (run_test_evaluation) which was never started.
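For context, the traceback shows the error surfacing from the profiler's `profile` context manager, which stops the action on exit regardless of whether it was ever recorded on this profiler instance. A rough paraphrase of the pattern implied by the traceback (not the verbatim library source):

```python
# Rough paraphrase of profile() in pytorch_lightning/profiler/base.py,
# as implied by the traceback above; not the verbatim library source.
from contextlib import contextmanager

@contextmanager
def profile(self, action_name):
    try:
        self.start(action_name)
        yield action_name
    finally:
        # stop() always runs on exit; if the action is not registered on
        # this profiler instance, AdvancedProfiler.stop() raises the
        # ValueError seen above.
        self.stop(action_name)
```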
To Reproduce
https://colab.research.google.com/drive/1rnd1kmwea5BiAYwXq4WaAg5fa8T6GlUI?usp=sharing
I tried to reproduce it, but the BoringModel notebook is currently broken:
# some other options for random data
from pl_bolts.datasets import RandomDataset, DummyDataset, RandomDictDataset
/usr/local/lib/python3.7/dist-packages/pl_bolts/utils/warnings.py:32: UserWarning: You want to use `wandb` which is not installed yet, install it with `pip install wandb`.
f' install it with `pip install {pypi_name}`.' + extra_text
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-3-da4ba0fec73f> in <module>()
1 # some other options for random data
----> 2 from pl_bolts.datasets import RandomDataset, DummyDataset, RandomDictDataset
5 frames
/usr/local/lib/python3.7/dist-packages/pl_bolts/__init__.py in <module>()
17 _HTTPS_AWS_HUB = "https://pl-bolts-weights.s3.us-east-2.amazonaws.com"
18
---> 19 from pl_bolts import ( # noqa: E402
20 callbacks,
21 datamodules,
/usr/local/lib/python3.7/dist-packages/pl_bolts/datamodules/__init__.py in <module>()
3 from pl_bolts.datamodules.cifar10_datamodule import CIFAR10DataModule, TinyCIFAR10DataModule
4 from pl_bolts.datamodules.cityscapes_datamodule import CityscapesDataModule
----> 5 from pl_bolts.datamodules.experience_source import DiscountedExperienceSource, ExperienceSource, ExperienceSourceDataset
6 from pl_bolts.datamodules.fashion_mnist_datamodule import FashionMNISTDataModule
7 from pl_bolts.datamodules.imagenet_datamodule import ImagenetDataModule
/usr/local/lib/python3.7/dist-packages/pl_bolts/datamodules/experience_source.py in <module>()
22
23
---> 24 class ExperienceSourceDataset(IterableDataset):
25 """
26 Basic experience source dataset. Takes a generate_batch function that returns an iterator.
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_typing.py in __new__(cls, name, bases, namespace, **kwargs)
271 for base in bases:
272 if isinstance(base, _DataPipeMeta):
--> 273 return super().__new__(cls, name, bases, namespace, **kwargs) # type: ignore[call-overload]
274
275 namespace.update({'type': _DEFAULT_TYPE,
/usr/lib/python3.7/abc.py in __new__(mcls, name, bases, namespace, **kwargs)
124 """
125 def __new__(mcls, name, bases, namespace, **kwargs):
--> 126 cls = super().__new__(mcls, name, bases, namespace, **kwargs)
127 _abc_init(cls)
128 return cls
/usr/local/lib/python3.7/dist-packages/torch/utils/data/_typing.py in _dp_init_subclass(sub_cls, *args, **kwargs)
369 return_hint.__origin__ == collections.abc.Iterator)):
370 raise TypeError("Expected 'Iterator' as the return annotation for `__iter__` of {}"
--> 371 ", but found {}".format(sub_cls.__name__, _type_repr(hints['return'])))
372 data_type = return_hint.__args__[0]
373 if not issubtype(data_type, sub_cls.type.param):
TypeError: Expected 'Iterator' as the return annotation for `__iter__` of ExperienceSourceDataset, but found typing.Iterable
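This pl_bolts failure is unrelated to the profiler bug: torch 1.9's IterableDataset metaclass validates that an overridden `__iter__` is annotated with `Iterator`, and the installed pl_bolts annotates it with `Iterable` instead. A minimal illustration of the check (hypothetical class, not the pl_bolts source):

```python
# Illustration of the torch 1.9 typing check that breaks the pl_bolts
# import above: if __iter__ carries a return annotation, it must be
# Iterator (not Iterable), or class creation raises TypeError.
from typing import Iterator
from torch.utils.data import IterableDataset

class OkDataset(IterableDataset):
    def __iter__(self) -> Iterator[int]:  # accepted by the metaclass check
        yield from range(3)

# Annotating `-> Iterable[int]` instead fails at class-creation time with:
# TypeError: Expected 'Iterator' as the return annotation for `__iter__` ...
```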
Expected behavior
Profiling should continue as expected, and the profiler should emit test output only for the trainer that actually ran test.
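One possible workaround until this is fixed (an untested sketch; it assumes `trainer.profiler` can simply be reassigned after construction) is to swap in a no-op PassThroughProfiler on the best trainer before calling test, at the cost of losing the test-stage profile:

```python
# Untested workaround sketch: disable profiling on the best trainer only.
# Assumes trainer.profiler is a plain attribute that can be reassigned.
from pytorch_lightning.profiler import PassThroughProfiler

best_trainer.profiler = PassThroughProfiler()
test_scores = best_trainer.test(ckpt_path="best", test_dataloaders=test_dataloader)
```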
Environment
- CUDA:
- GPU:
- available: False
- version: None
- Packages:
- numpy: 1.19.5
- pyTorch_debug: False
- pyTorch_version: 1.9.0
- pytorch-lightning: 1.4.1
- tqdm: 4.62.0
- System:
- OS: Darwin
- architecture:
- 64bit
- processor: i386
- python: 3.9.6
- version: Darwin Kernel Version 19.6.0: Tue Jun 22 19:49:55 PDT 2021; root:xnu-6153.141.35~1/RELEASE_X86_64
About this issue
- State: open
- Created 3 years ago
- Comments: 16 (10 by maintainers)
@tchaton I was originally experiencing this bug on Ubuntu (GCP GPU instances), not Colab.
I just ran the Colab Python code on OSX and got the same error.
See my environment in the original comment.
@tchaton I have updated my Colab so that BoringModel replicates my bug.