pytorch-lightning: overfit_batches doesn't work
When I try to use overfit_batches: https://pytorch-lightning.readthedocs.io/en/latest/debugging.html#make-model-overfit-on-subset-of-data
trainer = Trainer(gpus=num_gpus, max_epochs=config.epochs, overfit_batches=0.01, logger=logger)
my code fails with:
trainer.fit(module)
File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in fit
self.single_gpu_train(model)
File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 176, in single_gpu_train
self.run_pretrain_routine(model)
File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1065, in run_pretrain_routine
self.reset_val_dataloader(ref_model)
File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 331, in reset_val_dataloader
self._reset_eval_dataloader(model, 'val')
File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 314, in _reset_eval_dataloader
f'you requested to check {limit_eval_batches} of the {mode} dataloader but'
pytorch_lightning.utilities.exceptions.MisconfigurationException: you requested to check 0.01 of the val dataloader but 0.01*0 = 0. Please increase the limit_val_batches. Try at least limit_val_batches=0.09090909090909091
P.S.: I also tried setting limit_val_batches=0.09090909090909091. Same error.
About this issue
- Original URL
- State: closed
- Created 4 years ago
- Reactions: 1
- Comments: 15 (8 by maintainers)
I just looked at it. In summary:
OP had this problem:
MisconfigurationException: you requested to check 0.01 of the val dataloader but 0.01*0 = 0. Please increase the limit_val_batches. Try at least limit_val_batches=0.09090909090909091This message is correct, it is telling you that the percentage you have chosen corresponds to less than one batch. Solution: You need to increase the value. But what you probably want isoverfit_batches=1, and this works with exactly one batch without error.The example code by @willprice works on master #3501
The example by @itsikad shows an issue with DDP. As far as I can tell, this is the only remaining problem in this thread here. I can take a look