pytorch-lightning: overfit_batches doesn't work

When I try to use overfit_batches: https://pytorch-lightning.readthedocs.io/en/latest/debugging.html#make-model-overfit-on-subset-of-data

    trainer = Trainer(gpus=num_gpus, max_epochs=config.epochs, overfit_batches=0.01, logger=logger)

my code fails with:

    trainer.fit(module)
  File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 918, in fit
    self.single_gpu_train(model)
  File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/distrib_parts.py", line 176, in single_gpu_train
    self.run_pretrain_routine(model)
  File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/trainer.py", line 1065, in run_pretrain_routine
    self.reset_val_dataloader(ref_model)
  File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 331, in reset_val_dataloader
    self._reset_eval_dataloader(model, 'val')
  File "/home/andriy/miniconda3/envs/patchy_discs_model/lib/python3.7/site-packages/pytorch_lightning/trainer/data_loading.py", line 314, in _reset_eval_dataloader
    f'you requested to check {limit_eval_batches} of the {mode} dataloader but'
pytorch_lightning.utilities.exceptions.MisconfigurationException: you requested to check 0.01 of the val dataloader but 0.01*0 = 0. Please increase the limit_val_batches. Try at least limit_val_batches=0.09090909090909091

P.S.: I also tried setting limit_val_batches=0.09090909090909091. Same error.
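
For context, the fraction appears to be converted to a whole number of batches, which is why a small value can round down to zero. A minimal sketch of that arithmetic, assuming a val dataloader with 11 batches (which matches the suggested minimum of 1/11 ≈ 0.0909):

    num_val_batches = 11          # assumed size of the val dataloader
    limit_val_batches = 0.01      # fraction passed via overfit_batches / limit_val_batches

    # The fraction is scaled by the dataloader length and truncated to an int.
    resolved = int(limit_val_batches * num_val_batches)
    print(resolved)               # 0 -> triggers the MisconfigurationException

    # The smallest fraction that still yields one batch:
    min_fraction = 1 / num_val_batches
    print(min_fraction)                            # 0.09090909090909091
    print(int(min_fraction * num_val_batches))     # 1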

About this issue

  • State: closed
  • Created 4 years ago
  • Reactions: 1
  • Comments: 15 (8 by maintainers)

Most upvoted comments

I just looked at it. In summary:

  1. OP had this problem: MisconfigurationException: you requested to check 0.01 of the val dataloader but 0.01*0 = 0. Please increase the limit_val_batches. Try at least limit_val_batches=0.09090909090909091. The message is correct: it is telling you that the percentage you chose corresponds to less than one batch. Solution: increase the value. What you probably want, though, is overfit_batches=1, which runs on exactly one batch without error (see the sketch after this list).

  2. The example code by @willprice works on master (#3501).

  3. The example by @itsikad shows an issue with DDP. As far as I can tell, this is the only remaining problem in this thread. I can take a look.
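
A minimal sketch of the two suggested fixes for point 1, assuming an otherwise default Trainer; the exact keyword set depends on your Lightning version:

    from pytorch_lightning import Trainer

    # Option A: an integer is an absolute batch count; this overfits on exactly one batch.
    trainer = Trainer(max_epochs=10, overfit_batches=1)

    # Option B: a float is a fraction of the dataloader; it must resolve to at least one
    # batch, i.e. be >= 1/len(dataloader), as the error message suggests.
    trainer = Trainer(max_epochs=10, overfit_batches=0.1)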