lhotse: load_state_dict doesn't catch 'StopIteration' exception
Hi everyone,
I am using icefall to train an RNN-T model. After I resume training from a saved batch and load the sampler state, the DynamicBucketingSampler raises a StopIteration that aborts the program. I think that exception should be caught somewhere up the stack and handled.
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/home/azureuser/anaconda3/envs/icefall-torch1.10cu102/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 59, in _wrap
fn(i, *args)
File "/home/azureuser/codebase/icefall-birch/egs/gigaspeech/ASR/pruned_transducer_stateless2/train.py", line 869, in run
train_dl = gigaspeech.train_dataloaders(
File "/home/azureuser/codebase/icefall-birch/egs/gigaspeech/ASR/pruned_transducer_stateless2/asr_datamodule.py", line 320, in train_dataloaders
train_sampler.load_state_dict(sampler_state_dict)
File "/home/azureuser/codebase/lhotse-birch/lhotse/dataset/sampling/dynamic_bucketing.py", line 174, in load_state_dict
self._fast_forward()
File "/home/azureuser/codebase/lhotse-birch/lhotse/dataset/sampling/dynamic_bucketing.py", line 190, in _fast_forward
next(self)
File "/home/azureuser/codebase/lhotse-birch/lhotse/dataset/sampling/base.py", line 261, in __next__
batch = self._next_batch()
File "/home/azureuser/codebase/lhotse-birch/lhotse/dataset/sampling/dynamic_bucketing.py", line 233, in _next_batch
batch = next(self.cuts_iter)
StopIteration
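The traceback shows that `load_state_dict` fast-forwards the sampler by repeatedly calling `next(self)` to skip already-consumed batches, and the crash happens when the underlying iterator runs out mid-skip. A minimal sketch of the fix the issue asks for, using a toy stand-in class (`TinySampler`, `_fast_forward`, and `cuts_iter` here only mirror the names in the traceback; this is not lhotse's actual implementation):

```python
class TinySampler:
    """Toy stand-in for a sampler whose restored state says how many
    batches were already drawn before the checkpoint."""

    def __init__(self, num_batches):
        # Stand-in for the real batch source (self.cuts_iter in the traceback).
        self.cuts_iter = iter(range(num_batches))

    def __next__(self):
        # Raises StopIteration once the underlying iterator is exhausted,
        # which is what propagated up and killed the training process.
        return next(self.cuts_iter)

    def _fast_forward(self, batches_to_skip):
        # The behavior the issue requests: swallow StopIteration so that
        # restoring a state saved at the very end of an epoch does not abort.
        try:
            for _ in range(batches_to_skip):
                next(self)
        except StopIteration:
            pass  # already exhausted; nothing left to skip


sampler = TinySampler(num_batches=3)
sampler._fast_forward(5)  # skipping past the end no longer crashes
```

Without the `try`/`except`, calling `_fast_forward(5)` on a 3-batch sampler reproduces the abort described above.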
About this issue
- State: closed
- Created 2 years ago
- Comments: 21
Yeah, that was solved; I am using it for our training, and every epoch now contains the same, correct number of batches. Thank you, I think we can close it now.
On Sun, Jul 24, 2022 at 10:16 Piotr Żelasko @.***> wrote: