speechbrain: interfaces.py separate_file() not working properly?
Specs: Windows Version 10.0.19042 Build 19042, Python 3.8.3
I’m following the Source Separation tutorial that can be accessed from this page.
Since the audio I have is already mixed, I tried to use model.separate_file(), based on the HuggingFace speechbrain/sepformer-wsj02mix code (the exact call I ran is shown after the traceback below). There are two issues with that:
- CPU, RAM and disk usage on my PC shoot up into the high nineties, causing the computer to freeze.
- After the computer unfreezes (presumably, after the calculations are finished), an error is thrown:
Traceback (most recent call last):
...
est_sources = model.separate_file(path='data/test/audio/speech.wav')
File "C:\...\venv\lib\site-packages\speechbrain\pretrained\interfaces.py", line 710, in separate_file
est_sources = self.separate_batch(batch)
File "C:\...\venv\lib\site-packages\speechbrain\pretrained\interfaces.py", line 669, in separate_batch
est_mask = self.modules.masknet(mix_w)
File "C:\...\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\...\venv\lib\site-packages\speechbrain\lobes\models\dual_path.py", line 1124, in forward
x = self.dual_mdl[i](x)
File "C:\...\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\...\venv\lib\site-packages\speechbrain\lobes\models\dual_path.py", line 975, in forward
inter = self.inter_mdl(inter)
File "C:\...\venv\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\...\venv\lib\site-packages\speechbrain\lobes\models\dual_path.py", line 597, in forward
return self.mdl(x + pos_enc)[0]
RuntimeError: The size of tensor a (2830) must match the size of tensor b (2500) at non-singleton dimension 1
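For reference, the code I ran is essentially the snippet from the model card (the savedir below is just my local choice):

```python
from speechbrain.pretrained import SepformerSeparation as separator

# Load the pretrained SepFormer model (trained on 8 kHz WSJ0-2mix)
model = separator.from_hparams(
    source="speechbrain/sepformer-wsj02mix",
    savedir="pretrained_models/sepformer-wsj02mix",
)

# This is the call that freezes the machine and then raises the error above
est_sources = model.separate_file(path="data/test/audio/speech.wav")
```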
My guess is that the data that separate_file() passes to separate_batch() is incorrect:
source, fl = split_path(path)
path = fetch(fl, source=source, savedir=savedir)
batch, _ = torchaudio.load(path)
est_sources = self.separate_batch(batch)
Does resampling the data have something to do with it?
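A quick check is to print the file's native sample rate, since the model was trained on 8 kHz audio and the 2500 in the error presumably comes from the positional encoding's maximum length:

```python
import torchaudio

# Inspect the mixture before separation: sepformer-wsj02mix expects 8 kHz input,
# so a higher native rate yields more encoder frames than the positional encoding covers
mix, fs = torchaudio.load("data/test/audio/speech.wav")
print(fs, mix.shape)  # native sample rate and (channels, samples)
```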
If, instead of using separate_file(), I write:
mix, fs = torchaudio.load('data/test/audio/speech.wav')
resampler = torchaudio.transforms.Resample(fs, 8000)
mix = resampler(mix)
est_sources = model.separate_batch(mix)
as suggested in the aforementioned tutorial, the computer doesn’t freeze and est_sources is a torch.Tensor with torch.Size([1, 471272, 2]), which looks like expected behavior to me.
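For completeness, the separated sources sit in the last dimension, so they can be written out individually as in the model card (assuming 8 kHz output):

```python
# Save each estimated source to its own file at the model's 8 kHz rate
torchaudio.save("source1hat.wav", est_sources[:, :, 0].detach().cpu(), 8000)
torchaudio.save("source2hat.wav", est_sources[:, :, 1].detach().cpu(), 8000)
```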
About this issue
- State: closed
- Created 3 years ago
- Comments: 17
Thank you, Cem! I'm in the process of trying this. Hope it works well. Happy early holidays!
Yes, I think you can process the long file in chunks and concatenate the results. You might get artifacts at the transitions, but you can definitely try.
Best, Cem
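A minimal sketch of the chunk-and-concatenate idea described above, assuming the model loaded earlier is available as model; the chunk length is an arbitrary choice, and no overlap or smoothing is applied at the seams:

```python
import torch
import torchaudio

MODEL_SAMPLE_RATE = 8000   # sepformer-wsj02mix operates on 8 kHz audio
CHUNK_SECONDS = 10         # illustrative chunk length; pick what fits in memory

# Load and resample the long mixture
mix, fs = torchaudio.load("data/test/audio/speech.wav")
mix = torchaudio.transforms.Resample(fs, MODEL_SAMPLE_RATE)(mix)

# Separate chunk by chunk, then concatenate along the time axis
chunk_len = CHUNK_SECONDS * MODEL_SAMPLE_RATE
outputs = []
for start in range(0, mix.shape[1], chunk_len):
    chunk = mix[:, start:start + chunk_len]
    outputs.append(model.separate_batch(chunk))  # (1, samples, n_sources)

# Note: besides seam artifacts, the order of the separated sources is not
# guaranteed to stay consistent from one chunk to the next
est_sources = torch.cat(outputs, dim=1)
```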
I actually don’t remember. I’ve seen this error multiple times, and usually it was one of those 3 things:
MODEL_SAMPLE_RATE = 8000)
You can try if any of those solves the issue, and if it does please state it here (don’t be like me XD). Hope it helps!
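For example, a quick guard for the sample-rate case hinted at by the fragment above (this is just an illustrative check, not a confirmed list of causes):

```python
import torchaudio

MODEL_SAMPLE_RATE = 8000  # sepformer-wsj02mix operates on 8 kHz audio

# Fail early if the file's native rate doesn't match what the model expects
mix, fs = torchaudio.load("data/test/audio/speech.wav")
assert fs == MODEL_SAMPLE_RATE, f"expected {MODEL_SAMPLE_RATE} Hz, got {fs} Hz - resample first"
```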
@mravanelli @ycemsubakan maybe the ticket needs to be re-opened?
Yes, this is a good suggestion, actually. We will add this to the code. Thank you, @UrosOgrizovic!