faster-whisper: VAD is relatively slow
Hello guys,
I am using the VAD of faster-whisper with the commands below. On the TED-LIUM benchmark I found that VAD takes 8% of the time and transcribing takes the remaining 92%. I would prefer to reduce the VAD time so that it takes no more than 1%. Is it somehow possible to optimize the VAD procedure in terms of real time? Maybe it is possible to run VAD on several CPUs? By the way, I see that VAD is running on the CPU; is it possible to run it on the GPU somehow?
# VAD
from faster_whisper import WhisperModel, decode_audio
from faster_whisper.transcribe import restore_speech_timestamps
from faster_whisper.vad import collect_chunks, get_speech_timestamps

whisper_sampling_rate = 16000  # Whisper models expect 16 kHz audio
whisper_model = WhisperModel("large-v2")  # or whichever model you use
audio_buffer = decode_audio(audio_filename, sampling_rate=whisper_sampling_rate)

# Get the speech chunks in the given audio buffer, and create a reduced audio buffer that contains only speech.
speech_chunks = get_speech_timestamps(audio_buffer)
vad_audio_buffer = collect_chunks(audio_buffer, speech_chunks)

# Transcribe the reduced audio buffer.
init_segments, _ = whisper_model.transcribe(vad_audio_buffer, language=language_code, beam_size=beam_size)

# Restore the true timestamps for the segments.
segments = restore_speech_timestamps(init_segments, speech_chunks, whisper_sampling_rate)
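A minimal timing sketch for reproducing such a split, using only the standard library and the variable names from the snippet above; note that transcribe() returns a lazy generator, so it must be consumed inside the timed region for the work to actually happen:

import time

t0 = time.perf_counter()
speech_chunks = get_speech_timestamps(audio_buffer)
vad_audio_buffer = collect_chunks(audio_buffer, speech_chunks)
t1 = time.perf_counter()

init_segments, _ = whisper_model.transcribe(vad_audio_buffer, language=language_code, beam_size=beam_size)
segments = list(init_segments)  # force the lazy transcription to run
t2 = time.perf_counter()

total = t2 - t0
print(f"VAD: {100 * (t1 - t0) / total:.1f}%  transcribe: {100 * (t2 - t1) / total:.1f}%")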
A couple of comments from personal experience here (a sketch of setting both options follows this list):
- The effect of intra_op_num_threads on CPU inference is limited. I get slightly better runtime with 4 threads compared to 1, but more than 4 is basically useless in my case/CPU. It is not even a 2x speed-up when you have 4 threads set.
- window_size_samples is the easiest way of improving the speed, since there are fewer windows to process and forward-pass through the model. I did tests on various samples to see the effect of 1536 on transcriptions: I see fewer fallbacks, much better timestamps in some cases, and very positive effects on Demucs'ed files.
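A minimal sketch of the two knobs above, assuming a faster-whisper version whose VadOptions still exposes window_size_samples (later releases dropped it); the onnxruntime session setup mirrors what faster-whisper does internally in vad.py, with silero_vad.onnx standing in for the real model path:

import onnxruntime
from faster_whisper.vad import VadOptions, get_speech_timestamps

# Larger windows mean fewer forward passes through the VAD model.
vad_options = VadOptions(window_size_samples=1536)
speech_chunks = get_speech_timestamps(audio_buffer, vad_options)

# intra_op_num_threads lives on the onnxruntime session that vad.py
# creates; if you patch it yourself, the relevant API looks like this:
opts = onnxruntime.SessionOptions()
opts.intra_op_num_threads = 4  # diminishing returns beyond ~4 threads
session = onnxruntime.InferenceSession(
    "silero_vad.onnx",  # hypothetical path; vad.py resolves the real one
    sess_options=opts,
    providers=["CPUExecutionProvider"],
)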
I made it the default in r139.2.
You can make VAD run on the GPU: in vad.py, replace lines 253-262 with
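The replacement code itself is missing from this capture. As a rough sketch only (not the commenter's actual diff), the usual change is to construct the onnxruntime session with the CUDA execution provider instead of the CPU one, roughly where vad.py builds its InferenceSession:

import onnxruntime

opts = onnxruntime.SessionOptions()
opts.inter_op_num_threads = 1
opts.intra_op_num_threads = 1

# Ask for CUDA first and fall back to the CPU if it is unavailable.
session = onnxruntime.InferenceSession(
    "silero_vad.onnx",  # hypothetical path; vad.py resolves the real one
    sess_options=opts,
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

Note that the onnxruntime-gpu package must be installed for the CUDA provider to be available, and for a model as small as Silero VAD the per-window host-to-device transfers can eat a good part of the GPU speed-up.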