faster-whisper: VAD is relatively slow

Hello guys,

I am using the VAD of faster-whisper with the code below. On the TedLium benchmark, VAD takes about 8% of the total time and transcription takes the remaining 92%. I would like to reduce the VAD time so that it accounts for no more than 1%. Is it possible to make the VAD procedure faster in terms of real time? For example, could the VAD run on several CPU cores? Also, I see that the VAD runs on the CPU; is it possible to run it on the GPU?

# Import paths for a recent faster-whisper release; they may differ slightly
# between versions.
from faster_whisper.audio import decode_audio
from faster_whisper.transcribe import restore_speech_timestamps
from faster_whisper.vad import collect_chunks, get_speech_timestamps

# VAD
audio_buffer = decode_audio(audio_filename, sampling_rate=whisper_sampling_rate)

# Get the speech chunks in the given audio buffer, and create a reduced audio
# buffer that contains only speech.
speech_chunks = get_speech_timestamps(audio_buffer)
vad_audio_buffer = collect_chunks(audio_buffer, speech_chunks)

# Transcribe the reduced audio buffer.
init_segments, _ = whisper_model.transcribe(vad_audio_buffer, language=language_code, beam_size=beam_size)

# Restore the true timestamps for the segments.
segments = restore_speech_timestamps(init_segments, speech_chunks, whisper_sampling_rate)
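
For reference, faster-whisper can also run this same VAD step internally through WhisperModel.transcribe() via the vad_filter flag, which may be more convenient when experimenting with VAD parameters. A minimal sketch (the parameter value below is only a placeholder):

# Let faster-whisper apply the Silero VAD itself before transcription and
# restore the original timestamps automatically.
segments, info = whisper_model.transcribe(
    audio_filename,
    language=language_code,
    beam_size=beam_size,
    vad_filter=True,
    vad_parameters=dict(min_silence_duration_ms=500),
)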

Most upvoted comments

A couple of comments from personal experience:

  • The VAD model can be quite slow compared to the ASR model when processing relatively short audio files, mainly because it is an RNN-based model.
  • The last time I tried it on the GPU, there were no substantial speed-ups compared to the CPU.
  • The effect of intra_op_num_threads on CPU inference is limited. I get a slightly better runtime with 4 threads than with 1, but more than 4 is basically useless in my case/on my CPU; even 4 threads is not a 2x speed-up.
  • A larger window_size_samples is the easiest way to improve speed, since there are fewer windows to process and forward-pass through the model (see the sketch after this list).
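
As a sketch of the window-size suggestion above, the VAD options can be passed explicitly to get_speech_timestamps. This assumes a faster-whisper version whose VadOptions still exposes window_size_samples:

from faster_whisper.vad import VadOptions, get_speech_timestamps

# Larger windows mean fewer forward passes through the RNN, at the cost of
# coarser speech/non-speech boundaries.
vad_options = VadOptions(window_size_samples=1536)
speech_chunks = get_speech_timestamps(audio_buffer, vad_options)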

I ran tests on various samples to see the effect of window_size_samples=1536 on transcriptions. I see fewer fallbacks, much better timestamps in some cases, and very positive effects on Demucs-processed files.

I made it the default in r139.2.

You can make the VAD run on the GPU:

  1. Install the dependencies:
pip uninstall onnxruntime
pip install onnxruntime-gpu
  2. Edit the code:

In vad.py, replace lines 253-262 with:

        opts = onnxruntime.SessionOptions()
        opts.log_severity_level = 4
        opts.graph_optimization_level = onnxruntime.GraphOptimizationLevel.ORT_ENABLE_BASIC
        # https://github.com/microsoft/onnxruntime/issues/11548#issuecomment-1158314424

        self.session = onnxruntime.InferenceSession(
            path,
            providers=["CUDAExecutionProvider"],
            sess_options=opts,
        )
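
After that change, it is worth checking that the CUDA provider is actually being used, since onnxruntime silently falls back to the CPU provider when the GPU build or the CUDA/cuDNN libraries are missing. A quick sanity check, assuming onnxruntime-gpu is installed:

import onnxruntime

# "CUDAExecutionProvider" should be listed here if the GPU build of
# onnxruntime is installed and its CUDA/cuDNN dependencies can be loaded.
print(onnxruntime.get_available_providers())

Inside vad.py, self.session.get_providers() reports which providers the session actually ended up using.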