whisperX: Diarization too slow

An audio file of 1 hour 30 minutes has been processing for over an hour in the diarization stage alone. I’m using an RTX 3090.

I’m guessing --batch_size doesn’t affect pyannote. A setting for pyannote’s batch size would be very nice to have.

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (5 by maintainers)

Most upvoted comments

I wrote that diarization takes 30 seconds, not the entire pipeline - before the change, diarization took almost 2 minutes. Your timings look fine; the transcribe step is faster on my setup, but that’s probably down to the GPU you’re using.

Changing the pyannote pipeline is a bit more involved. I’m using an offline pipeline, as described in https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb, and I had to patch whisperx slightly to allow working with a custom local pipeline. With this method you can customize the pipeline by editing its config.yaml (change the “embedding” entry to the desired model).
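For reference, a pyannote pipeline’s config.yaml looks roughly like the sketch below (this shape is from the pyannote speaker-diarization-3.1 pipeline; the model names and threshold values are illustrative and may differ for the version whisperx pins). Note the `embedding_batch_size` and `segmentation_batch_size` keys, which are the batch-size knobs this issue is asking for, and the `embedding` entry mentioned above:

```yaml
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    # Swap this entry to use a different (e.g. local) embedding model
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pyannote/segmentation-3.0
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
```

You can then load it offline with `Pipeline.from_pretrained("path/to/config.yaml")` and move it to the GPU with `.to(torch.device("cuda"))`.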

There is a known issue with pyannote not using the GPU, but it should not occur with whisperx; see pyannote/pyannote-audio#1354 for details. It might have something to do with the device index, though. Are both of your GPUs the same size? We currently don’t pass device_index to the diarization step, so we simply call to('cuda') when loading the diarization model. This might be a problem when multiple GPUs are available.
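A minimal sketch of what respecting device_index could look like - `pick_device` is a hypothetical helper, not part of whisperx, and the point is only that to('cuda') always resolves to cuda:0, whereas an explicit index selects the intended GPU:

```python
import torch


def pick_device(device_index: int = 0) -> torch.device:
    """Resolve an explicit CUDA device, falling back to CPU.

    Unlike a bare to('cuda'), this pins the model to cuda:<index>,
    which matters on multi-GPU machines.
    """
    if torch.cuda.is_available() and device_index < torch.cuda.device_count():
        return torch.device(f"cuda:{device_index}")
    return torch.device("cpu")
```

The diarization model would then be loaded with `.to(pick_device(device_index))` instead of `.to('cuda')`.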

@m-bain I’m also having extremely slow diarization. Using CLI.

Just now, to explore further, I also tried setting the --threads parameter to 50 to see if that would do anything (I would prefer GPU!). It is now using a variable number of threads, but well above four, which is what it had seemed to be limited to by default. There is still some GPU memory allocated even in the diarization stage, but not a lot. Very naive question: could things be slow because all of us have pyannote running on the CPU for some reason? Is there a way to specify that whisperx’s pyannote must use the GPU?

For reference, in case it helps:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2
>>> torch.version.cuda
'11.7'
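torch.cuda.is_available() only tells you the GPU is visible, not where a given model actually ended up. One way to answer the “is pyannote on CPU?” question is to inspect the parameters of the loaded model - a generic sketch:

```python
import torch


def model_device(model: torch.nn.Module) -> torch.device:
    # A model lives wherever its parameters live; checking the first
    # parameter is enough for models kept on a single device.
    return next(model.parameters()).device
```

Applied to whisperx’s loaded diarization pipeline (e.g. its segmentation or embedding model - exact attribute names depend on the pyannote version), this would reveal whether the slow stage is silently running on `cpu`.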