whisperX: Diarization too slow

An audio file of 1 hour 30 minutes has been processing for over an hour in the diarization stage alone. I’m using an RTX 3090.

I’m guessing --batch_size doesn’t affect pyannote. A setting for pyannote’s batch size would be very nice to have.

About this issue

  • State: open
  • Created a year ago
  • Reactions: 1
  • Comments: 15 (5 by maintainers)

Most upvoted comments

I wrote that diarization takes 30 seconds, not the entire pipeline - before the change, diarization took almost 2 minutes. Your timings look fine; the transcribe step is faster on my setup, but that’s probably down to the GPU you’re using.

Changing the pyannote pipeline is a bit more involved. I’m using an offline pipeline, as described in https://github.com/pyannote/pyannote-audio/blob/develop/tutorials/applying_a_pipeline.ipynb, and I had to patch whisperx slightly to allow working with a custom local pipeline. With this method you can customize the pipeline by editing its config.yaml (change the “embedding” entry to the desired model).
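For reference, a pyannote pipeline’s config.yaml looks roughly like the sketch below (this shape is from the pyannote speaker-diarization-3.1 pipeline; the model names and threshold values are illustrative and may differ for the version whisperx pins). Note the `embedding_batch_size` and `segmentation_batch_size` keys, which are the batch-size knobs this issue is asking for, and the `embedding` entry mentioned above:

```yaml
version: 3.1.0

pipeline:
  name: pyannote.audio.pipelines.SpeakerDiarization
  params:
    clustering: AgglomerativeClustering
    # Swap this entry to use a different (e.g. local) embedding model
    embedding: pyannote/wespeaker-voxceleb-resnet34-LM
    embedding_batch_size: 32
    embedding_exclude_overlap: true
    segmentation: pyannote/segmentation-3.0
    segmentation_batch_size: 32

params:
  clustering:
    method: centroid
    min_cluster_size: 12
    threshold: 0.7045654963945799
  segmentation:
    min_duration_off: 0.0
```

You can then load it offline with `Pipeline.from_pretrained("path/to/config.yaml")` and move it to the GPU with `.to(torch.device("cuda"))`.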

There is a known issue with pyannote not using the GPU, but it should not occur with whisperx; see pyannote/pyannote-audio#1354 for details. It might have something to do with the device index, though. Are both of your GPUs the same size? We currently don’t pass device_index to the diarization step, so we simply call to('cuda') when loading the diarization model. This might be a problem when multiple GPUs are available.
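A minimal sketch of what respecting device_index could look like - `pick_device` is a hypothetical helper, not part of whisperx, and the point is only that to('cuda') always resolves to cuda:0, whereas an explicit index selects the intended GPU:

```python
import torch


def pick_device(device_index: int = 0) -> torch.device:
    """Resolve an explicit CUDA device, falling back to CPU.

    Unlike a bare to('cuda'), this pins the model to cuda:<index>,
    which matters on multi-GPU machines.
    """
    if torch.cuda.is_available() and device_index < torch.cuda.device_count():
        return torch.device(f"cuda:{device_index}")
    return torch.device("cpu")
```

The diarization model would then be loaded with `.to(pick_device(device_index))` instead of `.to('cuda')`.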

@m-bain I’m also having extremely slow diarization. Using CLI.

Just now, to explore further, I also tried setting the --threads parameter to 50 to see if that would do anything (I would prefer GPU!). It is now using a variable number of threads, but well above four, which is what it had seemed to be limited to by default. There is still some GPU memory allocated even in the diarization stage, but not a lot. Very naive question: could things be slow because all of us have pyannote running on the CPU for some reason? Is there a way to specify that whisperx’s pyannote must use the GPU?

For reference, in case it helps:

>>> import torch
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count()
2
>>> torch.version.cuda
'11.7'
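torch.cuda.is_available() only tells you the GPU is visible, not where a given model actually ended up. One way to answer the “is pyannote on CPU?” question is to inspect the parameters of the loaded model - a generic sketch:

```python
import torch


def model_device(model: torch.nn.Module) -> torch.device:
    # A model lives wherever its parameters live; checking the first
    # parameter is enough for models kept on a single device.
    return next(model.parameters()).device
```

Applied to whisperx’s loaded diarization pipeline (e.g. its segmentation or embedding model - exact attribute names depend on the pyannote version), this would reveal whether the slow stage is silently running on `cpu`.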