whisperX: [v3] TypeError: cannot pickle 'generator' object

I get this error with v3 when running the following:

import whisperx

# Assumption: the original snippet never defined `device`; "cpu" is used here
device = "cpu"

# transcribe with original whisper
model = whisperx.load_model("tiny", device, compute_type="float32")

audio = whisperx.load_audio("LRMonoPhase4.wav")
result = model.transcribe(audio, batch_size=8)

print(result["segments"])  # before alignment

I saw that the faster_whisper docs say this: ‘Warning: segments is a generator so the transcription only starts when you iterate over it. The transcription can be run to completion by gathering the segments in a list or a for loop:’

segments, _ = model.transcribe("audio.mp3")
segments = list(segments)  # The transcription will actually run here.

But this is not working for me either. Any ideas?

About this issue

  • State: closed
  • Created a year ago
  • Comments: 18 (13 by maintainers)

Most upvoted comments

I created a pull request for this and tested the model on Replicate; it seems to work perfectly. I got 10x-15x faster inference on a 30-minute mp3.

I just got the same error on Python 3.10. I then created a new 3.8 environment following the given install instructions and got the error again.

Thanks. I haven't tested Python 3.10, but I'll have a look tonight. If it's urgent, I would try a Python 3.8 or 3.9 environment; hopefully that will work.

Oh, I forgot to give the error, sorry. This is what I am getting: TypeError: cannot pickle 'generator' object

The full error log is:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /var/folders/9t/trtw4qwd7sxghhm3ql0y5zv00000gn/T/ipykernel_31336/2899375745.py:7 in <module>     │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/var/folders/9t/trtw4qwd7sxghhm3ql0y5zv00000gn/T/ipykernel_31336/2899375745.py'                 │
│                                                                                                  │
│ /opt/homebrew/lib/python3.10/site-packages/whisperx/asr.py:235 in transcribe                     │
│                                                                                                  │
│   232 │   │                                                                                      │
│   233 │   │   segments = []                                                                      │
│   234 │   │   batch_size = batch_size or self._batch_size                                        │
│ ❱ 235 │   │   for idx, out in enumerate(self.__call__(data(audio, vad_segments), batch_size=ba   │
│   236 │   │   │   text = out['text']                                                             │
│   237 │   │   │   if batch_size in [0, 1, None]:                                                 │
│   238 │   │   │   │   text = text[0]                                                             │
│                                                                                                  │
│ /opt/homebrew/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py:67 in __iter__     │
│                                                                                                  │
│    64 │   │   return len(self.loader)                                                            │
│    65 │                                                                                          │
│    66 │   def __iter__(self):                                                                    │
│ ❱  67 │   │   self.iterator = iter(self.loader)                                                  │
│    68 │   │   return self                                                                        │
│    69 │                                                                                          │
│    70 │   def loader_batch_item(self):                                                           │
│                                                                                                  │
│ /opt/homebrew/lib/python3.10/site-packages/transformers/pipelines/pt_utils.py:67 in __iter__     │
│                                                                                                  │
│    64 │   │   return len(self.loader)                                                            │
│    65 │                                                                                          │
│    66 │   def __iter__(self):                                                                    │
│ ❱  67 │   │   self.iterator = iter(self.loader)                                                  │
│    68 │   │   return self                                                                        │
│    69 │                                                                                          │
│    70 │   def loader_batch_item(self):                                                           │
│                                                                                                  │
│ /opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py:368 in __iter__        │
│                                                                                                  │
│    365 │   │   │   │   self._iterator._reset(self)                                               │
│    366 │   │   │   return self._iterator                                                         │
│    367 │   │   else:                                                                             │
│ ❱  368 │   │   │   return self._get_iterator()                                                   │
│    369 │                                                                                         │
│    370 │   @property                                                                             │
│    371 │   def _auto_collation(self):                                                            │
│                                                                                                  │
│ /opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py:314 in _get_iterator   │
│                                                                                                  │
│    311 │   │   │   return _SingleProcessDataLoaderIter(self)                                     │
│    312 │   │   else:                                                                             │
│    313 │   │   │   self.check_worker_number_rationality()                                        │
│ ❱  314 │   │   │   return _MultiProcessingDataLoaderIter(self)                                   │
│    315 │                                                                                         │
│    316 │   @property                                                                             │
│    317 │   def multiprocessing_context(self):                                                    │
│                                                                                                  │
│ /opt/homebrew/lib/python3.10/site-packages/torch/utils/data/dataloader.py:927 in __init__        │
│                                                                                                  │
│    924 │   │   │   #     it started, so that we do not call .join() if program dies              │
│    925 │   │   │   #     before it starts, and __del__ tries to join but will get:               │
│    926 │   │   │   #     AssertionError: can only join a started process.                        │
│ ❱  927 │   │   │   w.start()                                                                     │
│    928 │   │   │   self._index_queues.append(index_queue)                                        │
│    929 │   │   │   self._workers.append(w)                                                       │
│    930                                                                                           │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/process.py:121 in start                                                      │
│                                                                                                  │
│   118 │   │   assert not _current_process._config.get('daemon'), \                               │
│   119 │   │   │      'daemonic processes are not allowed to have children'                       │
│   120 │   │   _cleanup()                                                                         │
│ ❱ 121 │   │   self._popen = self._Popen(self)                                                    │
│   122 │   │   self._sentinel = self._popen.sentinel                                              │
│   123 │   │   # Avoid a refcycle if the target function holds an indirect                        │
│   124 │   │   # reference to the process object (see bpo-30775)                                  │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/context.py:224 in _Popen                                                     │
│                                                                                                  │
│   221 │   _start_method = None                                                                   │
│   222 │   @staticmethod                                                                          │
│   223 │   def _Popen(process_obj):                                                               │
│ ❱ 224 │   │   return _default_context.get_context().Process._Popen(process_obj)                  │
│   225 │                                                                                          │
│   226 │   @staticmethod                                                                          │
│   227 │   def _after_fork():                                                                     │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/context.py:288 in _Popen                                                     │
│                                                                                                  │
│   285 │   │   @staticmethod                                                                      │
│   286 │   │   def _Popen(process_obj):                                                           │
│   287 │   │   │   from .popen_spawn_posix import Popen                                           │
│ ❱ 288 │   │   │   return Popen(process_obj)                                                      │
│   289 │   │                                                                                      │
│   290 │   │   @staticmethod                                                                      │
│   291 │   │   def _after_fork():                                                                 │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/popen_spawn_posix.py:32 in __init__                                          │
│                                                                                                  │
│   29 │                                                                                           │
│   30 │   def __init__(self, process_obj):                                                        │
│   31 │   │   self._fds = []                                                                      │
│ ❱ 32 │   │   super().__init__(process_obj)                                                       │
│   33 │                                                                                           │
│   34 │   def duplicate_for_child(self, fd):                                                      │
│   35 │   │   self._fds.append(fd)                                                                │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/popen_fork.py:19 in __init__                                                 │
│                                                                                                  │
│   16 │   │   util._flush_std_streams()                                                           │
│   17 │   │   self.returncode = None                                                              │
│   18 │   │   self.finalizer = None                                                               │
│ ❱ 19 │   │   self._launch(process_obj)                                                           │
│   20 │                                                                                           │
│   21 │   def duplicate_for_child(self, fd):                                                      │
│   22 │   │   return fd                                                                           │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/popen_spawn_posix.py:47 in _launch                                           │
│                                                                                                  │
│   44 │   │   set_spawning_popen(self)                                                            │
│   45 │   │   try:                                                                                │
│   46 │   │   │   reduction.dump(prep_data, fp)                                                   │
│ ❱ 47 │   │   │   reduction.dump(process_obj, fp)                                                 │
│   48 │   │   finally:                                                                            │
│   49 │   │   │   set_spawning_popen(None)                                                        │
│   50                                                                                             │
│                                                                                                  │
│ /opt/homebrew/Cellar/python@3.10/3.10.10_1/Frameworks/Python.framework/Versions/3.10/lib/python3 │
│ .10/multiprocessing/reduction.py:60 in dump                                                      │
│                                                                                                  │
│    57                                                                                            │
│    58 def dump(obj, file, protocol=None):                                                        │
│    59 │   '''Replacement for pickle.dump() using ForkingPickler.'''                              │
│ ❱  60 │   ForkingPickler(file, protocol).dump(obj)                                               │
│    61                                                                                            │
│    62 #                                                                                          │
│    63 # Platform specific definitions                                                            │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
TypeError: cannot pickle 'generator' object
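
Reading the last frames of the traceback: the DataLoader takes the multi-process path (_MultiProcessingDataLoaderIter), and with the spawn start method (popen_spawn_posix) Python has to pickle the worker process object before it can start it. Generators cannot be pickled, so if anything reachable from that object holds a generator, this TypeError is the result. Below is a minimal standard-library sketch of the same failure. The names segment_stream and try_pickle are hypothetical, made up for illustration, and are not part of whisperx, transformers, or torch.

```python
import pickle


def segment_stream(n):
    """Hypothetical stand-in for a lazily produced stream of audio segments."""
    for i in range(n):
        yield {"id": i, "text": f"segment {i}"}


def try_pickle(obj):
    """Return None if obj can be pickled, otherwise the TypeError message."""
    try:
        pickle.dumps(obj)
        return None
    except TypeError as exc:
        return str(exc)


# A live generator object cannot be serialized, so this yields an error message.
lazy_error = try_pickle(segment_stream(3))

# Materializing the generator into a list gives plain data that pickles fine.
eager_error = try_pickle(list(segment_stream(3)))

print("generator:", lazy_error)
print("list:", eager_error)
```

If this is what happens here, keeping data loading in the main process (no worker processes) or turning the generator into a list before it reaches the DataLoader would presumably avoid the pickling step. That is an assumption based on the traceback alone, not something confirmed in this thread.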