pydub: Error Manipulating Big (1 GB) MP3 File
I am trying to split a 12-hour MP3 audio file into 103 tracks. Basically, I am following the instructions to open, split, and export an MP3 file, using something like:
pydub.AudioSegment.from_mp3(audio_file_name)[start:finish].export(new_track, format='mp3')
When I test with a small MP3 file it works just fine, but when I try to load the 1 GB file it crashes with the following error after running for about 30 minutes:
Traceback (most recent call last):
  File "c:\Users\20006083\Downloads\YoutubeMusic\split_audio.py", line 61, in <module>
    song = AS.from_mp3(audio_file)
  File "C:\Users\40000438\AppData\Local\Continuum\Anaconda3\lib\site-packages\pydub\audio_segment.py", line 707, in from_mp3
    return cls.from_file(file, 'mp3', parameters)
  File "C:\Users\40000438\AppData\Local\Continuum\Anaconda3\lib\site-packages\pydub\audio_segment.py", line 698, in from_file
    fix_wav_headers(p_out)
  File "C:\Users\40000438\AppData\Local\Continuum\Anaconda3\lib\site-packages\pydub\audio_segment.py", line 141, in fix_wav_headers
    data[4:8] = struct.pack('<I', len(data) - 8)
struct.error: argument out of range
I was monitoring the memory usage while running my code, and it got quite high, almost reaching the full 16 GB of RAM available on my computer. I am not sure whether this is an issue with the library or a crash caused by running out of memory. Either way, I thought it would be good to open this thread so the developers can look into it.
I am using Python 3.6.1, the latest version of pydub (installed today with pip), and the latest version of ffmpeg (I got this file: ffmpeg-20180619-a990184-win64-shared). The audio file I am using is the audio from this YouTube link:
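A rough check of one likely cause, based on the last frame of the traceback: fix_wav_headers packs the length of the decoded WAV data with the '<I' format, which is a 4-byte unsigned integer and cannot hold values above 4,294,967,295. The sketch below assumes the MP3 decodes to 44.1 kHz, 16-bit, stereo PCM; those parameters are assumptions, not measured from the actual file.

```python
import struct

# Assumed decode parameters (not taken from the actual file).
SAMPLE_RATE = 44_100        # samples per second
BYTES_PER_SAMPLE = 2        # 16-bit PCM
CHANNELS = 2                # stereo
DURATION_S = 12 * 3600      # 12 hours

# Size of the raw PCM data once the whole file is decoded.
pcm_bytes = SAMPLE_RATE * BYTES_PER_SAMPLE * CHANNELS * DURATION_S
max_u32 = 2**32 - 1         # largest value a '<I' field can hold

print(pcm_bytes)            # 7620480000
print(pcm_bytes > max_u32)  # True

# Packing a value that large into 4 bytes fails, as in the traceback.
try:
    struct.pack('<I', pcm_bytes)
    overflowed = False
except struct.error as exc:
    overflowed = True
    print("struct.error:", exc)
```

Under these assumptions the decoded audio is about 7.6 GB, so the header field overflows regardless of how much RAM is available, although memory pressure would be a problem at that size as well.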
https://www.youtube.com/watch?v=F__LuHDJko0&t=11595s
Any help solving this issue would be much appreciated.
About this issue
- State: open
- Created 6 years ago
- Comments: 16 (4 by maintainers)
Is it possible for pydub to support file streaming? Rather than loading a whole file into memory, it would just load its information and provide an iterator in case you need to process the audio bits.
With pull request #345 you could open small AudioSegments, so you don't have to load the entire audio file into memory.
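For illustration, here is a minimal sketch of what such an iterator could look like outside pydub, assuming ffmpeg is on the PATH and decoding to raw 16-bit PCM on stdout. The helper names (ffmpeg_pcm_command, iter_chunks, stream_pcm) are hypothetical and not part of pydub.

```python
import subprocess


def ffmpeg_pcm_command(path, sample_rate=44100, channels=2):
    """Build an ffmpeg command that writes raw 16-bit PCM to stdout."""
    return [
        "ffmpeg", "-i", path,
        "-f", "s16le", "-acodec", "pcm_s16le",
        "-ar", str(sample_rate), "-ac", str(channels),
        "-",
    ]


def iter_chunks(stream, chunk_bytes):
    """Yield successive blocks of at most chunk_bytes from a binary stream."""
    while True:
        chunk = stream.read(chunk_bytes)
        if not chunk:
            break
        yield chunk


def stream_pcm(path, seconds_per_chunk=10, sample_rate=44100, channels=2):
    """Decode a file with ffmpeg and yield PCM a few seconds at a time."""
    chunk_bytes = seconds_per_chunk * sample_rate * channels * 2  # 2 bytes/sample
    proc = subprocess.Popen(
        ffmpeg_pcm_command(path, sample_rate, channels),
        stdout=subprocess.PIPE,
        stderr=subprocess.DEVNULL,
    )
    try:
        yield from iter_chunks(proc.stdout, chunk_bytes)
    finally:
        proc.stdout.close()
        proc.wait()
```

With this approach only one chunk is held in memory at a time, so peak usage depends on seconds_per_chunk rather than on the length of the recording.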
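As a sketch of loading only small pieces: this assumes a pydub version whose AudioSegment.from_file accepts the start_second and duration arguments (whether that matches the pull request mentioned above is not confirmed here). The helpers track_windows and export_tracks, and the file names, are hypothetical.

```python
def track_windows(boundaries_s):
    """Turn sorted boundary times (seconds) into (start, duration) pairs."""
    return [(start, end - start)
            for start, end in zip(boundaries_s, boundaries_s[1:])]


def export_tracks(audio_path, boundaries_s, name_pattern="track_{:03d}.mp3"):
    """Decode and export one track at a time instead of the whole file."""
    # Imported here so track_windows can be used without pydub installed.
    from pydub import AudioSegment

    for i, (start, duration) in enumerate(track_windows(boundaries_s), start=1):
        # Assumption: from_file supports start_second/duration, so only
        # this window is decoded into memory.
        segment = AudioSegment.from_file(
            audio_path, format="mp3",
            start_second=start, duration=duration,
        )
        segment.export(name_pattern.format(i), format="mp3")
```

For example, export_tracks("album.mp3", [0, 215, 431]) would write two tracks covering 0 to 215 s and 215 to 431 s. Each call decodes only a few minutes of audio, which keeps the decoded data far below the 4-byte size limit in the WAV header.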
My workaround works - https://github.com/TeHikuMedia/wahi-korero/blob/master/wahi_korero/audiosegment.py - and we’re using it in “production.”
It's more of a quick fix, and I think it would take a bit of effort to update pydub. Oh, and it works blazingly fast even on an hour-long recording! That's for segmenting with webrtcvad.