ffmpeg-normalize: this doesn't seem to work on short files, about 3s or under?

If you want to report a bug, or have a specific question, please make sure to include this information:

  • Your operating system
  • Your Python version / distribution
  • Your ffmpeg version
  • The exact command you were trying to run
  • Any output you get when running the command with the --debug flag

About this issue

  • State: closed
  • Created 6 years ago
  • Comments: 35 (15 by maintainers)

Most upvoted comments

Thanks for sharing this. I have to admit, I’m not in favor of adding functionality to automatically pad and truncate the audio streams. That always bears potential for issues with audio-video sync. I’d rather just provide a warning when the audio stream is < 3 s and link to a FAQ entry.
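
A minimal sketch of what such a warning could look like, in Python (the language ffmpeg-normalize is written in). This is not the tool's actual code: the function names, the 3-second constant, and the warning text are hypothetical. It also assumes the container-level "Duration:" line that ffmpeg prints is a good enough proxy for the audio stream length.

```python
import re
from typing import Optional

# Assumed threshold, taken from the "< 3 s" figure mentioned in this thread.
MIN_DURATION_S = 3.0

# Matches the "Duration: HH:MM:SS.xx" line that ffmpeg prints for an input.
_DURATION_RE = re.compile(r"Duration:\s*(\d+):(\d{2}):(\d{2}(?:\.\d+)?)")


def parse_duration_seconds(ffmpeg_output: str) -> Optional[float]:
    """Return the first reported duration in seconds, or None if absent."""
    match = _DURATION_RE.search(ffmpeg_output)
    if match is None:
        return None  # e.g. "Duration: N/A"
    hours, minutes, seconds = match.groups()
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def short_input_warning(ffmpeg_output: str,
                        min_duration: float = MIN_DURATION_S) -> Optional[str]:
    """Return a warning string for inputs shorter than min_duration."""
    duration = parse_duration_seconds(ffmpeg_output)
    if duration is not None and duration < min_duration:
        return (
            f"Input is only {duration:.2f} s long; loudnorm may not "
            f"normalize files shorter than {min_duration:.0f} s."
        )
    return None
```

Fed the "Duration: 00:00:02.00" line from the debug log in this issue, `short_input_warning` returns a message; for longer inputs, or when no duration is reported, it returns None.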

That would make this otherwise wonderful tool useless for shorter clips, which, as evidenced by the existence of this bug, people need. One use case for shorter audio clips is normalizing single spoken words when learning a language, as seen here. In this use case, audio normalization is important but sync is not.

Therefore I suggest implementing the warning that audio may not sync properly after normalization, but still allowing the pad-then-truncate step to happen.

@dotancohen Thanks for your feedback. I’m not against such a feature per se, it’s just that it is a bit of additional work and may lead to files out of sync, so it needs to be well-tested. I’ll look into how to implement it, but I can’t give you an ETA on it, unfortunately.

I’ve known about this for a while but haven’t had time to fix it. I should really just fix the ffmpeg filter. Can you leave this open and assign to me?

OK, thanks for clarifying this. I’ll see if there’s a way to tune the parameters to make it work for small files. If not I’ll have to at least print a warning.

@5tan There are no issues with clipping if, in your last line, you change -acodec copy to -acodec %codec_name%, after getting %codec_name% from:

FOR /F "tokens=*" %%C IN ('"ffprobe -i "%input_file%" -select_streams a:0 -show_entries stream=codec_name -hide_banner -v quiet -of csv=p=0"') DO ( SET codec_name=%%C)

This way the codec is preserved by re-encoding rather than by stream copy (I have not been able to understand the exact difference so far 👍)
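
For anyone not on Windows, here is a rough Python equivalent of the batch snippet above. It is only a sketch: the function names and the 10-second default are made up for illustration, while the ffprobe options are the ones from the batch line. `first_audio_codec` needs ffprobe on the PATH.

```python
import subprocess
from typing import List


def ffprobe_codec_command(input_file: str) -> List[str]:
    """Build the ffprobe call that prints the first audio stream's codec."""
    return [
        "ffprobe", "-v", "quiet", "-hide_banner",
        "-select_streams", "a:0",
        "-show_entries", "stream=codec_name",
        "-of", "csv=p=0",
        "-i", input_file,
    ]


def first_audio_codec(input_file: str) -> str:
    """Run ffprobe and return the codec name, e.g. 'aac' or 'pcm_s16le'."""
    result = subprocess.run(
        ffprobe_codec_command(input_file),
        capture_output=True, text=True, check=True,
    )
    lines = result.stdout.strip().splitlines()
    if not lines:
        raise ValueError(f"no audio stream found in {input_file!r}")
    return lines[0].strip()


def trim_command(padded_file: str, output_file: str, codec_name: str,
                 pad_seconds: float = 10.0) -> List[str]:
    """Build the trim step, re-encoding with codec_name instead of 'copy'."""
    return [
        "ffmpeg", "-i", padded_file,
        "-ss", f"{pad_seconds:.3f}",
        "-acodec", codec_name,
        output_file,
    ]
```

The idea is to call `first_audio_codec` on the original input and pass the result to `trim_command` for the last step.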


Yes, when using the approach with -acodec copy, a size of 2048 samples gives an accurate padding time (t) for 16-bit files at any sample rate (Fs), using the formula t = LCM(Fs, size) / Fs (LCM, not LCD!). It did not work for me anymore once I dealt with 24-bit files, though. And keep in mind that, for example, 44100 Hz results in a pad time of 512 seconds…
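
The formula is easy to play with in a few lines of Python. This is just the arithmetic from the comment above, nothing ffmpeg-specific; the function name is made up, and the 2048 default is the size quoted for 16-bit files.

```python
from math import gcd


def pad_time_seconds(sample_rate: int, size: int = 2048) -> int:
    """Padding time t = LCM(Fs, size) / Fs, in whole seconds."""
    lcm = sample_rate * size // gcd(sample_rate, size)
    # The LCM is a multiple of sample_rate by construction,
    # so this integer division is exact.
    return lcm // sample_rate


for fs in (44100, 48000, 96000):
    print(fs, pad_time_seconds(fs))
```

For 44100 Hz this gives the 512 seconds mentioned above, while 48000 Hz only needs 16 seconds, because 48000 shares a much larger power-of-two factor with 2048.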

Honestly, I did not fully understand what was going on, but I have a table if someone wants to experiment with it more 😄 I was not able to figure out the math for 24-bit; all my values became ridiculously high and did not work in any way.

image

For anyone interested, the steps outlined in my issue:

ffmpeg -i input -af "adelay=10000|10000" enlarged

Pads the audio with ten seconds of silence at the beginning. This is necessary because of this bug.

ffmpeg -i enlarged -af loudnorm=I=-16:TP=-1.5:LRA=11:print_format=json -f null -

Gets loudness data from the file.

ffmpeg -i enlarged -af loudnorm=I=-16:TP=-1.5:LRA=11:measured_I=-XXX:measured_LRA=XXX:measured_TP=-XXX:measured_thresh=-XXX:offset=0.58:linear=true:print_format=summary -ar 48k paddedNormalized

Feeds the loudness data back into the normalization algorithm for better results.

ffmpeg -i paddedNormalized -ss 00:00:10.000 -acodec copy normalized

Removes the 10 seconds of silence

These steps work just fine. This could be added to this library as a workaround for the upstream bug.
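
The four steps above can be sketched as Python command builders, which might be a starting point for scripting the workaround. Everything here is an assumption for illustration, not part of ffmpeg-normalize: the function names are hypothetical, and `stats` is assumed to be the JSON dictionary that the measuring step prints, with `target_offset` used for `offset`. The commands only get built, not executed.

```python
from typing import Dict, List

# Targets copied from the commands above.
TARGETS = "I=-16:TP=-1.5:LRA=11"


def pad_command(src: str, padded: str, pad_ms: int = 10000) -> List[str]:
    """Step 1: prepend pad_ms of silence (first two channels, as above)."""
    return ["ffmpeg", "-i", src, "-af", f"adelay={pad_ms}|{pad_ms}", padded]


def measure_command(padded: str) -> List[str]:
    """Step 2: first loudnorm pass, printing the measurements as JSON."""
    return ["ffmpeg", "-i", padded,
            "-af", f"loudnorm={TARGETS}:print_format=json",
            "-f", "null", "-"]


def normalize_command(padded: str, normalized: str,
                      stats: Dict[str, str]) -> List[str]:
    """Step 3: second pass, feeding the measured values back in."""
    loudnorm = (
        f"loudnorm={TARGETS}"
        f":measured_I={stats['input_i']}"
        f":measured_LRA={stats['input_lra']}"
        f":measured_TP={stats['input_tp']}"
        f":measured_thresh={stats['input_thresh']}"
        f":offset={stats['target_offset']}"
        ":linear=true:print_format=summary"
    )
    return ["ffmpeg", "-i", padded, "-af", loudnorm, "-ar", "48k", normalized]


def trim_command(normalized: str, dst: str, pad_ms: int = 10000) -> List[str]:
    """Step 4: cut the leading silence off again."""
    return ["ffmpeg", "-i", normalized,
            "-ss", f"{pad_ms / 1000:.3f}",
            "-acodec", "copy", dst]
```

Each list can be handed to `subprocess.run(cmd, check=True)`. The measuring step writes its JSON to stderr, so that output has to be captured and parsed before step 3 can be built.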

You can use the option to print the statistics and inspect the loudness before and after. But that’s not a proper solution either. I’ll see what I can do.
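
If you want to inspect those statistics programmatically, something along these lines could pull the loudnorm JSON block out of captured ffmpeg stderr. It is a sketch with a hypothetical function name, and it assumes the block follows a "[Parsed_loudnorm" line, as it does in the debug log in this issue.

```python
import json
from typing import Dict, Union


def parse_loudnorm_stats(stderr_text: str) -> Dict[str, Union[float, str]]:
    """Extract the last loudnorm JSON block from ffmpeg's stderr output."""
    marker = stderr_text.rfind("[Parsed_loudnorm")
    if marker == -1:
        raise ValueError("no loudnorm output found")
    start = stderr_text.find("{", marker)
    if start == -1:
        raise ValueError("loudnorm marker found, but no JSON block after it")
    end = stderr_text.find("}", start)
    if end == -1:
        raise ValueError("loudnorm JSON block is not terminated")
    raw = json.loads(stderr_text[start:end + 1])

    stats: Dict[str, Union[float, str]] = {}
    for key, value in raw.items():
        try:
            stats[key] = float(value)  # loudnorm prints numbers as strings
        except ValueError:
            stats[key] = value  # e.g. normalization_type
    return stats
```

Comparing `output_i` against the requested target then shows whether the file actually ended up where it should. In the log from this issue the first pass reports an `output_i` of -21.55 against a target of -5.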

I’m running this command:

ffmpeg-normalize $i -c:a aac -nt ebu -t -5 -f -o processed_audio/$i.m4v

And I find that short files, less than 3 seconds or so, don’t get normalized. This may be an artefact of the algorithm needing more samples to work?

ProductName: Mac OS X
ProductVersion: 10.13.6
BuildVersion: 17G65

Python 2.7.10 (default, Oct 6 2017, 22:29:07) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.31)] on darwin

ffmpeg version 4.0.2 Copyright © 2000-2018 the FFmpeg developers

DEBUG LOG:

Ross-MBP:audio_clips rossarnott$ ffmpeg-normalize W5S1-Rest-3-9-Introduction.m4v -c:a aac -nt ebu -t -5 -f --debug -o processed_audio/W5S1-Rest-3-9-Introduction2.m4v
DEBUG: found executable in path: /usr/local/bin/ffmpeg
DEBUG: found executable in path: /usr/local/bin/ffmpeg
DEBUG: Running command: ['/usr/local/bin/ffmpeg', '-filters']
DEBUG: Parsing streams of W5S1-Rest-3-9-Introduction.m4v
DEBUG: Running command: ['/usr/local/bin/ffmpeg', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-c', 'copy', '-t', '0', '-map', '0', '-f', 'null', '/dev/null']
DEBUG: Stream parsing command output:
DEBUG: ffmpeg version 4.0.2 Copyright © 2000-2018 the FFmpeg developers
  built with Apple LLVM version 10.0.0 (clang-1000.11.45.2)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0.2 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma --enable-nonfree
  libavutil 56. 14.100 / 56. 14.100
  libavcodec 58. 18.100 / 58. 18.100
  libavformat 58. 12.100 / 58. 12.100
  libavdevice 58. 3.100 / 58. 3.100
  libavfilter 7. 16.100 / 7. 16.100
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 1.100 / 5. 1.100
  libswresample 3. 1.100 / 3. 1.100
  libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'W5S1-Rest-3-9-Introduction.m4v':
  Metadata:
    major_brand : M4V
    minor_version : 1
    compatible_brands: M4V M4A mp42isom
    creation_time : 2018-11-19T19:25:57.000000Z
    description : This video is about W5S1 3-9 Section 3 Rest Audio
    album_artist : Gabriel Kava
    keywords : Week 5,w5 s1 audio
    artist : Gabriel Kava
    title : W5S1 3-9 Section 3 Rest Audio
  Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
    Metadata:
      creation_time : 2018-11-19T19:25:57.000000Z
      handler_name : Core Media Audio
Output #0, null, to '/dev/null':
  Metadata:
    major_brand : M4V
    minor_version : 1
    compatible_brands: M4V M4A mp42isom
    title : W5S1 3-9 Section 3 Rest Audio
    description : This video is about W5S1 3-9 Section 3 Rest Audio
    album_artist : Gabriel Kava
    keywords : Week 5,w5 s1 audio
    artist : Gabriel Kava
    encoder : Lavf58.12.100
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
    Metadata:
      creation_time : 2018-11-19T19:25:57.000000Z
      handler_name : Core Media Audio
Stream mapping:
  Stream #0:0 -> #0:0 (copy)
Press [q] to stop, [?] for help
size=N/A time=00:00:00.00 bitrate=N/A speed= 0x
video:0kB audio:0kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown

DEBUG: Found audio stream at index 0
INFO: Normalizing file W5S1-Rest-3-9-Introduction.m4v (1 of 1)
DEBUG: Running normalization for W5S1-Rest-3-9-Introduction.m4v
DEBUG: Parsing normalization info for W5S1-Rest-3-9-Introduction.m4v
INFO: Running first pass loudnorm filter for stream 0
DEBUG: Running ffmpeg command: ['/usr/local/bin/ffmpeg', '-nostdin', '-y', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-filter_complex', '[0:0]loudnorm=i=-5.0:lra=7.0:tp=-2.0:offset=0.0:print_format=json', '-vn', '-sn', '-f', 'null', '/dev/null']
DEBUG: Loudnorm first pass command output:
DEBUG: ffmpeg version 4.0.2 Copyright © 2000-2018 the FFmpeg developers
  built with Apple LLVM version 10.0.0 (clang-1000.11.45.2)
  configuration: --prefix=/usr/local/Cellar/ffmpeg/4.0.2 --enable-shared --enable-pthreads --enable-version3 --enable-hardcoded-tables --enable-avresample --cc=clang --host-cflags= --host-ldflags= --enable-gpl --enable-libass --enable-libfdk-aac --enable-libfreetype --enable-libmp3lame --enable-libvpx --enable-libx264 --enable-libx265 --enable-libxvid --enable-opencl --enable-videotoolbox --disable-lzma --enable-nonfree
  libavutil 56. 14.100 / 56. 14.100
  libavcodec 58. 18.100 / 58. 18.100
  libavformat 58. 12.100 / 58. 12.100
  libavdevice 58. 3.100 / 58. 3.100
  libavfilter 7. 16.100 / 7. 16.100
  libavresample 4. 0. 0 / 4. 0. 0
  libswscale 5. 1.100 / 5. 1.100
  libswresample 3. 1.100 / 3. 1.100
  libpostproc 55. 1.100 / 55. 1.100
Input #0, mov,mp4,m4a,3gp,3g2,mj2, from 'W5S1-Rest-3-9-Introduction.m4v':
  Metadata:
    major_brand : M4V
    minor_version : 1
    compatible_brands: M4V M4A mp42isom
    creation_time : 2018-11-19T19:25:57.000000Z
    description : This video is about W5S1 3-9 Section 3 Rest Audio
    album_artist : Gabriel Kava
    keywords : Week 5,w5 s1 audio
    artist : Gabriel Kava
    title : W5S1 3-9 Section 3 Rest Audio
  Duration: 00:00:02.00, start: 0.000000, bitrate: 108 kb/s
    Stream #0:0(eng): Audio: aac (LC) (mp4a / 0x6134706D), 48000 Hz, stereo, fltp, 99 kb/s (default)
    Metadata:
      creation_time : 2018-11-19T19:25:57.000000Z
      handler_name : Core Media Audio
Stream mapping:
  Stream #0:0 (aac) -> loudnorm
  loudnorm -> Stream #0:0 (pcm_s16le)
Output #0, null, to '/dev/null':
  Metadata:
    major_brand : M4V
    minor_version : 1
    compatible_brands: M4V M4A mp42isom
    title : W5S1 3-9 Section 3 Rest Audio
    description : This video is about W5S1 3-9 Section 3 Rest Audio
    album_artist : Gabriel Kava
    keywords : Week 5,w5 s1 audio
    artist : Gabriel Kava
    encoder : Lavf58.12.100
    Stream #0:0: Audio: pcm_s16le, 192000 Hz, stereo, s16, 6144 kb/s (default)
    Metadata:
      encoder : Lavc58.18.100 pcm_s16le
size=N/A time=00:00:02.00 bitrate=N/A speed=38.7x
video:0kB audio:1504kB subtitle:0kB other streams:0kB global headers:0kB muxing overhead: unknown
[Parsed_loudnorm_0 @ 0x7fab4fc095c0]
{
  "input_i" : "-22.02",
  "input_tp" : "-9.06",
  "input_lra" : "0.00",
  "input_thresh" : "-32.82",
  "output_i" : "-21.55",
  "output_tp" : "-8.62",
  "output_lra" : "0.00",
  "output_thresh" : "-32.36",
  "normalization_type" : "linear",
  "target_offset" : "16.55"
}
DEBUG: Loudnorm stats parsed: {"input_i": "-22.02", "input_tp": "-9.06", "input_lra": "0.00", "input_thresh": "-32.82", "output_i": "-21.55", "output_tp": "-8.62", "output_lra": "0.00", "output_thresh": "-32.36", "normalization_type": "linear", "target_offset": "16.55"}
INFO: Running second pass for W5S1-Rest-3-9-Introduction.m4v
DEBUG: Running ffmpeg command: ['/usr/local/bin/ffmpeg', '-y', '-nostdin', '-i', 'W5S1-Rest-3-9-Introduction.m4v', '-filter_complex', '[0:0]loudnorm=i=-5.0:lra=7.0:tp=-2.0:offset=0.0:measured_i=-22.02:measured_lra=0.0:measured_tp=-9.06:measured_thresh=-32.82:linear=true:print_format=json[norm0]', '-map_metadata', '0', '-map_chapters', '0', '-c:v', 'copy', '-map', '[norm0]', '-c:a', 'aac', '-c:s', 'copy', '/var/folders/rp/0cqd1c012p7g9jf3mc7nbrvc0000gn/T/qcv46tfu.m4v']
DEBUG: Moving temporary file from /var/folders/rp/0cqd1c012p7g9jf3mc7nbrvc0000gn/T/qcv46tfu.m4v to processed_audio/W5S1-Rest-3-9-Introduction2.m4v
DEBUG: Normalization finished
INFO: Normalized file written to processed_audio/W5S1-Rest-3-9-Introduction2.m4v