ffmpeg-normalize: Wrong volume after silence at start of track?

I’ve been using ffmpeg-normalize (EBU R128 method) to normalize the audio of gameplay recordings. Typically the recordings have a peak and LUFS significantly lower than the target volume, and I use ffmpeg-normalize to boost the volume. Sometimes there’s silence in the audio, like when the game is loading or paused.

When there are at least 2-3 seconds of silence at the beginning of the audio track, the result I get with ffmpeg-normalize has a lower-than-expected volume right after the silence, and then the volume gradually climbs toward the expected volume over a period of time.

Here’s an example. Waveform of original recording:

[Image: original]

Zooming in on the original recording, to confirm that the volume is reasonably steady:

[Image: original_zoomed-in]

Normalization result, using ffmpeg-normalize.exe original.aac -nt ebu -t -14 -c:a aac -o normalized.aac. It takes roughly 90 seconds for the volume to climb to the level I’d expect from normalization:

[Image: normalized]

If I trim most of the silence off the start, and then normalize, the volume seems to be fine throughout the track. Using ffmpeg -ss 11 -i original.aac -copyts trim_11.aac and ffmpeg-normalize.exe trim_11.aac -nt ebu -t -14 -c:a aac -o trim_11_normalized.aac:

[Image: trimmed_normalized]
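
For anyone who wants to repeat the trim-then-normalize comparison on several files, here is a minimal Python sketch. It only reuses the two commands quoted above with the same flags. The helper names (build_trim_command, build_normalize_command, trim_then_normalize) are hypothetical, and it assumes ffmpeg and ffmpeg-normalize are both on the PATH. By default it only builds the command lists, so nothing is executed unless run=True is passed.

```python
import subprocess
from typing import List


def build_trim_command(src: str, dst: str, start_seconds: int) -> List[str]:
    """Build the ffmpeg command from the post that skips the first start_seconds."""
    return ["ffmpeg", "-ss", str(start_seconds), "-i", src, "-copyts", dst]


def build_normalize_command(src: str, dst: str, target_lufs: int = -14) -> List[str]:
    """Build the ffmpeg-normalize command from the post (EBU R128, AAC output)."""
    return [
        "ffmpeg-normalize", src,
        "-nt", "ebu",
        "-t", str(target_lufs),
        "-c:a", "aac",
        "-o", dst,
    ]


def trim_then_normalize(src: str, trimmed: str, normalized: str,
                        start_seconds: int, run: bool = False) -> List[List[str]]:
    """Return the two commands in order; execute them only when run=True."""
    commands = [
        build_trim_command(src, trimmed, start_seconds),
        build_normalize_command(trimmed, normalized),
    ]
    if run:
        for cmd in commands:
            # check=True raises CalledProcessError if either tool fails
            subprocess.run(cmd, check=True)
    return commands
```

Calling trim_then_normalize("original.aac", "trim_11.aac", "trim_11_normalized.aac", 11, run=True) would run the same two steps as above. This is a workaround for comparison purposes, not a fix for the underlying behavior.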

Windows 10, Python 3.8, ffmpeg 4.3.2. I’m happy to provide audio uploads, stats, more details/examples, etc. but I thought I’d check first - am I missing something obvious? Is this expected behavior, or am I missing a tuning parameter that would help?

About this issue

State: open
Created 3 years ago
Comments: 22 (9 by maintainers)

Most upvoted comments

There is also an issue with timestamp rewriting, which could cause problems in online processing when the timestamps have gaps, and with video it could cause loss of A/V sync.

It should not take a lot of effort; it should just be a matter of rewriting some chunks of code. I’m currently looking at how best to do it.

richardpl on Feb 22, 2022

I can confirm that the current loudnorm implementation is not correct at all. The scanner part works well, but the limiter/compressor/expander are buggy and in the worst cases can produce clipped output. This is because it does not take into account new peaks in the attack and release stages of the limiter.

richardpl on Feb 21, 2022
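
To illustrate the kind of failure described in the comment above, here is a toy Python sketch. It is not loudnorm's code and does not reproduce its algorithm. It assumes a very simple limiter that smooths its gain toward a safe value with a hypothetical attack coefficient, and shows that when a loud sample arrives while the gain is still ramping down, the output exceeds the ceiling unless the new peak is checked explicitly. All names and numbers are made up for illustration.

```python
from typing import List


def smoothed_gain_limiter(samples: List[float], boost: float,
                          ceiling: float = 1.0, attack: float = 0.5,
                          check_peaks: bool = False) -> List[float]:
    """Toy limiter: apply `boost`, easing the gain down when a sample would clip."""
    gain = boost
    out = []
    for x in samples:
        peak = abs(x)
        # Gain that keeps this sample at or under the ceiling
        safe = boost if peak * boost <= ceiling else ceiling / peak
        # Move only part of the way toward the safe gain (the smoothing stage)
        gain += attack * (safe - gain)
        # Optional guard: account for the new peak while the gain is still ramping
        if check_peaks and peak * gain > ceiling:
            gain = ceiling / peak
        out.append(x * gain)
    return out


# Quiet passage followed by a sudden loud passage
signal = [0.125] * 4 + [0.5] * 4

unchecked = smoothed_gain_limiter(signal, boost=4.0)
checked = smoothed_gain_limiter(signal, boost=4.0, check_peaks=True)

print(max(abs(v) for v in unchecked))  # above the ceiling: clipped
print(max(abs(v) for v in checked))    # held at the ceiling
```

In this sketch the unguarded version overshoots on the first loud samples because the smoothed gain has not yet come down, which is the general shape of the problem described, under the simplifying assumptions stated above.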

Interesting to see that it doesn’t need silence at the start to happen.

JoselleAstrid on Nov 19, 2021

No worries and of course thanks again for your work with this. I might conduct some experiments with dynaudnorm as well just out of curiosity.

Edit: Do not use dynaudnorm for music.

ndmgrphc on Nov 19, 2021

You’re right, this looks odd. Sorry there isn’t more that I can do …

slhck on Nov 19, 2021