ffmpeg-normalize: Wrong volume after silence at start of track?
I’ve been using ffmpeg-normalize (EBU R128 method) to normalize the audio of gameplay recordings. Typically the recordings have a peak and LUFS significantly lower than the target volume, and I use ffmpeg-normalize to boost the volume. Sometimes there’s silence in the audio, like when the game is loading or paused.
When there are at least 2-3 seconds of silence at the beginning of the audio track, the result I get with ffmpeg-normalize has a lower-than-expected volume right after the silence, and then the volume gradually climbs toward the expected volume over a period of time.
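One way to put numbers on the climb described above, instead of judging it from a waveform, is to compute a level per fixed-length window. The following is a minimal sketch, assuming the audio has already been decoded to mono float samples in the range [-1, 1]. The function name `windowed_rms_dbfs` and its parameters are hypothetical, and plain RMS is only a rough stand-in for EBU R128 loudness (no K-weighting, no gating).

```python
import math


def windowed_rms_dbfs(samples, sample_rate, window_seconds=3.0, floor_db=-120.0):
    """Return one RMS level in dBFS per non-overlapping window.

    This is NOT a LUFS measurement; it is a crude proxy that is still
    good enough to show a level that ramps up over time.
    """
    window = max(1, int(sample_rate * window_seconds))
    levels = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        mean_square = sum(x * x for x in chunk) / len(chunk)
        if mean_square <= 0.0:
            # Digital silence: report the floor instead of -infinity.
            levels.append(floor_db)
        else:
            levels.append(max(floor_db, 10.0 * math.log10(mean_square)))
    return levels
```

Running this on both the original and the normalized track and subtracting the two lists window by window would give the applied gain over time; a steady normalization should show a roughly constant difference after the silence ends.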
Here’s an example. Waveform of original recording:

Zooming in on the original recording, to confirm that the volume is reasonably steady:

Normalization result, using ffmpeg-normalize.exe original.aac -nt ebu -t -14 -c:a aac -o normalized.aac - it takes roughly 90 seconds to climb to the volume I’d expect from normalization:

If I trim most of the silence off the start, and then normalize, the volume seems to be fine throughout the track. Using ffmpeg -ss 11 -i original.aac -copyts trim_11.aac and ffmpeg-normalize.exe trim_11.aac -nt ebu -t -14 -c:a aac -o trim_11_normalized.aac:
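For repeating the trim-then-normalize workaround above on several recordings, the two commands can be generated from a small helper. This is only a sketch: the function name `build_trim_and_normalize` and the output naming scheme are assumptions modeled on the file names in this post, and the flags are exactly the ones quoted above, nothing more.

```python
def build_trim_and_normalize(src, trim_seconds, target_lufs=-14,
                             normalizer="ffmpeg-normalize"):
    """Build the two argument lists used in the workaround.

    Returns (trim_cmd, norm_cmd). Nothing is executed here; the caller
    decides how to run the commands.
    """
    trimmed = f"trim_{trim_seconds}.aac"
    normalized = f"trim_{trim_seconds}_normalized.aac"
    # Step 1: skip the leading silence while keeping original timestamps.
    trim_cmd = ["ffmpeg", "-ss", str(trim_seconds), "-i", src,
                "-copyts", trimmed]
    # Step 2: EBU R128 normalization of the trimmed file.
    norm_cmd = [normalizer, trimmed, "-nt", "ebu", "-t", str(target_lufs),
                "-c:a", "aac", "-o", normalized]
    return trim_cmd, norm_cmd
```

Each list can be passed to `subprocess.run(cmd, check=True)`. On Windows the `normalizer` argument can be set to `ffmpeg-normalize.exe`, as in the commands above.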

Windows 10, Python 3.8, ffmpeg 4.3.2. I’m happy to provide audio uploads, stats, more details/examples, etc. but I thought I’d check first - am I missing something obvious? Is this expected behavior, or am I missing a tuning parameter that would help?
About this issue
- State: open
- Created 3 years ago
- Comments: 22 (9 by maintainers)
There is also an issue with timestamp rewriting, which could cause problems in online processing when the timestamps contain gaps and there is video too, leading to loss of A/V sync.
It should not take a lot of effort; it should just be a matter of rewriting some chunks of code. I'm currently looking at how best to do it.
I can confirm that the current loudnorm implementation is not correct at all. The scanner part works well, but the limiter/compressor/expander are buggy and in the worst cases can produce clipped output. This is because it does not take into account new peaks in the attack and release stages of the limiter.
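To check whether a given normalized file is affected by the clipping mentioned above, a quick sample-level test can be run on the decoded output. This is a minimal sketch, assuming float samples in [-1, 1]; the function names are hypothetical, and it only looks at sample peaks, so inter-sample (true-peak) overshoots would need oversampling and are not detected here.

```python
import math


def count_clipped(samples, full_scale=1.0):
    """Count samples whose magnitude reaches or exceeds full scale."""
    return sum(1 for x in samples if abs(x) >= full_scale)


def sample_peak_dbfs(samples):
    """Return the sample peak in dBFS, or None for an all-zero signal."""
    peak = max((abs(x) for x in samples), default=0.0)
    if peak == 0.0:
        return None
    return 20.0 * math.log10(peak)
```

A non-zero count, or a sample peak at or above 0 dBFS, would be consistent with the limiter letting a peak through, although it does not by itself show which stage caused it.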
Interesting to see that it doesn’t need silence at the start to happen.
No worries and of course thanks again for your work with this. I might conduct some experiments with dynaudnorm as well just out of curiosity.
Edit: Do not use dynaudnorm for music.
You’re right, this looks odd. Sorry there isn’t more that I can do …