ffmpeg-normalize: Wrong volume after silence at start of track?

I’ve been using ffmpeg-normalize (EBU R128 method) to normalize the audio of gameplay recordings. Typically the recordings have a peak and LUFS significantly lower than the target volume, and I use ffmpeg-normalize to boost the volume. Sometimes there’s silence in the audio, like when the game is loading or paused.

When there are at least 2-3 seconds of silence at the beginning of the audio track, the result I get with ffmpeg-normalize has a lower-than-expected volume right after the silence, and then the volume gradually climbs toward the expected volume over a period of time.

Here’s an example. Waveform of original recording:

original

Zooming in on the original recording, to confirm that the volume is reasonably steady:

original_zoomed-in

Normalization result, using ffmpeg-normalize.exe original.aac -nt ebu -t -14 -c:a aac -o normalized.aac (after the silence, it takes roughly 90 seconds to climb to the volume I’d expect from normalization):

normalized
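To put a number on the "roughly 90 seconds" ramp instead of reading it off a waveform, one option is to measure the level of each one-second window of the decoded audio. The sketch below is only an approximation: it computes plain RMS in dBFS, not EBU R128 loudness (no K-weighting or gating), and it assumes the samples have already been decoded to mono floats in the range -1.0 to 1.0. The function names are hypothetical.

```python
import math


def window_levels_db(samples, sample_rate, window_seconds=1.0):
    """Return the RMS level (dBFS) of each consecutive full window."""
    size = int(sample_rate * window_seconds)
    if size <= 0:
        raise ValueError("window must contain at least one sample")
    levels = []
    for start in range(0, len(samples) - size + 1, size):
        window = samples[start:start + size]
        mean_square = sum(x * x for x in window) / size
        # Digital silence has no finite dB value; report -inf instead of failing.
        levels.append(10 * math.log10(mean_square) if mean_square > 0 else float("-inf"))
    return levels


def seconds_until_within(levels, target_db, tolerance_db=1.0, window_seconds=1.0):
    """Return the start time of the first window within tolerance of the target, or None."""
    for index, level in enumerate(levels):
        if abs(level - target_db) <= tolerance_db:
            return index * window_seconds
    return None
```

Running this on both the original and the normalized track would show whether the original really is steady and how long the normalized one takes to settle.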

If I trim most of the silence off the start, and then normalize, the volume seems to be fine throughout the track. Using ffmpeg -ss 11 -i original.aac -copyts trim_11.aac and ffmpeg-normalize.exe trim_11.aac -nt ebu -t -14 -c:a aac -o trim_11_normalized.aac:

trimmed_normalized
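The trim-then-normalize workaround above can be scripted. This is a minimal sketch that reuses only the flags from the two commands quoted above. The helper names are hypothetical, the number of seconds to skip still has to be chosen by hand (11 in my case), and it assumes ffmpeg and ffmpeg-normalize are on the PATH.

```python
import subprocess


def build_trim_command(source, trimmed, skip_seconds):
    """Build the ffmpeg call that drops the leading silence."""
    return ["ffmpeg", "-ss", str(skip_seconds), "-i", source, "-copyts", trimmed]


def build_normalize_command(source, output, target_lufs=-14):
    """Build the ffmpeg-normalize call (EBU R128, AAC output)."""
    return ["ffmpeg-normalize", source, "-nt", "ebu", "-t", str(target_lufs),
            "-c:a", "aac", "-o", output]


def trim_then_normalize(source, trimmed, output, skip_seconds):
    """Run both steps in order; raises CalledProcessError if either fails."""
    for command in (build_trim_command(source, trimmed, skip_seconds),
                    build_normalize_command(trimmed, output)):
        subprocess.run(command, check=True)
```

For example, trim_then_normalize("original.aac", "trim_11.aac", "trim_11_normalized.aac", 11) would run the same two commands as above.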

Windows 10, Python 3.8, ffmpeg 4.3.2. I’m happy to provide audio uploads, stats, more details/examples, etc. but I thought I’d check first - am I missing something obvious? Is this expected behavior, or am I missing a tuning parameter that would help?

About this issue

  • State: open
  • Created 3 years ago
  • Comments: 22 (9 by maintainers)

Most upvoted comments

There is also an issue with timestamp rewriting, which could cause problems for online processing when there are gaps in the timestamps, and with video too, causing loss of A/V sync.

It should not take a lot of effort; it should just be a matter of rewriting some chunks of code. I’m currently looking at how best to do it.

I can confirm that the current loudnorm implementation is not correct at all. The scanner part works well, but the limiter/compressor/expander are buggy and in the worst cases can produce clipped output. This is because it does not take into account new peaks in the attack and release stages of the limiter.
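To illustrate the general failure mode (this is a toy model, not the loudnorm code, and the function and parameter names are made up for the example): if a limiter's gain only moves part of the way toward the required value on each step, a new peak that arrives while the gain is still adjusting is scaled by a gain that is too high, and the output can exceed the ceiling.

```python
def limit_with_smoothed_gain(samples, ceiling=1.0, attack=0.5):
    """Toy limiter whose gain moves a fraction of the way to the needed gain per sample."""
    gain = 1.0
    output = []
    for sample in samples:
        peak = abs(sample)
        needed = min(1.0, ceiling / peak) if peak > 0 else 1.0
        # Simplification: the same smoothing factor is used for attack and release.
        gain += (needed - gain) * attack
        output.append(sample * gain)
    return output
```

With samples [0.5, 2.0] and attack=0.5, the second sample needs a gain of 0.5 but is multiplied by 0.75, so it comes out at 1.5 and would clip against a ceiling of 1.0. With attack=1.0 (gain applied immediately) the same input stays at or below the ceiling.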

Interesting to see that it doesn’t need silence at the start to happen.

No worries and of course thanks again for your work with this. I might conduct some experiments with dynaudnorm as well just out of curiosity.

Edit: Do not use dynaudnorm for music.

You’re right, this looks odd. Sorry there isn’t more that I can do …