librosa: Default sample rate for librosa.core.load is not the native sample rate

Currently, the default sample rate is 22050 Hz when loading an audio file. In order to use the native sample rate, sr must be set to None. I often forget to set this to None when I want the native sample rate, and I have noticed many students do the same.

Have others experienced confusion over this? Should this change in a later release so that the native sample rate is the default?

About this issue

Original URL
State: closed
Created 7 years ago
Reactions: 2
Comments: 31 (25 by maintainers)

Most upvoted comments

Hi @NumesSanguis. Thank you for the feedback.

soundfile.read doesn’t accept a samplerate keyword for changing the sample rate at load time (and my Google-fu skills couldn’t find how to do it with soundfile at all)

To be fair, librosa does not really change the sample rate “at load time”. It loads the whole file as a numpy array at its native sample rate, and then calls librosa.resample to perform resampling of that numpy array. Calling librosa.load with sr=None simply skips the second step.

So, if librosa.load were to be deprecated, the recommended way to perform its default behavior would be:

import soundfile as sf
original_y, original_sr = sf.read("sample.wav")
y = librosa.resample(original_y, original_sr, 22050)

lostanlen on Feb 4, 2019

-1 to loadn

loadn strikes me as akin to idioms like kill -9, where you have to append something to the command so that it ends up doing what its name implies.

lostanlen on Feb 26, 2018

A more detailed discussion of this design choice can be found in our development blog: https://librosa.github.io/blog/2019/07/17/resample-on-load/#resample-on-load

Having thought a lot about this over the years, I still think that we gain more than we lose by forcing resample-on-load, and I’m not willing to change that. (Possibly abusing my executive power here, but 🤷‍♂️)

I may reconsider the choice of the default target sampling rate going forward, but that’s a separate question.

I don’t think there’s much point to keeping this discussion open, but I do thank you all for your input!

bmcfee on Aug 2, 2019

Related: we recently had some team members get bitten by forgetting to set sr when calling melspectrogram (which also has optional sr=22050).

It’s an easy mistake to make since (1) stft doesn’t take sr so it’s easy to forget to add it when switching from linear to mel and (2) it only affects the construction of the mel filterbank, but will still return an output with the expected dimensionality (mel bands and frames) so is easy to miss.

Instead of having a consistent expectation for the sampling rate, what if librosa always required sr as a positional argument in all functions where it is used? It would break everything in terms of backwards compatibility, but isn’t that what major versions are for? 😃 More importantly, it would prevent bugs resulting from expected sr values by ensuring the user is always explicit about the sr.

Just thought I’d throw this in here since it seemed related.

justinsalamon on Aug 28, 2018

Both unfortunate and convenient that librosa.load does these three things when there are top-level functions for the separate steps (librosa.to_mono and librosa.resample). Too late to deprecate both sr and mono args to load? It’s not 1.0 yet after all. 😉

y, sr = librosa.load('audio.ogg', sr=44100, to_mono=True)

vs.

y, sr = librosa.load('audio.ogg')
y = librosa.resample(y, sr, 44100)
y = librosa.to_mono(y)

I’d guess a lot of user code would break (in subtle and catastrophic ways) if y is suddenly 2D. 😭

carlthome on Mar 27, 2018

I should add that rload is not satisfying either, because, as of now, librosa.load does three things:

loading
converting to mono
resampling (maybe not in that order)

So writing rload would expose us to the inevitable issue: “default number of channels in librosa.rload is not the native number of channels”, leading to the suggestion of inventing rloadm for “rload mono”, and we’d roll downhill from there into matlabish babble (etfe anyone?)

Therefore, the only two viable ways forward are: (i) status quo, (ii) deprecating keywords sr and mono, and removing them in a future version.

The problem of this PR is that the principle of least surprise is IMHO equally balanced between options (i) and (ii).

lostanlen on Mar 27, 2018

I have the same problem. I agree with bmcfee that doing a deprecation move of librosa.load -> librosa.rload (resample-load), and providing a wrapper librosa.load(...) = librosa.rload(..., sr=None) seems to be a good idea, but we have to consider who is already using the default value without any problems. Then, I believe that, to meet the compatibility, it is best to keep the load method and create the method with native sample rate, where librosa.loadn = librosa.load (..., sr = None).

eduamf on Feb 25, 2018

Had a quick chat with @mcartwright and @lostanlen this morning. I think a reasonable compromise can be struck by doing a deprecation move of librosa.load -> librosa.rload (resample-load), and providing a wrapper librosa.load(...) = librosa.rload(..., sr=None).

This way, there’s a simple loader that acts as people might expect, and it’s a minimal change to recover the current / past behavior and things are more explicit.

I have two major concerns with this though:

It’ll break things down the road for people that don’t update their code to use the new loader,
I expect a lot of newbie errors resulting from naive use of load instead of rload, and expecting all the downstream analysis to work at whatever the native sampling rate happens to be.

Maybe it would be better to preserve the current load interface, and provide a second loadn that loads at the native sampling rate.

What do folks think?

bmcfee on May 16, 2017

But in which context does it need to be sample rate agnostic? Can’t we expect whoever using STFT/etc to know the sampling rate explicitly? For me it seems very natural that sr would be native when I don’t set anything while I wouldn’t expect any certain value would work well for STFT parameters. For example, does n_fft=2048 (the current default of STFT) with sr=22050 have some special meaning?

Site note: I think so. 90% seems a lot though.

keunwoochoi on May 15, 2017

I’ve had this feeling to, but there are further consequences: librosa uses a default window length for stft etc in samples, not seconds. This is fine if the sample rate is essentially always 22050, but is much less pleasant if you’re trying to be sample rate agnostic.

My preference would be for native sample rate by default, and window sizes etc defined in seconds. But that gets ugly because you want 2^R fft sizes, and it’s not trivial to enforce that transparently.

The 22kHz-by-default is quite a neat work around in practice.

DAn.

On Mon, May 15, 2017 at 12:02 Carl Thomé notifications@github.com wrote:

I too think it would be nice of sr=None was the default.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/librosa/librosa/issues/509#issuecomment-301521092, or mute the thread https://github.com/notifications/unsubscribe-auth/AAhs0fszzu_UUMRvbnud3nP0p2a8eb7Pks5r6HbxgaJpZM4L-Nzx .

dpwe on May 15, 2017

An alternate solution is to use presets to change the default parameters to load. I wouldn’t recommend this for teaching purposes, but it is sometimes useful when you have a fixed environment and don’t want to copy parameters everywhere.

bmcfee on Feb 14, 2017

I think this could be a good solution as I have had other issues with students when writing sound files (e.g. by default librosa uses the dtype of the array for the output, but a lot of audio players don’t support float32 encodings, etc.).

mcartwright on Feb 11, 2017