librosa: Default sample rate for librosa.core.load is not the native sample rate
Currently, the default sample rate is 22050 Hz when loading an audio file. In order to use the native sample rate, sr must be set to None. I often forget to set this to None when I want the native sample rate, and I have noticed many students do the same.
Have others experienced confusion over this? Should this change in a later release so that the native sample rate is the default?
About this issue
- Original URL
- State: closed
- Created 7 years ago
- Reactions: 2
- Comments: 31 (25 by maintainers)
Hi @NumesSanguis. Thank you for the feedback.
To be fair, librosa does not really change the sample rate “at load time”. It loads the whole file as a numpy array at its native sample rate, and then calls
librosa.resampleto perform resampling of that numpy array. Callinglibrosa.loadwithsr=Nonesimply skips the second step.So, if
librosa.loadwere to be deprecated, the recommended way to perform its default behavior would be:-1 to
loadnloadnstrikes me as akin to idioms likekill -9, where you have to append something to the command so that it ends up doing what its name implies.A more detailed discussion of this design choice can be found in our development blog: https://librosa.github.io/blog/2019/07/17/resample-on-load/#resample-on-load
Having thought a lot about this over the years, I still think that we gain more than we lose by forcing resample-on-load, and I’m not willing to change that. (Possibly abusing my executive power here, but 🤷♂️)
I may reconsider the choice of the default target sampling rate going forward, but that’s a separate question.
I don’t think there’s much point to keeping this discussion open, but I do thank you all for your input!
Related: we recently had some team members get bitten by forgetting to set
srwhen callingmelspectrogram(which also has optionalsr=22050).It’s an easy mistake to make since (1) stft doesn’t take
srso it’s easy to forget to add it when switching from linear to mel and (2) it only affects the construction of the mel filterbank, but will still return an output with the expected dimensionality (mel bands and frames) so is easy to miss.Instead of having a consistent expectation for the sampling rate, what if librosa always required
sras a positional argument in all functions where it is used? It would break everything in terms of backwards compatibility, but isn’t that what major versions are for? 😃 More importantly, it would prevent bugs resulting from expectedsrvalues by ensuring the user is always explicit about thesr.Just thought I’d throw this in here since it seemed related.
Both unfortunate and convenient that
librosa.loaddoes these three things when there are top-level functions for the separate steps (librosa.to_monoandlibrosa.resample). Too late to deprecate bothsrandmonoargs toload? It’s not 1.0 yet after all. 😉vs.
I’d guess a lot of user code would break (in subtle and catastrophic ways) if
yis suddenly 2D. 😭I should add that
rloadis not satisfying either, because, as of now,librosa.loaddoes three things:So writing
rloadwould expose us to the inevitable issue: “default number of channels inlibrosa.rloadis not the native number of channels”, leading to the suggestion of inventingrloadmfor “rload mono”, and we’d roll downhill from there into matlabish babble (etfeanyone?)Therefore, the only two viable ways forward are: (i) status quo, (ii) deprecating keywords
srandmono, and removing them in a future version.The problem of this PR is that the principle of least surprise is IMHO equally balanced between options (i) and (ii).
I have the same problem. I agree with bmcfee that doing a deprecation move of
librosa.load -> librosa.rload(resample-load), and providing a wrapperlibrosa.load(...) = librosa.rload(..., sr=None)seems to be a good idea, but we have to consider who is already using the default value without any problems. Then, I believe that, to meet the compatibility, it is best to keep theloadmethod and create the method with native sample rate, wherelibrosa.loadn = librosa.load (..., sr = None).Had a quick chat with @mcartwright and @lostanlen this morning. I think a reasonable compromise can be struck by doing a deprecation move of
librosa.load -> librosa.rload(resample-load), and providing a wrapperlibrosa.load(...) = librosa.rload(..., sr=None).This way, there’s a simple loader that acts as people might expect, and it’s a minimal change to recover the current / past behavior and things are more explicit.
I have two major concerns with this though:
loadinstead ofrload, and expecting all the downstream analysis to work at whatever the native sampling rate happens to be.Maybe it would be better to preserve the current load interface, and provide a second
loadnthat loads at the native sampling rate.What do folks think?
But in which context does it need to be sample rate agnostic? Can’t we expect whoever using STFT/etc to know the sampling rate explicitly? For me it seems very natural that sr would be native when I don’t set anything while I wouldn’t expect any certain value would work well for STFT parameters. For example, does
n_fft=2048(the current default of STFT) withsr=22050have some special meaning?Site note: I think so. 90% seems a lot though.
I’ve had this feeling to, but there are further consequences: librosa uses a default window length for stft etc in samples, not seconds. This is fine if the sample rate is essentially always 22050, but is much less pleasant if you’re trying to be sample rate agnostic.
My preference would be for native sample rate by default, and window sizes etc defined in seconds. But that gets ugly because you want 2^R fft sizes, and it’s not trivial to enforce that transparently.
The 22kHz-by-default is quite a neat work around in practice.
DAn.
On Mon, May 15, 2017 at 12:02 Carl Thomé notifications@github.com wrote:
An alternate solution is to use presets to change the default parameters to load. I wouldn’t recommend this for teaching purposes, but it is sometimes useful when you have a fixed environment and don’t want to copy parameters everywhere.
I think this could be a good solution as I have had other issues with students when writing sound files (e.g. by default librosa uses the dtype of the array for the output, but a lot of audio players don’t support float32 encodings, etc.).