audio: Info length and rate returns different values for different backends
🐛 Bug
torchaudio.info returns the info objects directly from the respective backend. Because the properties have the same names across backends, users might forget to check how the metadata is calculated. As a result, the metadata is reported differently depending on which backend is in use.
E.g. sox reports the length summed across channels, whereas soundfile reports it per channel (which is correct).
I propose adding a wrapper for the info objects so that, independent of the backend, the most important metadata (length and rate) is identical.
Currently, the sox backend reports a misleading length, and its rate parameter is of type float instead of int.
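A minimal sketch of what such a wrapper could look like. The names NormalizedInfo and normalize_info are hypothetical and not part of torchaudio; the sketch only assumes that the backend's signal-info object exposes length, rate and channels, and that sox reports length summed across channels as described above. Stand-in objects are used instead of real backend output so the snippet runs without an audio file.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class NormalizedInfo:
    """Hypothetical backend-independent metadata (not a torchaudio class)."""
    length: int    # number of frames, i.e. samples per channel
    rate: int      # sample rate in Hz
    channels: int


def normalize_info(si: Any, backend: str) -> NormalizedInfo:
    """Map a backend-specific signal-info object onto NormalizedInfo.

    Assumption: for backend == "sox", si.length is summed across channels,
    so it is divided by the channel count; other backends are taken as-is.
    """
    channels = int(si.channels)
    length = int(si.length)
    if backend == "sox":
        length //= channels
    # Cast the rate so that sox's float rate becomes an int.
    return NormalizedInfo(length=length, rate=int(si.rate), channels=channels)


# Stand-ins for what the two backends would report for one second of
# 44.1 kHz stereo audio (illustrative values, not real backend output).
sox_like = SimpleNamespace(length=88200, rate=44100.0, channels=2)
soundfile_like = SimpleNamespace(length=44100, rate=44100, channels=2)

sox_norm = normalize_info(sox_like, "sox")
sf_norm = normalize_info(soundfile_like, "soundfile")
print(sox_norm == sf_norm)
```

With a wrapper along these lines, callers would compare or consume NormalizedInfo and never touch the backend-specific fields directly.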
To Reproduce
import torchaudio

path = "any/wavfile.wav"
# soundfile
torchaudio.set_audio_backend("soundfile")
# torchaudio.info returns a (signal info, encoding info) pair
si, ei = torchaudio.info(path)
print(si.length)
print(type(si.rate))
# sox
torchaudio.set_audio_backend("sox")
si, ei = torchaudio.info(path)
print(si.length)
print(type(si.rate))
Expected behavior
soundfile reports the correct metadata; the sox output should be corrected so that it matches:
# sox
torchaudio.set_audio_backend("sox")
si, ei = torchaudio.info(path)
print(si.length // si.channels)  # length per channel
print(int(si.rate))  # rate as int
Environment
torchaudio==0.5.0 from pypi
About this issue
- State: closed
- Created 4 years ago
- Comments: 19 (10 by maintainers)
Regarding the first issue, this is probably just a matter of settling on the right vocabulary to make a formal distinction between frames and samples, as is done in libsndfile. Over there:
and
which makes total sense to me (soundfile is also the de facto standard when it comes to proper handling of audio I/O). Adopting that vocabulary would probably lead to too many changes here, but it makes sense to state the definition that is used here (“we define samples as the number of frames in an audio signal per channel”).
I agree this is probably the simplest solution.
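To make the frames-versus-samples distinction discussed above concrete, here is a small sketch. It assumes interleaved storage, where one frame holds one sample for every channel; the helper names count_frames and deinterleave are made up for illustration and do not come from torchaudio or libsndfile.

```python
from typing import List


def count_frames(interleaved: List[float], channels: int) -> int:
    """Return the number of frames (samples per channel) in an interleaved buffer."""
    if channels <= 0:
        raise ValueError("channels must be positive")
    if len(interleaved) % channels != 0:
        raise ValueError("buffer length is not a multiple of the channel count")
    return len(interleaved) // channels


def deinterleave(interleaved: List[float], channels: int) -> List[List[float]]:
    """Split an interleaved buffer into one list of samples per channel."""
    return [interleaved[c::channels] for c in range(channels)]


# Three stereo frames stored as L R L R L R.
stereo = [0.1, -0.1, 0.2, -0.2, 0.3, -0.3]

total_samples = len(stereo)        # summed across channels
frames = count_frames(stereo, 2)   # per channel
left, right = deinterleave(stereo, 2)
print(total_samples, frames, left, right)
```

Here the total sample count is twice the frame count, which is the same factor that separates the two reported lengths in this issue.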
I started with a new test, #639, that is expected to fail, and I can propose a fix for this as well (in the same PR?).